Recode-Hive / Scrape-ML

For new data generation in the Semi-supervised-sequence-learning-Project, we have written a Python script to fetch 📊 data from the 💻 IMDb website 🌐 and convert it into txt files.
https://scrape-ml.streamlit.app/
MIT License
80 stars, 117 forks

Increasing OCR accuracy through image preprocessing and image segmentation #199

Open litesh1123 opened 1 week ago

litesh1123 commented 1 week ago

Related Issue

97

Description

Added image preprocessing: before OCR extraction, users can choose operations such as greyscale, threshold, adaptive threshold, and denoise. The manipulated image and the extracted text are both shown in the output section.

Added image segmentation: the image is divided into parts, each treated as a region of interest (ROI), and text is extracted from each part.
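The preprocessing and segmentation steps described above can be sketched roughly as follows. This is a minimal illustration in pure Python on nested lists so that it runs without dependencies; it is not the code from this PR. All function names, the default cutoff, and the grid-based splitting are assumptions made for the example, and the OCR call itself is left out.

```python
import statistics


def to_greyscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def global_threshold(grey, cutoff=128):
    """Binarise with one fixed cutoff for the whole image (hypothetical default)."""
    return [[255 if p >= cutoff else 0 for p in row] for row in grey]


def _window(grey, y, x, radius):
    """Collect the pixel values in the square neighbourhood around (y, x)."""
    h, w = len(grey), len(grey[0])
    return [grey[j][i]
            for j in range(max(0, y - radius), min(h, y + radius + 1))
            for i in range(max(0, x - radius), min(w, x + radius + 1))]


def adaptive_threshold(grey, radius=1, offset=0):
    """Binarise each pixel against the mean of its own neighbourhood."""
    out = []
    for y, row in enumerate(grey):
        new_row = []
        for x, p in enumerate(row):
            window = _window(grey, y, x, radius)
            local_mean = sum(window) / len(window)
            new_row.append(255 if p > local_mean - offset else 0)
        out.append(new_row)
    return out


def median_denoise(grey, radius=1):
    """Replace each pixel with the median of its neighbourhood."""
    return [[statistics.median_low(_window(grey, y, x, radius))
             for x in range(len(grey[0]))]
            for y in range(len(grey))]


def split_into_regions(grey, rows, cols):
    """Cut the image into a rows x cols grid of regions, in row-major order."""
    h, w = len(grey), len(grey[0])
    regions = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            regions.append([line[x0:x1] for line in grey[y0:y1]])
    return regions
```

In a pipeline like the one described, each region returned by `split_into_regions` would be passed to the OCR engine separately and the resulting text joined afterwards. A fixed grid is only one way to pick regions; the PR may choose its regions of interest differently.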

Type of PR

Screenshots / videos (if applicable)

Demo video: https://drive.google.com/file/d/1phb0gmf1UlSvp7tU80azUM3PQH9Wo3f1/view?usp=sharing

Screenshots: Screenshot 2024-06-21 143429, Screenshot 2024-06-21 143532, Screenshot 2024-06-21 143726

Checklist:

Additional context:

@sanjay-kv Sir, I tried to contact the person who raised the issue but got no reply. I have reviewed the code and added the changes needed to increase OCR accuracy.

sanjay-kv commented 1 week ago

From the image, it looks like you need to pull before pushing.

litesh1123 commented 1 week ago

@sanjay-kv I didn't notice how these things were changed. I didn't change anything in these files. What should I do next, sir?

sanjay-kv commented 1 week ago

Those are your files, right? You need to remove them from the repo and then try pushing again.