Recode-Hive / Scrape-ML

For new data generation in the Semi-supervised-sequence-learning-Project, we have written a Python script to fetch 📊 data from the 💻 IMDb website 🌐 and convert it into txt files.
https://scrape-ml.streamlit.app/
MIT License
80 stars, 117 forks

OCR accuracy optimization through image preprocessing and image segmentation #198

Closed: litesh1123 closed this 1 week ago

litesh1123 commented 1 week ago

Related Issue

[Cite any related issue(s) this pull request addresses. If none, simply state “None”]

97

Description

[Please include a brief description of the changes or features added]

Added image preprocessing: before OCR extraction, users can choose operations such as grayscale, threshold, adaptive threshold, and denoise, and can see the processed image and the extracted text in the output section.

Added image segmentation: images are divided into parts, each treated as a region of interest (ROI), and text is extracted from each part.
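To make the operations named above concrete, here is a minimal sketch of what each step does. It is not the code from this PR: the PR most likely relies on an image library, whereas this sketch uses plain Python lists so that it runs without dependencies. All function names, the window `radius`, the `cutoff` of 128, and the fixed grid segmentation are assumptions chosen for illustration.

```python
from statistics import median

def to_grayscale(rgb):
    """Convert rows of (r, g, b) pixels to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]

def threshold(gray, cutoff=128):
    """Global threshold: pixels at or above `cutoff` become 255, others 0."""
    return [[255 if p >= cutoff else 0 for p in row] for row in gray]

def _window(gray, y, x, radius):
    """Collect pixel values in a square window clipped to the image borders."""
    h, w = len(gray), len(gray[0])
    return [gray[j][i]
            for j in range(max(0, y - radius), min(h, y + radius + 1))
            for i in range(max(0, x - radius), min(w, x + radius + 1))]

def adaptive_threshold(gray, radius=1, offset=0):
    """Local threshold: compare each pixel with the mean of its neighbourhood."""
    out = []
    for y, row in enumerate(gray):
        new_row = []
        for x, p in enumerate(row):
            win = _window(gray, y, x, radius)
            local_mean = sum(win) / len(win)
            new_row.append(255 if p >= local_mean - offset else 0)
        out.append(new_row)
    return out

def denoise(gray, radius=1):
    """Median filter: replace each pixel with the median of its neighbourhood."""
    return [[int(median(_window(gray, y, x, radius))) for x in range(len(row))]
            for y, row in enumerate(gray)]

def split_into_rois(image, rows=2, cols=2):
    """Divide an image into a rows x cols grid of tiles (hypothetical ROI scheme)."""
    h, w = len(image), len(image[0])
    tiles = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            tiles.append([row[x0:x1] for row in image[y0:y1]])
    return tiles
```

In a pipeline like the one described, the selected preprocessing function would run first, the result would be split into tiles, and the OCR engine would then be applied to each tile separately. How the PR actually chooses its regions is not stated here, so the fixed grid is only a stand-in.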

Type of PR

Screenshots / videos (if applicable)

[Attach any relevant screenshots or videos demonstrating the changes]

Demo video link: https://drive.google.com/file/d/1phb0gmf1UlSvp7tU80azUM3PQH9Wo3f1/view?usp=sharing

Screenshots: Screenshot 2024-06-21 143429, Screenshot 2024-06-21 143532

Checklist:

Additional context:

[Include any additional information or context that might be helpful for reviewers.]

@sanjay-kv sir, I tried to contact the other person who raised the issue but got no reply. I have reviewed and added the necessary changes to increase OCR accuracy.

litesh1123 commented 1 week ago

@sanjay-kv sir, if possible please assign level3. I invested a lot of time and research in adding these features.

litesh1123 commented 1 week ago

@sanjay-kv sir, I see some conflicts in the pull request, so I am making a new branch and a new PR, closing the issue, and deleting the branch.