Recode-Hive / Scrape-ML

To generate new data for the Semi-supervised-sequence-learning-Project, we have written a Python script that fetches 📊 data from the 💻 IMDb website 🌐 and converts it into txt files.
https://scrape-ml.streamlit.app/
MIT License
79 stars 117 forks

Issue Report: OCR Accuracy and Error Handling in Web Scraping Script #97

Open PrashantKumar39 opened 1 month ago

PrashantKumar39 commented 1 month ago

To Recode-Hive,

I hope you're well. I've reviewed our web scraping script and identified two key areas for improvement:

  1. OCR Accuracy: Inconsistent text extraction due to varying screenshot quality and webpage content complexity.

Solution: Implement image preprocessing and optimize Tesseract configuration.

  2. Error Handling and Logging: Minimal error handling hindering script stability and debugging.

Solution: Enhance error handling with detailed logging and exception handling.
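A minimal sketch of how the two suggestions above could fit together, not the script's actual code. It assumes the screenshot is available as a grid of grayscale values and that the OCR engine is passed in as a callable; the names `binarize`, `extract_text`, and the `ocr` parameter are hypothetical and chosen only for illustration.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("scraper.ocr")  # hypothetical logger name

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def binarize(pixels: Gray, threshold: int = 128) -> Gray:
    """Preprocessing step: map every pixel to pure black (0) or white (255)."""
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


def extract_text(
    pixels: Gray,
    ocr: Callable[[Gray], str],
    retries: int = 2,
) -> Optional[str]:
    """Run `ocr` on the binarized image, logging each failure and retrying.

    Returns the extracted text, or None if every attempt fails.
    """
    cleaned = binarize(pixels)
    # One initial attempt plus `retries` further attempts.
    for attempt in range(1, retries + 2):
        try:
            text = ocr(cleaned)
            logger.info("OCR succeeded on attempt %d (%d chars)", attempt, len(text))
            return text
        except Exception:
            # logger.exception records the full traceback for debugging.
            logger.exception("OCR failed on attempt %d", attempt)
    logger.error("Giving up after %d attempts", retries + 1)
    return None
```

In the real script the `ocr` callable would presumably wrap the Tesseract call (for example `pytesseract.image_to_string` with a `config` string), and the pixel grid would be replaced by whatever image type that library expects. The threshold of 128 is an arbitrary starting point and would need tuning against real screenshots.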

Thank you

github-actions[bot] commented 1 month ago

Thank you for raising an issue. We hope you are enjoying open source. We try to reply or assign as soon as possible. Connect with a mentor.

sanjay-kv commented 1 month ago

@litesh1123 Hi, he found some areas for improvement in your work. I would like to know your opinion before assigning the issue to him.

litesh1123 commented 1 month ago

@sanjay-kv Sir, yes, these issues are genuine, and I hope addressing them can increase accuracy and optimize the script. Thank you @PrashantKumar39 for raising the issue; I would love to see your contribution to optimizing OCR extraction.

sanjay-kv commented 1 month ago

Can you be a collaborator and get the GitHub collab badge? I will add you to the issue as well.

litesh1123 commented 1 month ago

@sanjay-kv Yes sir, that would be great. I would like to collaborate.

github-actions[bot] commented 6 days ago

This issue has been automatically closed because it has been inactive for more than 30 days. If you believe this is still relevant, feel free to reopen it or create a new one. Thank you!

litesh1123 commented 6 days ago

@sanjay-kv Sir, if possible, please reopen this issue. The PR I made has conflicts; I will resolve them and submit the PR.

litesh1123 commented 4 days ago

@sanjay-kv thank you sir