Recode-Hive / Scrape-ML

For new data generation in the Semi-supervised-sequence-learning-Project, we have written a Python script to fetch 📊 data from the 💻 IMDb website 🌐 and convert it into txt files.
https://scrape-ml.streamlit.app/
MIT License
80 stars 116 forks

Vercel WebApp #85

Open Sitevity opened 1 month ago

Sitevity commented 1 month ago

Is your feature request related to a problem? Please describe.

I'm currently working on improving Scrape-ML's ability to handle websites with dynamically loaded content. This is a common challenge because websites often use JavaScript to fetch and display content after the initial page load. Scrape-ML's current static parsing approach often misses this dynamically generated content, leading to incomplete data extraction.
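To make the problem concrete, here is a minimal sketch using only the Python standard library. The page markup, the `/api/reviews` endpoint, and the helper names are hypothetical and are not taken from Scrape-ML. It shows that a static parser sees an empty list in the served HTML, because the items only exist after a browser runs the script.

```python
from html.parser import HTMLParser

# Hypothetical page as served: the review list is empty and is filled in
# by JavaScript only after the page loads in a browser.
SERVED_HTML = """
<html><body>
  <ul id="reviews"></ul>
  <script>
    fetch("/api/reviews").then(r => r.json()).then(items => {
      for (const item of items) {
        const li = document.createElement("li");
        li.textContent = item.text;
        document.getElementById("reviews").appendChild(li);
      }
    });
  </script>
</body></html>
"""

# Hypothetical DOM after a browser has executed the script above.
RENDERED_HTML = '<ul id="reviews"><li>Great film</li><li>Too long</li></ul>'


class ListItemCounter(HTMLParser):
    """Count <li> start tags, the way a static parser would see them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.count += 1


def count_list_items(html):
    parser = ListItemCounter()
    parser.feed(html)
    return parser.count


print(count_list_items(SERVED_HTML))    # prints 0: the script never ran
print(count_list_items(RENDERED_HTML))  # prints 2
```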

Describe the solution you'd like

I propose implementing a feature that uses browser automation to handle dynamic content. This could be achieved by integrating with a library like Selenium or Puppeteer. These libraries would let Scrape-ML drive a real browser, execute the page's JavaScript, and wait for the dynamically loaded content to appear before parsing the page.
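A rough sketch of what this could look like with Selenium 4 in Python, assuming Chrome and a matching driver are available locally. The function names, the `review-text` class, and the example URL are illustrative assumptions, not part of Scrape-ML's current code. The page is rendered first, and only then is the resulting HTML parsed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; skipped so nesting depth stays correct.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load `url` in headless Chrome and return the HTML once `wait_css` matches."""
    # Imported lazily so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript-generated element exists (or time out).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0   # greater than 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.css_class in classes:
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def extract_texts(html, css_class):
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return [text.strip() for text in parser.texts]
```

Usage would be along the lines of `extract_texts(fetch_rendered_html("https://example.com/reviews", ".review-text"), "review-text")`, where both the URL and the selector are placeholders. Keeping the browser step separate from the parsing step means the existing static path could stay the default, with the slower browser path used only for pages that need it.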

Describe alternatives you've considered

I've explored using Scrape-ML's existing features like custom selectors and regular expressions to target specific elements within the source code. However, this approach becomes cumbersome and unreliable for complex websites with intricate JavaScript interactions. Additionally, it requires a deep understanding of the website's underlying code, making it difficult for users who are not familiar with web development.

Additional context

Several popular web scraping frameworks utilize browser automation for handling dynamic content. This functionality has become a critical aspect of modern web scraping due to the prevalence of dynamic websites.

github-actions[bot] commented 1 month ago

Thank you for raising an issue. Hope you are enjoying open source. We will try to reply or assign as soon as possible. Connect with a mentor.

Sitevity commented 1 month ago

I request that you assign this feature request to me under GSSOC'24 (Level 3).

AashishKumar-3002 commented 1 month ago

Hey @sanjay-kv @Sitevity, if this issue is available, I would like to work on it.

sanjay-kv commented 1 month ago

It's already assigned. If you want to collaborate, reach out to the assigned person.