Recode-Hive / Scrape-ML

For new data generation in the Semi-supervised-sequence-learning-Project, we have written a Python script to fetch 📊 data from the 💻 IMDb website 🌐 and convert it into txt files.
https://scrape-ml.streamlit.app/
MIT License
80 stars 117 forks

Update scrapping.py #181

Closed: Harshitmishra001 closed this 3 weeks ago

Harshitmishra001 commented 3 weeks ago

Related Issue

Updates amazon_scrapping/scrapping.py with improvements (Issue #180)

Description

Key Improvements:

  1. Ensures elements are loaded before interacting with them.
  2. Correctly handles the pagination and breaks the loop when there are no more pages.
  3. Uses the correct XPath expressions and class names.
  4. Creates a DataFrame with a column name and saves it correctly.
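
The diff itself is not shown here. As a minimal sketch of how the four points above could fit together, assuming Selenium for the browser and pandas for the output: the function names, the `review` column name, and both XPath parameters are hypothetical placeholders, not the values used in scrapping.py.

```python
from typing import Callable, List


def collect_pages(read_page: Callable[[], List[str]],
                  go_next: Callable[[], bool],
                  max_pages: int = 100) -> List[str]:
    """Gather items page by page; stop when go_next() reports no more pages."""
    items: List[str] = []
    for _ in range(max_pages):  # hard cap so a broken "next" link cannot loop forever
        items.extend(read_page())
        if not go_next():
            break
    return items


def scrape_reviews(driver, review_xpath: str, next_xpath: str,
                   timeout: float = 10) -> List[str]:
    """Selenium wiring; both XPaths are placeholders supplied by the caller."""
    # Imported here so collect_pages can be exercised without a browser installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)

    def read_page() -> List[str]:
        # Block until the review elements exist instead of reading immediately.
        elements = wait.until(
            EC.presence_of_all_elements_located((By.XPATH, review_xpath)))
        return [el.text for el in elements]

    def go_next() -> bool:
        try:
            wait.until(EC.element_to_be_clickable((By.XPATH, next_xpath))).click()
            return True
        except TimeoutException:
            return False  # no clickable "next" control: treat as the last page

    return collect_pages(read_page, go_next)


def save_reviews(reviews: List[str], path: str) -> None:
    """Write the reviews to CSV under a named column (column name is assumed)."""
    import pandas as pd
    pd.DataFrame({"review": reviews}).to_csv(path, index=False)
```

Keeping the pagination loop separate from the Selenium calls means the stop condition can be checked with plain Python callables. On a real page, an extra wait for the old elements to go stale after the click may also be needed so the same page is not read twice.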

Type of PR

Screenshots / videos (if applicable)

Screenshot 2024-06-08 113151 Screenshot 2024-06-08 113201

Checklist:

sanjay-kv commented 3 weeks ago

I need to run and test this. Give me a day, or I will get to it this evening.

Harshitmishra001 commented 3 weeks ago

No problem. When I finished writing the code, I realized I had forgotten to create an issue for it first, which is why my PR came in early 😅

sanjay-kv commented 3 weeks ago

image

Could you help fix this?

Harshitmishra001 commented 2 weeks ago

Hi, sure, I can try 😄 I will look into it after 10pm, as I am currently out of station 😅

sanjay-kv commented 2 weeks ago

@Harshitmishra001 I need some help with a similar repo. Would you like to connect on Google Meet to discuss it?

Harshitmishra001 commented 2 weeks ago

@sanjay-kv Sure, we can do it tomorrow, whenever you are available 😄

sanjay-kv commented 2 weeks ago

Okay. Could you please send me a LinkedIn request or give me your email ID? We will take it from there.

Harshitmishra001 commented 2 weeks ago

@sanjay-kv Hi, we are already connected on LinkedIn. My LinkedIn ID: https://www.linkedin.com/in/harshit-mishra-98329128a/ and my email ID: hmharsh123@gmail.com