Bfas237 / E-Shop-Scrapper-PRO

E-Shop Scrapper PRO is a Python script that scrapes product data from an e-commerce website and exports it to a CSV file. It utilizes web scraping techniques to extract information such as product names, categories, prices, descriptions, and images.
GNU General Public License v3.0

Changed url, got these errors, No data available. CSV file not created. #1

Open georgerabus opened 10 months ago

georgerabus commented 10 months ago
$ python e_shop_scrapper_pro.py
Scraping Pages: 0it [00:00, ?it/s]
Scraping Products: 0it [00:00, ?it/s]
No data available. CSV file not created.
Empty DataFrame
Columns: []
Index: []
javaldx: Could not find a Java Runtime Environment!
Warning: failed to read path from javaldx

Also, to clarify: for base_url should I include only the main URL, like aliexpress.com, or something more specific like https://www.aliexpress.com/p/calp-plus/index.html?spm=a2g0o.best.testStatic.5.32422c25ygT70g&osf=category_navigate_newTab2&queryFrom=kingKong&categoryTab=us_beauty_%26_health?

Bfaschat commented 10 months ago

I will take a look at it and see what went wrong.

Expect a reply any moment from now

Psst: Thanks for the interest :)

georgerabus commented 10 months ago

Thanks to you too for still being active :)

btw, can you specify a bit more what

productlinks = []
data = []
categories = []

are for? Also, should they contain URL strings separated by commas?

Bfaschat commented 10 months ago

I would love to know the branch causing the error.

I mistakenly merged the two branches (Request and Scrapy)

If you can answer then it will be much easier to know what went wrong

As for the question of what the three lists are for:

productlinks = [] — This list holds the product links of the active scraper for further processing.
data = [] — This is the list that holds the data to be written to the CSV file.
categories = [] — This holds the category names, which will be used later for mapping products to their parent and child categories.
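
To make the roles concrete, here is a rough sketch of how the three lists usually work together in a requests/BeautifulSoup flow. The shop URL, CSS selectors, and column names are illustrative assumptions, not the exact ones in e_shop_scrapper_pro.py:

```python
# Rough sketch only; the shop URL, CSS selectors, and column names are assumptions.
import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

base_url = "https://example-shop.com"   # hypothetical shop
productlinks = []                        # product URLs collected from listing pages
data = []                                # one row (dict) per product, written to the CSV at the end
categories = []                          # category names, used to map products to parent/child categories

def text_of(node):
    """Return the stripped text of a BeautifulSoup node, or '' if the selector matched nothing."""
    return node.get_text(strip=True) if node else ""

# 1. Collect product links from a listing page.
listing = BeautifulSoup(requests.get(base_url, timeout=30).text, "html.parser")
for a in listing.select("a.product-link"):
    productlinks.append(urljoin(base_url, a["href"]))

# 2. Visit each product page and collect its fields.
for link in productlinks:
    page = BeautifulSoup(requests.get(link, timeout=30).text, "html.parser")
    category = text_of(page.select_one(".breadcrumb .category"))
    if category and category not in categories:
        categories.append(category)
    data.append({
        "name": text_of(page.select_one("h1.product-title")),
        "price": text_of(page.select_one(".price")),
        "category": category,
    })

# 3. Write everything to the CSV file.
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price", "category"])
    writer.writeheader()
    writer.writerows(data)
```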

georgerabus commented 10 months ago

productlinks = [] — This list holds the product links of the active scraper for further processing.
data = [] — This is the list that holds the data to be written to the CSV file.
categories = [] — This holds the category names, which will be used later for mapping products to their parent and child categories.

Oh, so these lists should not be touched by me, understood. I am willing to help you out to perfect this program (win-win), but unfortunately I will go to sleep now (2:30 AM); I will reply to everything tomorrow morning.

Bfaschat commented 10 months ago

I will try to scrape AliExpress right now and see what went wrong. I have never tried AliExpress before. We have almost the same timezone; it's 1:30 AM here.

I pray you wake up to find a good news from my end

I am also curious: did you by chance forget to change the HTML elements to match those of aliexpress.com? As for the last two lines of your output (the javaldx warnings), ChatGPT gave this answer:

It seems like there's an issue with Java Runtime Environment (JRE) not being properly configured or available in your system. Here are a few steps you can take to resolve this:

1. Verify Java installation: Ensure Java is installed on your system. You can do this by running java -version in your command prompt or terminal to check if Java is properly installed and the version is displayed.

2. Set Java environment variables (Windows): Set the JAVA_HOME environment variable to point to your Java installation directory, and add %JAVA_HOME%\bin to your PATH variable to ensure the system can find the Java executables.

3. Reinstall Java: If Java is not installed or the installation is corrupted, consider reinstalling the latest version of Java from the official website.

4. Command-line specifics: If this issue occurs when executing specific commands or applications, check their documentation or support resources for any specific Java-related configurations or requirements.

Bfaschat commented 10 months ago
[screenshot: completed scraping progress]

I managed to get a workaround for the aliexpress.com e-shop, but you should be aware that you will need to buy a proxy to get through.

Their shop is embedded in JavaScript, so you either use Selenium or a custom-written API.
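
For context, here is a minimal sketch of the Selenium route for a JavaScript-rendered shop. The CSS selector, example URL, wait time, and proxy line are assumptions for illustration, not what E-Shop Scrapper PRO actually uses:

```python
# Minimal Selenium sketch for a JavaScript-rendered shop; selector and URL are assumptions.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless=new")  # render pages without opening a visible browser
# options.add_argument("--proxy-server=http://user:pass@host:port")  # proxy, if required

driver = webdriver.Chrome(options=options)
driver.get("https://www.aliexpress.com/category/example")  # hypothetical category page
driver.implicitly_wait(10)  # give the JavaScript time to build the DOM

productlinks = []
for anchor in driver.find_elements(By.CSS_SELECTOR, "a[href*='/item/']"):
    href = anchor.get_attribute("href")
    if href:
        productlinks.append(href)

driver.quit()
print(f"Collected {len(productlinks)} product links")
```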

I will have to revamp my code to include support for major e-shops

The attached screenshots show the completed scraping progress.

I will update you when I am further along with the modifications.

If you have other ways to make the script better, why not fork it, add the necessary changes, and push?

Let's keep our fingers crossed 🤞

georgerabus commented 10 months ago

I just woke up, but I'll go back to bed soon. Btw, I'm using Linux (Arch), just so you know there are different audiences :))

If you have other ways to make the script better, why not fork it, add the necessary changes, and push?

Let's keep our fingers crossed 🤞

I am a beginner programmer, so I don't know if I'll be able to do much, but thanks :) I'll try my best.

Edit: also, yes, I typed the URLs with https://

Bfaschat commented 10 months ago

I will take good note of the various OS

I got you well covered.

We have a long day ahead of us

Good and wonderful morning to you dude

Bfaschat commented 9 months ago

HUGE UPDATE !!!

I want to inform you that, behind the scenes, my team and I have been working on a new update. We have created an API for this scraper that allows you to call it and submit your link to be scraped.

While we work on that, I will be pushing an update to this script that makes your scraping process easier by eliminating the stress of always searching for and replacing classes.
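
To give a rough idea of the direction (this is only an illustrative sketch, not the actual update; every key and class name below is an assumption), the searching and replacing of classes can be avoided with a per-shop selector map:

```python
# Hypothetical per-shop selector configuration; none of these keys or class
# names come from the actual update, they only illustrate the idea.
SELECTORS = {
    "default": {
        "product_link": "a.product-link",
        "name": "h1.product-title",
        "price": ".price",
        "category": ".breadcrumb .category",
    },
    "aliexpress.com": {
        "product_link": "a[href*='/item/']",
        "name": "h1[data-pl='product-title']",
        "price": ".product-price-value",
        "category": ".breadcrumb a",
    },
}

def selectors_for(url: str) -> dict:
    """Pick a selector set by domain instead of editing class names in the code."""
    for domain, mapping in SELECTORS.items():
        if domain in url:
            return mapping
    return SELECTORS["default"]
```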

This may not be the update you were expecting, but it's better than nothing.

Cheers 🥂

I haven't forgotten about your Aliexpress request...