Allama272 / OLX-Mining-and-analyzing-house-prices

Automatically scrapes and collects OLX apartment data every 15 days

Scraping other olx sites #7

Open delighttechnology opened 1 year ago

delighttechnology commented 1 year ago

Hi, just a question: is it possible to scrape olx.bg, olx.ua, olx.pl, or other OLX sites with this scraper?

Allama272 commented 1 year ago

Since every OLX site has slightly different formatting and class names, you would need to change some parts of the code, such as the CSS class names used to find the price, bedrooms, area, etc. Additionally, olx.pl and the others do not show the number of bedrooms on the main grid listing page, so you would need to visit each listing URL to get it. This significantly increases the time needed to scrape: rather than requesting 100 main pages, the scraper would need to visit about 100 x 50 = 5000 listing pages (roughly 50 listings per main page), unless you are willing to omit such features.

It shouldn't take much time to make those changes, though. If you are interested, I can make another branch for those sites.
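
To make the trade-off above concrete, here is a minimal Python sketch of a per-site selector table and the resulting request count. It is not taken from the repository: `SiteSelectors`, `SITE_SELECTORS`, `needs_detail_visits`, `estimate_requests`, the `original-site` key, and every selector string are hypothetical placeholders, and the real class names would have to be read from each site's HTML. Note that the total includes the 100 grid pages that are still needed to discover the listing URLs, so it comes to 5100 rather than 5000.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteSelectors:
    """CSS selectors for one OLX site (all values here are placeholders)."""
    price: str
    area: str
    bedrooms: Optional[str]  # None when the grid page does not show bedrooms


# Hypothetical selector table; the real class names differ per site.
SITE_SELECTORS = {
    "original-site": SiteSelectors(
        price=".price-placeholder",
        area=".area-placeholder",
        bedrooms=".bedrooms-placeholder",
    ),
    "olx.pl": SiteSelectors(
        price=".price-placeholder-pl",
        area=".area-placeholder-pl",
        bedrooms=None,  # per the thread, only available on the listing page
    ),
}


def needs_detail_visits(site: str) -> bool:
    """True when bedrooms can only be read from each listing's own page."""
    return SITE_SELECTORS[site].bedrooms is None


def estimate_requests(main_pages: int, listings_per_page: int,
                      visit_each_listing: bool) -> int:
    """Total HTTP requests: grid pages, plus one per listing if required."""
    total = main_pages
    if visit_each_listing:
        total += main_pages * listings_per_page
    return total


if __name__ == "__main__":
    for site in SITE_SELECTORS:
        print(site, estimate_requests(100, 50, needs_detail_visits(site)))
```

Keeping the selectors in one table means a new site only needs a new entry, and a `None` selector signals that the slower per-listing path is required.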

delighttechnology commented 1 year ago

@Allama272 It would be super cool if you did. I am currently looking for an apartment to buy with a bank loan and want to analyze the possible options on olx.pl. I hope to find something cheaper than I would by searching through realtors. In the end, I want to create some kind of Power BI dashboard to follow price trends. I can share my dashboard once it is done :)

Allama272 commented 1 year ago

@delighttechnology I was actually thinking of creating a dashboard for this project when I get the chance. As for the scraper itself, as I explained earlier, a fast and reliable scraper would only get the features shown on the main page, so only the price, general address, and area.

Alternatively, we can get all the other features by visiting each listing on each page, which will take much longer.

Which option would you like? Note: I will not schedule autoruns so you will have to run this locally.

delighttechnology commented 1 year ago

@Allama272, I think for the purposes of this kind of analysis it would be beneficial to have the second option with detailed listings. Regarding scheduled runs: of course, I will do that on my side and will probably schedule autoruns during the night. I think the first run will take much longer, but after that I will only pick up new listings. I am interested in only one city, so this limits the results significantly.
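
The "only new listings" idea for later runs could look roughly like the Python sketch below. It assumes each listing is identified by its URL and that the URLs already scraped are kept in a local JSON file; `load_seen`, `save_seen`, `select_new`, and the file layout are hypothetical and not part of the repository.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_seen(path: Path) -> Set[str]:
    """Load the URLs scraped in earlier runs (empty set on the first run)."""
    if path.exists():
        return set(json.loads(path.read_text(encoding="utf-8")))
    return set()


def save_seen(path: Path, seen: Set[str]) -> None:
    """Persist the scraped URLs so the next run can skip them."""
    path.write_text(json.dumps(sorted(seen)), encoding="utf-8")


def select_new(listing_urls: Iterable[str], seen: Set[str]) -> List[str]:
    """Return URLs not scraped before, in order, without in-batch duplicates."""
    new_urls: List[str] = []
    in_batch: Set[str] = set()
    for url in listing_urls:
        if url not in seen and url not in in_batch:
            new_urls.append(url)
            in_batch.add(url)
    return new_urls
```

A nightly run would call `load_seen`, pass the URLs found on the grid pages through `select_new`, visit only the returned listings, and then call `save_seen` with the union of old and new URLs. This way the expensive per-listing requests are paid once per listing rather than on every run.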