fluffybeing opened this issue 9 years ago (status: Open)
But in the case of scraping, we have to crawl through more than one page: after we reach the next page, we need to scrape that page as well.
But how many pages? Any approximate number? I think requests and lxml can easily do that: you can extract all the links from the href attributes and then parse the ones you want.
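A minimal sketch of the requests + lxml approach described above: download a page, collect every href with an XPath query, resolve relative links against the page URL, and keep only the ones you want. The function names, the contains filter, and the example URL in the usage note are illustrative assumptions, not part of this project.

```python
from urllib.parse import urljoin

import lxml.html
import requests


def extract_links(html: str, base_url: str, contains: str = "") -> list[str]:
    """Return absolute URLs for every <a href> in html.

    If contains is non-empty, keep only links whose absolute URL includes
    that substring (e.g. a path fragment that marks review pages).
    """
    doc = lxml.html.fromstring(html)
    links = []
    for href in doc.xpath("//a/@href"):  # only <a> tags that have an href
        absolute = urljoin(base_url, str(href).strip())  # resolve relative links
        if contains in absolute:  # "" matches everything
            links.append(absolute)
    return links


def fetch_links(url: str, contains: str = "") -> list[str]:
    """Download one page with requests and extract its links."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_links(response.text, url, contains)
```

For example, fetch_links("https://example.com/product/123", contains="/reviews/") would return only the review-page links on that hypothetical product page.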
Yes, it's true that it can be done. But we were thinking that if in the future we generalize this project to include reviews from other e-commerce websites as well, then we might need Scrapy. I am not sure. We will surely look into it.
I think approximately 100 reviews will be sufficient to calculate the SS (Sentiment Score). One page has approximately 15 reviews, so that amounts to about 7 page hits per sentiment request (100 / 15 ≈ 6.7, rounded up). @rahulrrixe @SaptakS, decide based on this data point.
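To make the page-count estimate concrete, the sketch below computes the number of page hits as the ceiling of reviews needed divided by reviews per page (ceil(100 / 15) = 7) and uses it as a hard cap while following "next page" links. The XPath selectors are parameters because the real page markup is not given here; the function names and the injectable fetch callable (useful for testing without network access) are assumptions for illustration.

```python
import math
from typing import Callable, Optional
from urllib.parse import urljoin

import lxml.html
import requests

REVIEWS_NEEDED = 100   # rough sample size suggested for one Sentiment Score
REVIEWS_PER_PAGE = 15  # approximate number of reviews on one page


def pages_needed(reviews_needed: int, per_page: int) -> int:
    """Page hits needed to collect reviews_needed reviews (rounded up)."""
    return math.ceil(reviews_needed / per_page)


def fetch_html(url: str) -> str:
    """Default fetcher: one HTTP GET per page."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def collect_reviews(
    start_url: str,
    review_xpath: str,
    next_xpath: str,
    fetch: Callable[[str], str] = fetch_html,
    reviews_needed: int = REVIEWS_NEEDED,
    per_page: int = REVIEWS_PER_PAGE,
) -> list[str]:
    """Scrape a page, follow its "next page" link, and repeat.

    Stops when enough reviews are collected, when there is no next link,
    or after pages_needed(...) page hits, whichever comes first.
    """
    reviews: list[str] = []
    url: Optional[str] = start_url
    for _ in range(pages_needed(reviews_needed, per_page)):
        if url is None or len(reviews) >= reviews_needed:
            break
        doc = lxml.html.fromstring(fetch(url))
        # review_xpath is expected to select text nodes, e.g. ".../text()"
        reviews.extend(str(text).strip() for text in doc.xpath(review_xpath))
        next_links = doc.xpath(next_xpath)  # e.g. '//a[@rel="next"]/@href'
        url = urljoin(url, str(next_links[0])) if next_links else None
    return reviews[:reviews_needed]
```

Passing fetch explicitly keeps the crawl logic testable offline; with the default fetch_html, each loop iteration issues one requests.get, so a single sentiment request costs at most pages_needed(...) page hits.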
Also, in my opinion, premature optimization is a curse. Let's focus first on what gets our MVP ready, which requires code that is easy to write. If Scrapy allows that, then it's OK to use it. Let's keep this issue open for the future so we know where to optimize.
I think what you are doing in the project is creating a Scrapy job for every product-review request. That is compute-heavy and will not handle more than about 10 concurrent requests on a 4 GB RAM machine. Scrapy is built for crawling many pages, not a single page, so use requests with lxml for scraping content from a single URL.
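A sketch of the suggestion above, assuming the review text lives in elements you can select with one XPath expression (the selector in the usage note is hypothetical): the request handler calls an ordinary function that does one HTTP GET and one lxml parse, instead of launching a separate Scrapy job per request.

```python
import lxml.html
import requests


def parse_reviews(html: str, review_xpath: str) -> list[str]:
    """Return the non-empty text of each element matched by review_xpath.

    review_xpath must select elements (e.g. //div[@class="review"]), not
    text nodes, because text_content() is called on each match.
    """
    doc = lxml.html.fromstring(html)
    texts = (node.text_content().strip() for node in doc.xpath(review_xpath))
    return [text for text in texts if text]


def scrape_url(url: str, review_xpath: str, timeout: float = 10.0) -> list[str]:
    """Fetch a single URL in-process and extract its reviews.

    No crawler engine, job queue, or extra process is started: this is one
    HTTP GET plus one lxml parse, cheap enough to call from a request handler.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error page
    return parse_reviews(response.text, review_xpath)
```

For example, scrape_url(product_url, '//div[@class="review"]') returns the review texts for one page; the class name is a placeholder for whatever markup the target site actually uses. Each call costs one HTTP round trip and one parse, with no per-request crawler setup.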