ZNClub-PA-ML-AI / Scrapy-Spiders

Web Crawling using Scrapy

Returns 1st item and later empty items #2

Open ZNevzz opened 7 years ago

ZNevzz commented 7 years ago

When crawling http://www.moneycontrol.com/news/all-news-All-1-next-0.html
with the XPath listed at https://github.com/ZNClub-PA-ML-AI/DataSets#moneycontrol,
the spider extracts only the 1st item correctly; every later item comes back as an empty list.
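
One way to narrow this down is to run the per-item query against every item node separately and record which ones come back empty. The sketch below is only an illustration: `SAMPLE_HTML`, `audit_items`, and both paths are hypothetical and are not the real moneycontrol markup or the XPath from the DataSets README. It uses the standard-library `xml.etree.ElementTree` instead of Scrapy's selectors so that it runs without extra dependencies, which means it only handles well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a news listing (NOT the real page).
# Only the first item wraps its link in <h2>; the later items do not.
SAMPLE_HTML = """
<ul class="news">
  <li><h2><a href="/news/a.html">First headline</a></h2><p>Summary A</p></li>
  <script>googletag.cmd.push(function() {});</script>
  <li><p class="title"><a href="/news/b.html">Second headline</a></p></li>
  <li><p class="title"><a href="/news/c.html">Third headline</a></p></li>
</ul>
"""


def audit_items(markup, item_path=".//li", title_path="./h2/a"):
    """Return one (index, title-or-None) pair per item node.

    Both paths are assumptions chosen for the sample markup above.
    """
    root = ET.fromstring(markup.strip())
    report = []
    for index, item in enumerate(root.findall(item_path), start=1):
        # The title path is evaluated relative to this item only.
        node = item.find(title_path)
        title = node.text.strip() if node is not None and node.text else None
        report.append((index, title))
    return report


if __name__ == "__main__":
    for index, title in audit_items(SAMPLE_HTML):
        print(index, title if title else "EMPTY")
```

If a report like this shows a title for item 1 and nothing for the others, the later items probably have a different structure from the first one, and the expression needs to be written against the markup they actually use.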

ZNevzz commented 7 years ago
  1. Checked whether moneycontrol is blocking the crawler; it is not.
  2. There is a JavaScript line between the 1st element of the list and the rest. Checked whether that was the cause of the issue, but it is just a googletag function.
  3. Checked Stack Overflow and found that AJAX responses which require cookies create problems for crawlers. Links:
    http://stackoverflow.com/questions/40444957/scrapy-returning-empty-list-for-xpath
    http://stackoverflow.com/questions/31094615/xpath-locates-html-element-correctly-in-console-but-returns-empty-array-when-use?rq=1
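
To test the cookie/AJAX theory from point 3, it may help to fetch the page outside the browser with a cookie jar attached and check whether the later items exist in the raw HTML at all. This is a minimal sketch using only the Python standard library rather than Scrapy; `build_cookie_opener`, `fetch_raw_html`, `count_marker`, and the User-Agent string are hypothetical names and values chosen for illustration.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(user_agent="Mozilla/5.0 (compatible; issue-debug-sketch)"):
    """Create an opener that stores and resends cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # The User-Agent value is an assumption; some sites reject the default one.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_raw_html(opener, url, timeout=10):
    """Download the page as served, without executing any JavaScript."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def count_marker(html, marker):
    """Count how often a marker string (e.g. an item's opening tag) appears."""
    return html.count(marker)


# Example use (performs a network request, so it is left commented out):
# opener, jar = build_cookie_opener()
# html = fetch_raw_html(opener, "http://www.moneycontrol.com/news/all-news-All-1-next-0.html")
# print(count_marker(html, "<li"), "list items;", len(jar), "cookies set")
```

If the count is much lower than the number of items visible in the browser, the missing items are most likely added by JavaScript after the initial load, and an XPath that works in the browser console would not match what the crawler receives.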

ZNevzz commented 7 years ago

https://www.codementor.io/codementorteam/tutorials/how-to-scrape-an-ajax-website-using-python-qw8fuitvi