andre-st / amazon-wishless

The more wishlists you have and the longer they are, the less often you check them for buying opportunities arising from price trends. This tool filters your Amazon lists by (used) price and priority.
GNU General Public License v3.0

infinite scrolling not working #3

Open natea opened 2 weeks ago

natea commented 2 weeks ago

I was able to get the script to extract items from two wishlists, but when I looked at the .xml file, it appeared to have scraped only the first few items on each list. Maybe it's having a hard time with the infinite scroll?

Here's the output of running the wishlist.sh script:

$ ./wishlist.sh
2024-11-03 17:42:16 [scrapy.utils.log] INFO: Scrapy 2.11.2 started (bot: scrapybot)
2024-11-03 17:42:16 [scrapy.utils.log] INFO: Versions: lxml 5.3.0.0, libxml2 2.12.9, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.10.0, Python 3.10.10 (main, Mar 29 2023, 14:29:38) [Clang 14.0.0 (clang-1400.0.29.202)], pyOpenSSL 24.2.1 (OpenSSL 3.3.2 3 Sep 2024), cryptography 43.0.3, Platform macOS-15.1-x86_64-i386-64bit
2024-11-03 17:42:16 [scrapy.addons] INFO: Enabled addons:
[]
2024-11-03 17:42:16 [py.warnings] WARNING: /Users/nateaune/.pyenv/versions/amazon-wishlist/lib/python3.10/site-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.

It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
  return cls(crawler)

2024-11-03 17:42:16 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2024-11-03 17:42:16 [scrapy.extensions.telnet] INFO: Telnet Password: 8cdbd10aa4bb6d45
2024-11-03 17:42:16 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2024-11-03 17:42:16 [scrapy.crawler] INFO: Overridden settings:
{'DOWNLOAD_DELAY': 0.25,
 'SPIDER_LOADER_WARN_ONLY': True,
 'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, '
               'like Gecko) Chrome/27.0.1453.93 Safari/537.36'}
2024-11-03 17:42:16 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-11-03 17:42:16 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-11-03 17:42:16 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-11-03 17:42:16 [scrapy.core.engine] INFO: Spider opened
2024-11-03 17:42:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-11-03 17:42:16 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-11-03 17:42:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/hz/wishlist/ls/2812PC7N0O90P> (referer: None)
2024-11-03 17:42:17 [wishlist] DEBUG: Wishlist 'b'Kindle books'' (10 items, atm)
2024-11-03 17:42:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/hz/wishlist/ls/2L5ORNL8TA3A3> (referer: None)
2024-11-03 17:42:18 [wishlist] DEBUG: Wishlist 'b'Biz/sales/marketing books'' (10 items, atm)
2024-11-03 17:42:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/hz/wishlist/slv/items?filter=unpurchased&paginationToken=eyJGcm9tVVVJRCI6ImJjNDE4MWYyLWFiOTgtNDFlYS1hMTNmLTE2ZDczMzY2OWM5NSIsIlRvVVVJRCI6Ijk2ZDRkMGJmLWRjZWEtNDc2NC04NTc1LTE2MzZiMDY5ZDI0YiIsIkVkZ2VSYW5rIjoyNDQ1NDM3fQ&itemsLayout=LIST&sort=default&type=wishlist&lid=2812PC7N0O90P> (referer: https://www.amazon.com/hz/wishlist/ls/2812PC7N0O90P)
2024-11-03 17:42:18 [wishlist] DEBUG: Wishlist 'b'Kindle books'' (20 items, atm)
2024-11-03 17:42:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/hz/wishlist/slv/items?filter=unpurchased&paginationToken=eyJGcm9tVVVJRCI6ImJjNDE4MWYyLWFiOTgtNDFlYS1hMTNmLTE2ZDczMzY2OWM5NSIsIlRvVVVJRCI6Ijk2YWVhOGY0LTQ1ODItNGIzMC04MGEwLWQzZGNlMjIxZDU2ZSIsIkVkZ2VSYW5rIjoyMzg2MjI2fQ&itemsLayout=LIST&sort=default&type=wishlist&lid=2812PC7N0O90P> (referer: https://www.amazon.com/hz/wishlist/slv/items?filter=unpurchased&paginationToken=eyJGcm9tVVVJRCI6ImJjNDE4MWYyLWFiOTgtNDFlYS1hMTNmLTE2ZDczMzY2OWM5NSIsIlRvVVVJRCI6Ijk2ZDRkMGJmLWRjZWEtNDc2NC04NTc1LTE2MzZiMDY5ZDI0YiIsIkVkZ2VSYW5rIjoyNDQ1NDM3fQ&itemsLayout=LIST&sort=default&type=wishlist&lid=2812PC7N0O90P)
2024-11-03 17:42:18 [wishlist] DEBUG: Wishlist 'b'Kindle books'' (20 items, atm)
2024-11-03 17:42:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/hz/wishlist/slv/items?filter=unpurchased&paginationToken=eyJGcm9tVVVJRCI6IjNlN2NlYjczLTQ4YTgtNDlkMS1iMzJiLWM5ZWNjNDhjNjAxMiIsIlRvVVVJRCI6IjIxYTJhMzkxLWJiOGItNGJkOC05NmFjLTZlZWFhZDIwODJlMCIsIkVkZ2VSYW5rIjotMTF9&itemsLayout=LIST&sort=default&type=wishlist&lid=2L5ORNL8TA3A3> (referer: https://www.amazon.com/hz/wishlist/ls/2L5ORNL8TA3A3)
2024-11-03 17:42:19 [wishlist] DEBUG: Wishlist 'b'Biz/sales/marketing books'' (20 items, atm)
2024-11-03 17:42:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/hz/wishlist/slv/items?filter=unpurchased&paginationToken=eyJGcm9tVVVJRCI6IjNlN2NlYjczLTQ4YTgtNDlkMS1iMzJiLWM5ZWNjNDhjNjAxMiIsIlRvVVVJRCI6IjM1MmZkZjRkLTlkYTctNGNlZS04NzAzLWI5MDkwZDZjZmViYyIsIkVkZ2VSYW5rIjotMjF9&itemsLayout=LIST&sort=default&type=wishlist&lid=2L5ORNL8TA3A3> (referer: https://www.amazon.com/hz/wishlist/slv/items?filter=unpurchased&paginationToken=eyJGcm9tVVVJRCI6IjNlN2NlYjczLTQ4YTgtNDlkMS1iMzJiLWM5ZWNjNDhjNjAxMiIsIlRvVVVJRCI6IjIxYTJhMzkxLWJiOGItNGJkOC05NmFjLTZlZWFhZDIwODJlMCIsIkVkZ2VSYW5rIjotMTF9&itemsLayout=LIST&sort=default&type=wishlist&lid=2L5ORNL8TA3A3)
2024-11-03 17:42:19 [wishlist] DEBUG: Wishlist 'b'Biz/sales/marketing books'' (20 items, atm)
2024-11-03 17:42:19 [scrapy.core.engine] INFO: Closing spider (finished)
2024-11-03 17:42:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 4022,
 'downloader/request_count': 6,
 'downloader/request_method_count/GET': 6,
 'downloader/response_bytes': 290128,
 'downloader/response_count': 6,
 'downloader/response_status_count/200': 6,
 'elapsed_time_seconds': 2.945415,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2024, 11, 3, 22, 42, 19, 681779, tzinfo=datetime.timezone.utc),
 'httpcompression/response_bytes': 1382569,
 'httpcompression/response_count': 6,
 'log_count/DEBUG': 13,
 'log_count/INFO': 10,
 'log_count/WARNING': 1,
 'memusage/max': 61566976,
 'memusage/startup': 61566976,
 'request_depth_max': 2,
 'response_received_count': 6,
 'scheduler/dequeued': 6,
 'scheduler/dequeued/memory': 6,
 'scheduler/enqueued': 6,
 'scheduler/enqueued/memory': 6,
 'start_time': datetime.datetime(2024, 11, 3, 22, 42, 16, 736364, tzinfo=datetime.timezone.utc)}
2024-11-03 17:42:19 [scrapy.core.engine] INFO: Spider closed (finished)

andre-st commented 5 days ago

Sorry for the late reply. The current version in the repo should solve the problem.
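
For anyone hitting a similar stall on an older checkout, one way to inspect the log above is to decode the paginationToken values in the follow-up requests. They appear to be unpadded base64 carrying a small JSON object with FromUUID, ToUUID and EdgeRank fields. The sketch below uses only the Python standard library; the helper name decode_pagination_token is hypothetical and not part of amazon-wishless, and what Amazon means by each field is not documented here, so treat any interpretation as a guess.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit


def decode_pagination_token(url: str) -> dict:
    """Return the JSON payload carried in a wishlist URL's paginationToken.

    Hypothetical helper for debugging; not part of amazon-wishless.
    """
    query = parse_qs(urlsplit(url).query)
    token = query["paginationToken"][0]
    # The token in the log has no '=' padding, so restore it before decoding.
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# The two follow-up requests for list 2812PC7N0O90P, copied from the log.
PREFIX = ("https://www.amazon.com/hz/wishlist/slv/items"
          "?filter=unpurchased&paginationToken=")
SUFFIX = "&itemsLayout=LIST&sort=default&type=wishlist&lid=2812PC7N0O90P"

TOKEN_1 = ("eyJGcm9tVVVJRCI6ImJjNDE4MWYyLWFiOTgtNDFlYS1hMTNmLTE2ZDczMzY2OWM5NSIs"
           "IlRvVVVJRCI6Ijk2ZDRkMGJmLWRjZWEtNDc2NC04NTc1LTE2MzZiMDY5ZDI0YiIs"
           "IkVkZ2VSYW5rIjoyNDQ1NDM3fQ")
TOKEN_2 = ("eyJGcm9tVVVJRCI6ImJjNDE4MWYyLWFiOTgtNDFlYS1hMTNmLTE2ZDczMzY2OWM5NSIs"
           "IlRvVVVJRCI6Ijk2YWVhOGY0LTQ1ODItNGIzMC04MGEwLWQzZGNlMjIxZDU2ZSIs"
           "IkVkZ2VSYW5rIjoyMzg2MjI2fQ")

first = decode_pagination_token(PREFIX + TOKEN_1 + SUFFIX)
second = decode_pagination_token(PREFIX + TOKEN_2 + SUFFIX)

for label, payload in (("page 2", first), ("page 3", second)):
    print(label, payload)
```

In this log both follow-up requests for the same list share one FromUUID and differ only in ToUUID and EdgeRank, while the reported item count stays at 20 after the second of them. That may be a useful starting point when comparing against a run with the current version of the repo.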