CarterPape / NewsBot

A bot that runs journalistic tasks, such as web scraping
GNU General Public License v3.0

Replace the `requests` dependency with `scrapy.mail` #5

Open CarterPape opened 3 weeks ago

CarterPape commented 3 weeks ago

There is an opportunity to replace the sole dependency on `requests`, in `item_emailer.py`, with `scrapy.mail`.

As documented in an issue on the Scrapy project, my earlier trouble with `scrapy.mail` came from failing to see that `scrapy.mail.MailSender().send` returns a `twisted.internet.defer.Deferred` (this behavior is not documented).

An item pipeline can return a Deferred, though I don't see any documentation of what happens in that case. Does the Deferred get passed immediately to the next item pipeline? Is the next pipeline called only after the Deferred fires? Either way, how is the item passed along to the next pipeline?

Once I test this and find out, I could add both points (the return value of `MailSender.send` and how pipelines handle a returned Deferred) to the Scrapy documentation.

CarterPape commented 3 weeks ago

Based on how the Scrapy project tests pipelines that return Deferreds, the callback for the Deferred should return the item. That suggests that the next pipeline is not invoked until the returned Deferred finishes.

CarterPape commented 3 weeks ago

Also: duh, the `process_item` method is expected to take an item as an argument; the documentation makes no mention of it accepting a Deferred.