alephdata / memorious

Lightweight web scraping toolkit for documents and structured data.
https://docs.alephdata.org/developers/memorious
MIT License
311 stars 59 forks

FIX load_session, login in quote_scraper #98

Closed: moreymat closed this 4 years ago

moreymat commented 4 years ago

This PR fixes session loading and the "extended_web_scraper" (quotes) example.

Session loading was broken because:

Concretely, this means that a successful authentication in a crawler's login step is persisted but cannot be retrieved by the following steps: the session and its cookies are never found because the key used to load the session is wrong.
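To illustrate the failure mode described above, here is a minimal sketch of a save/load key mismatch. It is not memorious code: `SessionStore`, `session_key`, and the key format are hypothetical names chosen for illustration only.

```python
class SessionStore:
    """Hypothetical in-memory stand-in for a persistent session store."""

    def __init__(self):
        self._data = {}

    def save(self, key, cookies):
        # Persist a copy of the cookies under the given key.
        self._data[key] = dict(cookies)

    def load(self, key):
        # Return the stored cookies, or None if the key is unknown.
        return self._data.get(key)


def session_key(crawler_name, run_id):
    # Hypothetical key format; what matters is that save and load share it.
    return f"{crawler_name}:{run_id}:session"


store = SessionStore()

# The login step saves the authenticated session under one key...
store.save(session_key("quotes", "run1"), {"sessionid": "xyz"})

# ...but a later step that builds the key differently finds nothing,
# and no error is raised, so the step silently runs unauthenticated.
missing = store.load("quotes:session")

# Building the key with the same helper on both sides retrieves the session.
found = store.load(session_key("quotes", "run1"))
```

Deriving the key from a single shared helper on both the save and load paths is one way to rule out this class of mismatch.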

This bug was silently affecting the "quotes" example (extended_web_scraper). In addition, authentication did not succeed in the first place, because the POST request sent by quotes:login was missing a hidden input from the login form. As a result, the website was crawled and scraped, but the scraped pages lacked the "(Goodreads page)" links that only appear when you are logged in.
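As a rough sketch of the general technique (not the actual change in this PR), the hidden inputs of a login form can be collected and merged into the POST payload. The helper names, the `username`/`password` field names, and the `csrf_token` field in the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            self.hidden[attrs["name"]] = attrs.get("value") or ""


def build_login_payload(form_html, username, password):
    """Return POST data holding the form's hidden fields plus the credentials."""
    parser = HiddenInputParser()
    parser.feed(form_html)
    payload = dict(parser.hidden)
    # Field names are assumed here; a real form may use different ones.
    payload["username"] = username
    payload["password"] = password
    return payload


# Hypothetical login form markup with one hidden field.
SAMPLE_FORM = (
    '<form action="/login" method="post">'
    '<input type="hidden" name="csrf_token" value="abc123"/>'
    '<input type="text" name="username"/>'
    '<input type="password" name="password"/>'
    "</form>"
)

payload = build_login_payload(SAMPLE_FORM, "user", "secret")
```

The payload would then be sent as the form data of the login POST; omitting the hidden field is the kind of gap that can make a server reject the login without any visible error in the crawler.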

sunu commented 4 years ago

Thanks for the PR Mathieu!