Closed LDiazN closed 3 years ago
No need to push the .lock file, since it can be generated with poetry install or poetry update.
I think we should (purposely) use lock files. After all, they represent the latest successful build of dependencies of the project. The lock file is safer since it is more specific than just a version number and, therefore, we should always commit it.
Just a reminder: whenever the PR is ready for review, remove its draft status, and (I think) it will appear on my GitHub notification dashboard.
Agree with this, lock files are usually committed :)
Important Changes
- scrapy as dependency
- beautifulsoup4 as dependency
- scraper module in c4v-py/src
Problem
Create a toy version of the scraper that gets basic data for El Pitazo and can be extended with more implementations for other sources.
Proposed solution
- scrape(url: str) -> ScrapedData: function that receives a URL and scrapes data for that URL if possible, raising an error otherwise.
- spiders.py: file with valid implementations of scrapers for different sources (there's currently just one, for El Pitazo).
- settings.py: file with mappings from domains to spiders, so we can choose which implementation works for each domain, deferring domain-level logic to that spider.
Tasks
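For reference, the domain-to-spider dispatch described under "Proposed solution" could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code in this PR: the fields of ScrapedData, the names URL_TO_SCRAPER, get_domain and scrape_el_pitazo, and the domain string are all hypothetical, and the spider is a stub that does no network access or scrapy work.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import urlparse


@dataclass
class ScrapedData:
    """Hypothetical result container; the real fields are not specified here."""
    url: str
    title: str = ""
    content: str = ""


def scrape_el_pitazo(url: str) -> ScrapedData:
    # Stub standing in for the El Pitazo spider. A real implementation
    # would download the page and parse it; this one only echoes the URL.
    return ScrapedData(url=url, title="(not fetched)")


# Assumed shape of the settings.py mapping: domain -> scraper implementation.
# The domain string below is an assumption made for illustration.
URL_TO_SCRAPER: Dict[str, Callable[[str], ScrapedData]] = {
    "elpitazo.net": scrape_el_pitazo,
}


def get_domain(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc


def scrape(url: str) -> ScrapedData:
    """Scrape a URL with the implementation registered for its domain.

    Raises ValueError when no implementation is registered for the domain.
    """
    domain = get_domain(url)
    scraper = URL_TO_SCRAPER.get(domain)
    if scraper is None:
        raise ValueError(f"No scraper registered for domain: {domain!r}")
    return scraper(url)
```

Keeping the mapping in one place means that supporting a new source only requires a new implementation plus one extra entry in the mapping; scrape itself stays unchanged.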