Open NowanIlfideme opened 7 months ago
Later on, we can create a command-line entry point, josh scrape theprotocol (maybe with params), and then expand to a similar interface for other sites (though the scrapers will be different).
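For the command-line idea, here is a minimal sketch of how a josh scrape <site> interface could be laid out with argparse from the standard library. Everything except the josh scrape theprotocol command shape is an assumption for illustration: the SCRAPERS registry, the scrape_theprotocol stub, and the --limit parameter are hypothetical names, and the stub does no real scraping.

```python
import argparse


def scrape_theprotocol(limit):
    """Hypothetical stub: a real scraper would fetch and parse the site."""
    return []


# Hypothetical registry: one entry per site, each with its own scraper.
SCRAPERS = {"theprotocol": scrape_theprotocol}


def build_parser():
    parser = argparse.ArgumentParser(prog="josh")
    subcommands = parser.add_subparsers(dest="command", required=True)
    scrape = subcommands.add_parser("scrape", help="scrape records from a site")
    scrape.add_argument("site", choices=sorted(SCRAPERS))
    # Example of a "param"; the name and default are assumptions.
    scrape.add_argument("--limit", type=int, default=10)
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv, as a console script would.
    args = build_parser().parse_args(argv)
    records = SCRAPERS[args.site](limit=args.limit)
    print(f"scraped {len(records)} records from {args.site}")
    return records
```

Adding another site would then only mean registering another function in SCRAPERS, so the interface stays the same while the scrapers differ. Calling main(["scrape", "theprotocol", "--limit", "5"]) exercises the parser without touching the network.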
First get something in dict format; later we can work on more structured metadata. ;)
Create a proof-of-concept for scraping the website theprotocol.it from Python.
Here are some possible libraries to use:
I've worked with bs4 but scrapy seems like something to look at too.
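As a starting point with bs4, a rough sketch of turning HTML into a list of dicts could look like the following. The markup in SAMPLE_HTML, the class names, and the dict keys are all made up for illustration; the real theprotocol.it pages will have a different structure, and in an actual POC the HTML would come from an HTTP request rather than a string.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real site's HTML will differ.
SAMPLE_HTML = """
<ul>
  <li class="item"><a href="/a/1">First title</a><span class="tag">python</span></li>
  <li class="item"><a href="/a/2">Second title</a><span class="tag">sql</span></li>
</ul>
"""


def parse_items(html):
    """Return one plain dict per list item found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for node in soup.find_all("li", class_="item"):
        link = node.find("a")
        tags = node.find_all("span", class_="tag")
        records.append(
            {
                "title": link.get_text(strip=True),
                "url": link["href"],
                "tags": [tag.get_text(strip=True) for tag in tags],
            }
        )
    return records


for record in parse_items(SAMPLE_HTML):
    print(record)
```

Keeping the output as plain dicts fits the "dict format first" goal; a structured model can be layered on later without changing the parsing step.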
You could put the POC code somewhere in the package or alongside it. Options:
src/josh/scrapers/theprotocol.py
poc/theprotocol.py, and just create a first script that works.
nb/scrape_theprotocol.ipynb, so you have outputs to share.