glesica closed 4 years ago
TODO: I hit this when I ran the extractor without years (or with invalid years). Want to cut a ticket to set a default or print a clearer error message?
➜ extractor git:(all-the-changes) ✗ pipenv run python -m extractor
Warning: Your Pipfile requires python_version 3.8, but you are using 3.7.3 (/Users/smai/.local/share/v/e/bin/python).
$ pipenv --rm and rebuilding the virtual environment may resolve the issue.
$ pipenv check will surely fail.
WARNING:extractor.downloader:download failed '404'
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/smai/nonprofit-data/extractor/extractor/__main__.py", line 16, in <module>
app.run(options)
File "/Users/smai/nonprofit-data/extractor/extractor/app.py", line 13, in run
index = Index.from_years(options.years, index_downloader)
File "/Users/smai/nonprofit-data/extractor/extractor/index.py", line 94, in from_years
for index_file in index_downloader.fetch_all(years):
File "/Users/smai/nonprofit-data/extractor/extractor/downloader.py", line 79, in fetch_all
yield self.fetch(document)
File "/Users/smai/nonprofit-data/extractor/extractor/downloader.py", line 72, in fetch
raise DownloaderException(response.text)
extractor.downloader.DownloaderException: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>index_2020.csv</Key><RequestId>BBDDE13D3633D158</RequestId><HostId>aTU0+12LGMPrzluCcfhF9s+9d0/U1SbqeurNjZGQAAcmO4ILS3mz2ZM4xam5HQw7reYw0akgHSE=</HostId></Error>
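One way the default/clearer-message idea could look, as a rough sketch only: validate the requested years before any download starts, so a bad year fails with a readable message instead of the raw S3 XML above. `resolve_years`, `InvalidYearsError`, and the `first_year`/`last_year` bounds are hypothetical names, not something that exists in the extractor today, and the actual range of published index files would need to be checked.

```python
from typing import Iterable, List, Optional


class InvalidYearsError(ValueError):
    """Hypothetical error for years that have no index file."""


def resolve_years(
    requested: Optional[Iterable[int]],
    first_year: int,
    last_year: int,
) -> List[int]:
    """Return the years to fetch, falling back to the full known range.

    first_year and last_year are assumptions supplied by the caller; this
    sketch does not know which index files are actually published.
    """
    known = range(first_year, last_year + 1)
    years = sorted(set(requested or []))
    if not years:
        # No years given: default to everything we believe exists.
        return list(known)
    bad = [year for year in years if year not in known]
    if bad:
        raise InvalidYearsError(
            f"no index available for {bad}; "
            f"valid years are {first_year}-{last_year}"
        )
    return years
```

Calling this before `Index.from_years(...)` would turn the 404 above into a one-line message naming the offending year. The bounds could come from a constant or a command line option, whichever fits the existing options handling better.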
A couple of other possible future improvements: 1. Ctrl+C does not kill the process (do you have to do something special to kill pipenv processes?). 2. When the process is killed, anything that was extracted before the kill is not cached. Unless I'm missing something, it would be a nice future improvement to cache as you go, in case a download is interrupted or someone trips on a non-MagSafe cable or something.
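For the cache-as-you-go idea, a minimal sketch of what I have in mind, assuming each document can be fetched independently and stored as one file. `fetch_all_cached`, the `fetch` callable, and the `.part` suffix are all made up for illustration and are not the current downloader API.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, Iterator


def fetch_all_cached(
    names: Iterable[str],
    fetch: Callable[[str], bytes],
    cache_dir: Path,
) -> Iterator[Path]:
    """Yield a cached file path per name, downloading only what is missing.

    Each file is written as soon as its download finishes, so an interrupted
    run keeps everything that completed before the interruption.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        target = cache_dir / name
        if not target.exists():
            data = fetch(name)
            # Write to a temporary name first, then rename, so a kill in the
            # middle of a write never leaves a truncated file in the cache.
            partial = target.with_name(target.name + ".part")
            partial.write_bytes(data)
            os.replace(partial, target)
        yield target
```

With something like this, re-running after an interruption skips the files already in `cache_dir` and resumes with the first one that is missing.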
Also at some point, would you mind adding an example_filters.json file for fast reference?
QA +1: the new fancy make targets and new fancy data extracting work for me. make format changed some files, FYI :)
+1
I upgraded to Python 3.8 so you may want to update your stuff. You can do pipenv --rm and then pipenv install --dev in a new shell to switch. After that pipenv run python --version should show 3.8.
Issues have been created. I'm gonna merge it.
So many changes...
Add index fetching, improve downloader and caching, improve tooling configurations, add some more tests, add more command line options and improve default behavior, and lots of other junk.