code-for-montana / nonprofit-data

A tool to help make data about nonprofit organizations more accessible.
MIT License

Make all the changes #68

Closed glesica closed 4 years ago

glesica commented 4 years ago

So many changes...

Add index fetching, improve downloader and caching, improve tooling configurations, add some more tests, add more command line options and improve default behavior, and lots of other junk.

smai-f commented 4 years ago

TODO: I hit this when I ran the extractor without years (or with invalid years). Want to cut a ticket to set a default or show a better error message?

➜  extractor git:(all-the-changes) ✗ pipenv run python -m extractor
Warning: Your Pipfile requires python_version 3.8, but you are using 3.7.3 (/Users/smai/.local/share/v/e/bin/python).
  $ pipenv --rm and rebuilding the virtual environment may resolve the issue.
  $ pipenv check will surely fail.
WARNING:extractor.downloader:download failed '404'
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/smai/nonprofit-data/extractor/extractor/__main__.py", line 16, in <module>
    app.run(options)
  File "/Users/smai/nonprofit-data/extractor/extractor/app.py", line 13, in run
    index = Index.from_years(options.years, index_downloader)
  File "/Users/smai/nonprofit-data/extractor/extractor/index.py", line 94, in from_years
    for index_file in index_downloader.fetch_all(years):
  File "/Users/smai/nonprofit-data/extractor/extractor/downloader.py", line 79, in fetch_all
    yield self.fetch(document)
  File "/Users/smai/nonprofit-data/extractor/extractor/downloader.py", line 72, in fetch
    raise DownloaderException(response.text)
extractor.downloader.DownloaderException: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>index_2020.csv</Key><RequestId>BBDDE13D3633D158</RequestId><HostId>aTU0+12LGMPrzluCcfhF9s+9d0/U1SbqeurNjZGQAAcmO4ILS3mz2ZM4xam5HQw7reYw0akgHSE=</HostId></Error>
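One way the default could work, as a rough sketch only: resolve the requested years before any download starts, fall back to the previous calendar year when none are given, and reject out-of-range years with a readable message instead of the raw S3 XML. The name `resolve_years`, the `EARLIEST_YEAR` bound, and the "previous year" default are all assumptions for illustration, not anything that exists in the extractor.

```python
import datetime
from typing import Iterable, List, Optional

# Hypothetical lower bound; set this to the first year that really has an index file.
EARLIEST_YEAR = 2011


def resolve_years(
    requested: Optional[Iterable[int]],
    today: Optional[datetime.date] = None,
) -> List[int]:
    """Return a sorted, de-duplicated list of years that is safe to fetch."""
    today = today or datetime.date.today()
    # Assumption: the current year's index may not be published yet
    # (index_2020.csv returned a 404 above), so the latest safe year is last year.
    latest = today.year - 1

    years = sorted(set(requested or []))
    if not years:
        # No years given: fall back to the default instead of failing later.
        return [latest]

    bad = [y for y in years if y < EARLIEST_YEAR or y > latest]
    if bad:
        raise ValueError(
            f"no index available for year(s) {bad}; "
            f"choose years between {EARLIEST_YEAR} and {latest}"
        )
    return years
```

The command line entry point could catch the `ValueError` and print its message, so the user sees one line rather than a traceback.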

smai-f commented 4 years ago

A couple of other possible future improvements: 1. Ctrl+C does not kill the process (do you have to do something special to kill pipenv processes?). 2. When the process is killed, nothing that was extracted before the kill is cached. Unless I'm missing something, it would be a nice future improvement to cache as you go in case a download is interrupted or someone trips on a non-MagSafe cable or something.
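To make the "cache as you go" idea concrete, here is a minimal sketch. It assumes each unit of work has a filename-safe key and that some `fetch` callable returns a JSON-serializable dict; `extract_with_cache`, `fetch`, and the one-file-per-key layout are hypothetical and not taken from the extractor's code.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def extract_with_cache(
    keys: Iterable[str],
    fetch: Callable[[str], dict],
    cache_dir: Path,
) -> int:
    """Fetch each key and write it to disk right away.

    Returns the number of records newly fetched during this call.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    fetched = 0
    try:
        for key in keys:
            target = cache_dir / f"{key}.json"
            if target.exists():
                # Already cached by an earlier (possibly interrupted) run.
                continue
            record = fetch(key)
            # Write to a temp file and rename it, so an interrupt cannot
            # leave a half-written cache entry behind.
            tmp = target.with_name(target.name + ".tmp")
            tmp.write_text(json.dumps(record))
            tmp.replace(target)
            fetched += 1
    except KeyboardInterrupt:
        # Ctrl+C: stop cleanly. Everything fetched so far is already on disk.
        print(f"interrupted after {fetched} new record(s); cached work is kept")
    return fetched
```

Running the same call again after an interrupt skips every key that already has a cache file, so only the missing records are fetched.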

smai-f commented 4 years ago

Also at some point, would you mind adding an example_filters.json file for fast reference?

smai-f commented 4 years ago

QA +1: the new fancy make targets and the new fancy data extraction work for me. FYI, make format changed some files :)

smai-f commented 4 years ago

+1

glesica commented 4 years ago

I upgraded to Python 3.8 so you may want to update your stuff. You can do pipenv --rm and then pipenv install --dev in a new shell to switch. After that pipenv run python --version should show 3.8.
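If it helps, a small guard like the one below could turn a stale virtual environment into a clear message instead of a surprise later. This is only a sketch: `python_ok` and `REQUIRED` are hypothetical names, and treating 3.8 as a minimum rather than an exact match is an assumption.

```python
import sys
from typing import Sequence, Tuple

# Assumed minimum, taken from the python_version the Pipfile warning reports.
REQUIRED: Tuple[int, int] = (3, 8)


def python_ok(
    version_info: Sequence[int] = sys.version_info,
    required: Tuple[int, int] = REQUIRED,
) -> bool:
    """Return True when the interpreter's major.minor is at least `required`."""
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    if not python_ok():
        # Point at the same fix described above.
        sys.exit(
            "Python 3.8 is required; run 'pipenv --rm' and then "
            "'pipenv install --dev' in a new shell."
        )
```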

glesica commented 4 years ago

Issues have been created. I'm gonna merge it.