MacHu-GWU / uszipcode-project

USA zipcode programmable database, includes up-to-date census and geometry information.
MIT License

uszipcode silently redownloads the database in case of corruption #55

Closed strugee closed 2 years ago

strugee commented 3 years ago

Describe the bug

If the dataset is corrupted in some way, uszipcode will silently redownload it.

To Reproduce

Steps to reproduce the behavior:

  1. In a Python console, run import uszipcode; uszipcode.SearchEngine() to trigger a download
  2. Run echo > path/to/simple_db.sqlite to truncate the file
  3. Run step 1 again
  4. Notice how the file is redownloaded, with no mention of the corrupted file

Expected behavior

At the very least, a warning should be logged to the console, though I think this should probably raise an exception instead. The current behavior can be really confusing in environments where this file is vendored into a deploy artifact and something goes wrong with the deploy. (In our case, Git LFS wasn't configured, so we got the "pointer" file instead of the actual database.) That's why I think it's better to just error out: it makes the problem much, much easier to notice.
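To make the "error out" option concrete, here is a rough sketch of the kind of check I have in mind. It is not uszipcode code: `check_sqlite_file` and `CorruptDatabaseError` are names I made up for illustration. It relies only on two file-format facts: every SQLite 3 file starts with the 16 bytes `SQLite format 3\0`, and a Git LFS pointer file starts with a `version https://git-lfs...` line. A header check like this only catches gross corruption (truncated file, LFS pointer); it does not validate the data inside.

```python
from pathlib import Path

# The first 16 bytes of every SQLite 3 database file.
SQLITE_MAGIC = b"SQLite format 3\x00"
# A Git LFS pointer file begins with this text instead of real data.
LFS_POINTER_PREFIX = b"version https://git-lfs"


class CorruptDatabaseError(RuntimeError):
    """Hypothetical exception for this sketch; not part of uszipcode."""


def check_sqlite_file(path):
    """Return the path if it looks like a SQLite file, otherwise raise."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"database file not found: {path}")

    with path.open("rb") as f:
        header = f.read(32)

    if header.startswith(LFS_POINTER_PREFIX):
        raise CorruptDatabaseError(
            f"{path} is a Git LFS pointer, not the database (is Git LFS configured?)"
        )
    if not header.startswith(SQLITE_MAGIC):
        raise CorruptDatabaseError(f"{path} is not a valid SQLite database")
    return path
```

Calling something like this before opening the database would turn the truncated file from step 2 into an immediate, visible error instead of a silent redownload.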

Screenshots

N/A

Additional context

N/A

MacHu-GWU commented 2 years ago

@strugee I just released a new version, 1.0.1. SearchEngine now supports three parameters, db_file_path, download_url, and engine, that can solve this problem.

In your case, you could create the sqlalchemy.engine.Engine yourself (see this example) and pass it in as a parameter: SearchEngine(engine=engine)

That way you can manage your db file yourself, or even load the data into a MySQL / Postgres database. See this example
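A minimal sketch of that approach, assuming uszipcode 1.0.1 or later and SQLAlchemy are installed. Only `SearchEngine(engine=engine)` comes from the comment above; `sqlite_url` and `open_search_engine` are hypothetical helper names, and the file-existence check is an addition for illustration so that a missing vendored file fails loudly.

```python
from pathlib import Path


def sqlite_url(db_path):
    """Build a SQLAlchemy URL for a local SQLite file (hypothetical helper)."""
    return f"sqlite:///{Path(db_path).resolve()}"


def open_search_engine(db_path):
    """Open uszipcode against a database file you manage yourself."""
    # Imported here so sqlite_url() stays usable without the third-party packages.
    import sqlalchemy
    from uszipcode import SearchEngine

    db_path = Path(db_path)
    if not db_path.is_file():
        # Fail loudly instead of relying on any download fallback.
        raise FileNotFoundError(f"database file not found: {db_path}")

    # Create the engine yourself and hand it to SearchEngine,
    # as suggested in the comment above.
    engine = sqlalchemy.create_engine(sqlite_url(db_path))
    return SearchEngine(engine=engine)
```

Usage would look like `search = open_search_engine("path/to/simple_db.sqlite")`, with the path pointing at wherever the vendored file lives in your deploy artifact.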

strugee commented 2 years ago

This is awesome @MacHu-GWU. Thank you for your hard work on this library, I really appreciate it :-)