MacHu-GWU / uszipcode-project

USA zipcode programmable database, includes up-to-date census and geometry information.
MIT License
231 stars 49 forks source link

Hosted simple database link not working #40

Closed amrawadk closed 3 years ago

amrawadk commented 4 years ago

Describe the bug

Thanks for the great library : ) I had some test cases that were using this library to generate real zip codes, those tests started failing today with a strange error message:

sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) file is not a database

After some debugging, turns out the hosted simple database url, that is downloaded in this line in the code is not working. hence the queries on the simple database fails.

To Reproduce

Steps to reproduce the behavior:

  1. Run the following lines in a python shell or file:
    
    from uszipcode import SearchEngine

search = SearchEngine(simple_zipcode=True) zip_code = search.by_zipcode(zipcode='12345')


This will produce the following exception:

Start downloading data for simple zipcode database, total size 9MB ... 1 MB finished ... Complete! Traceback (most recent call last): File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context cursor, statement, parameters, context File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute cursor.execute(statement, parameters) sqlite3.DatabaseError: file is not a database

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "repro.py", line 4, in zip_code = search.by_zipcode(zipcode='12345') File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/uszipcode/search.py", line 575, in by_zipcode zipcode_type=zipcode_type, File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/uszipcode/search.py", line 549, in query return q.limit(returns).all() File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3341, in all return list(self) File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3503, in iter return self._execute_and_instances(context) File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances result = conn.execute(querycontext.statement, self._params) File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1014, in execute return meth(self, multiparams, params) File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement distilled_params, File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1318, in _execute_context e, statement, parameters, cursor, context File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1512, in _handle_dbapi_exception sqlalchemy_exception, with_traceback=excinfo[2], from=e File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_ raise exception File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context cursor, statement, parameters, context File "/home/pivot/Desktop/pivot/Papi/env/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute cursor.execute(statement, parameters) sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) file is not a database [SQL: SELECT simple_zipcode.zipcode AS simple_zipcode_zipcode, simple_zipcode.zipcode_type AS simple_zipcode_zipcode_type, simple_zipcode.major_city AS simple_zipcode_major_city, simple_zipcode.post_office_city AS simple_zipcode_post_office_city, simple_zipcode.common_city_list AS simple_zipcode_common_city_list, simple_zipcode.county AS simple_zipcode_county, simple_zipcode.state AS simple_zipcode_state, simple_zipcode.lat AS simple_zipcode_lat, simple_zipcode.lng AS simple_zipcode_lng, simple_zipcode.timezone AS simple_zipcode_timezone, simple_zipcode.radius_in_miles AS simple_zipcode_radius_in_miles, simple_zipcode.area_code_list AS simple_zipcode_area_code_list, simple_zipcode.population AS simple_zipcode_population, simple_zipcode.population_density AS simple_zipcode_population_density, simple_zipcode.land_area_in_sqmi AS simple_zipcode_land_area_in_sqmi, simple_zipcode.water_area_in_sqmi AS simple_zipcode_water_area_in_sqmi, simple_zipcode.housing_units AS simple_zipcode_housing_units, simple_zipcode.occupied_housing_units AS simple_zipcode_occupied_housing_units, simple_zipcode.median_home_value AS simple_zipcode_median_home_value, simple_zipcode.median_household_income AS simple_zipcode_median_household_income, simple_zipcode.bounds_west AS simple_zipcode_bounds_west, simple_zipcode.bounds_east AS simple_zipcode_bounds_east, simple_zipcode.bounds_north AS simple_zipcode_bounds_north, simple_zipcode.bounds_south AS simple_zipcode_bounds_south FROM simple_zipcode WHERE simple_zipcode.zipcode = ? LIMIT ? OFFSET ?] [parameters: ('12345', 1, 0)] (Background on this error at: http://sqlalche.me/e/13/4xp6)



**Expected behavior**

1. The download should be validated, and if the download fails, there should be a clear error message, currently the html content of the timeout page are saved as `~/.uszipcode/simple_db.sqlite`

2. Perhaps the file should be moved to a more reliable hosting service? or maybe included in the repo? we could also update the docs to easily highlight how users can include it manually in their projects, or is there a better solution?

I can try to make time to work on a PR if needed.
amrawadk commented 4 years ago

I think a similar issue is already being addressed in #30, but I think the 2 suggestions made here would still be valiable. the MR hasn't been merged for almost a year though ... let me know if the library is still being maintained.

evladik commented 4 years ago

Does anyone know where I can find an alternative location for the database? as mentioned here datahub.io is down.

amrawadk commented 4 years ago

@evladik, Here's a local version I had: simple_db.log

Note: Just rename simple_db.log to simple_db.sqlite, I had to change the extension as github doesn't allow binary file uploads. To integrate it, use the db_file_dir argument when creating a SearchEngine, like so:

zipcode_db_dir = os.path.join(os.path.dirname(__file__), ".uszipcode") # replace with your directory
search = SearchEngine(simple_zipcode=True, db_file_dir=zipcode_db_dir)
evladik commented 4 years ago

@AmrAwadK Thank you very much! I was just about of looking for an alternative in census.gov

amrawadk commented 4 years ago

@evladik happy to help : )

MacHu-GWU commented 4 years ago

@evladik @AmrAwadK Sorry I was busy on maintaining something else. This library is still under maintaining. Since the 2020 Census coming, I am crawling data at this moment and preparing for the next big version of release.

BTW, I merged the #30 , which allows user to cache the database file anywhere they want.

amrawadk commented 4 years ago

That's great @MacHu-GWU : ) Thanks for the help :+1: I think a new pypi release would be needed to pick up the merged MR, correct?

If you think my suggestions here would add value, I'm happy to create a PR. Also any suggestions on where the db should live if datahub.io is not as reliable? maybe it could just be in the repo?

Richtermeister commented 4 years ago

Thank you for the quick reaction on this. Unfortunately my current project relies on the complete db file (the 450MB one). Is there any place to obtain that one still? Thank you for all your help!

stefanuddenberg commented 4 years ago

@evladik, Here's a local version I had: simple_db.log

Note: Just rename simple_db.log to simple_db.sqlite, I had to change the extension as github doesn't allow binary file uploads. To integrate it, use the db_file_dir argument when creating a SearchEngine, like so:

zipcode_db_dir = os.path.join(os.path.dirname(__file__), ".uszipcode") # replace with your directory
search = SearchEngine(simple_zipcode=True, db_file_dir=zipcode_db_dir)

It would be extremely helpful to have the more complete db available as well!

Richtermeister commented 4 years ago

I was able to obtain a cached copy from a colleague, in case I can provide it to this project.. Let me know.

MacHu-GWU commented 4 years ago

@evladik @AmrAwadK @stefanuddenberg Hi folks, let me find the original file and hosted it on AWS S3, it will be more stable

MacHu-GWU commented 3 years ago

@evladik @AmrAwadK @stefanuddenberg fixed in new release 0.2.6