When the remote file type changes, e.g. from .csv to .xlsx, but all the ids stay the same, we end up trying to write a new cache entry for the file. This isn't possible because the cache database's constraint requires that all output JSON files be unique.
Note: file changed from csv to xlsx
In the cache database:
sqlite> select * from cache where json_file='a003W000007E8cEQAS.json';
a003W000007E8cEQAS.csv|0e0388205cb0be7e6eac0f661b3fd06ca408a8c8|a003W000007E8cEQAS.json
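The collision can be reproduced with an in-memory database. The column names below are assumptions inferred from the row above; the real cache table may differ, but the UNIQUE constraint on the output JSON filename is the key point:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Assumed schema -- the actual datagetter cache table may use
# different column names, but json_file is UNIQUE either way.
cur.execute(
    "CREATE TABLE cache (data_file TEXT, hash TEXT, json_file TEXT UNIQUE)"
)
# Existing entry from when the remote file was a .csv
cur.execute(
    "INSERT INTO cache VALUES (?, ?, ?)",
    ("a003W000007E8cEQAS.csv",
     "0e0388205cb0be7e6eac0f661b3fd06ca408a8c8",
     "a003W000007E8cEQAS.json"),
)
# The remote file is now .xlsx, but the id (and so the output
# json_file) is unchanged, so this insert violates the constraint.
try:
    cur.execute(
        "INSERT INTO cache VALUES (?, ?, ?)",
        ("a003W000007E8cEQAS.xlsx", "some-new-hash",
         "a003W000007E8cEQAS.json"),
    )
except sqlite3.IntegrityError as e:
    print(e)  # UNIQUE constraint failed: cache.json_file
```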
(.ve) datastore-test@360datastore:~/datastore$ datagetter.py --publishers 360G-linbury
Remove existing directory? data y/n: y
Downloading 360Giving Schema...
Schema Download successful.
Fetching https://linburytrust.org.uk/wp-content/uploads/2023/08/TheLinburyTrust_GB-CHC-287077.xlsx
Running convert on data/original/a003W000007E8cEQAS.xlsx to data/json_all/a003W000007E8cEQAS.json
Unflattening failed for file data/original/a003W000007E8cEQAS.xlsx
UNIQUE constraint failed: cache.json_file
Traceback (most recent call last):
File "/home/datastore-test/datastore/.ve/src/datagetter/getter/get.py", line 237, in fetch_and_convert
cache.update_cache(
File "/home/datastore-test/datastore/.ve/src/datagetter/getter/cache.py", line 64, in update_cache
cur.execute(
sqlite3.IntegrityError: UNIQUE constraint failed: cache.json_file
The short-term fix is simply to delete the stale cache entry.
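A sketch of that workaround: the database path, table name, and column name here are assumptions based on the sqlite session above, so adjust them to wherever the datagetter actually stores its cache.

```python
import sqlite3

def drop_stale_cache_entry(db_path: str, json_file: str) -> int:
    """Delete the cache row for a given output JSON file.

    Returns the number of rows removed (0 if there was no entry).
    db_path and the 'cache' table/column names are assumptions.
    """
    con = sqlite3.connect(db_path)
    with con:  # commits on success
        cur = con.execute(
            "DELETE FROM cache WHERE json_file = ?", (json_file,)
        )
    con.close()
    return cur.rowcount

# Hypothetical usage:
# drop_stale_cache_entry("cache.db", "a003W000007E8cEQAS.json")
```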
A secondary issue is that this cache error is raised in the same part of the code that handles unflattening (which is what the cache exists to support), so when the exception is triggered it gets interpreted as an unflattening problem, and therefore as a problem with the validity of the data itself. Improving the error handling here would aid future investigations.
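One way to separate the two failure modes. This is a hypothetical sketch, not the actual code in get.py; the wrapper takes the cache-update callable as a parameter precisely because the real signature of cache.update_cache isn't shown in the traceback:

```python
import sqlite3

def update_cache_then_report(update_cache, data_file, json_file):
    """Run a cache update, reporting an IntegrityError as a cache
    problem rather than letting it be misread as an unflattening
    (data validity) failure.

    `update_cache` stands in for datagetter's cache.update_cache;
    its exact signature here is an assumption.
    """
    try:
        update_cache(data_file, json_file)
        return "ok"
    except sqlite3.IntegrityError as e:
        # e.g. "UNIQUE constraint failed: cache.json_file" -- the
        # remote file type changed while the output name stayed fixed.
        return f"cache conflict (not a data problem): {e}"
```

Catching sqlite3.IntegrityError separately from the conversion step means a constraint violation can be logged with its own message instead of surfacing as "Unflattening failed".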