COSIMA / cosima-cookbook

Framework for indexing and querying ocean-sea ice model output.
https://cosima-recipes.readthedocs.io/en/latest/
Apache License 2.0

need a function to remove non-existent files from database #81

Closed aekiss closed 5 years ago

aekiss commented 6 years ago

Here's something Clothilde discovered at the tutorial yesterday:

import cosima_cookbook as cc
cc.build_index()
cc.get_nc_variable('mom-sis_jra-ryf','ocean_month.nc','eta_t')

yields

FileNotFoundError: [Errno 2] No such file or directory: b'/g/data3/hh5/tmp/cosima/mom-sis/mom-sis_jra-ryf/output021/ocean_month.nc'

The directory /g/data3/hh5/tmp/cosima/mom-sis/mom-sis_jra-ryf/output021 doesn't exist but is indexed in the database. Apparently somebody removed the directory but it has stayed in the database because build_index() only looks for unindexed run directories to add to the database.

This scenario is probably rare, but it would nevertheless be good to have a clean_index function to remove nonexistent runs from the database. Currently the only way to fix it is to trash the DB and rebuild from scratch, which becomes prohibitively slow as the amount of data increases.

aidanheerdegen commented 6 years ago

We should definitely add some sort of checking function to ensure the integrity of the index. I'm not sure how often it would be called; that would depend a lot on how expensive it is.

In any case, we also need a function to remove entries from the index.

The file open statements here:

https://github.com/OceansAus/cosima-cookbook/blob/master/cosima_cookbook/netcdf_index.py#L352

and here

https://github.com/OceansAus/cosima-cookbook/blob/master/cosima_cookbook/netcdf_index.py#L352

should be wrapped in try/except, and maybe the entries should be deleted from the index if the files no longer exist?
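A minimal sketch of that idea, not the cookbook's actual code: it assumes the index is an SQLite database with a hypothetical table `ncfiles` holding one row per file in a column `ncfile`. The names `open_indexed_file` and `opener` are made up for illustration; in practice `opener` would be whatever call currently opens the NetCDF file.

```python
import sqlite3
import warnings


def open_indexed_file(conn, path, opener=open):
    """Open an indexed file; if it is missing, warn, drop its index row, return None.

    Assumption: the index has a table `ncfiles` with a text column `ncfile`.
    """
    try:
        return opener(path)
    except FileNotFoundError:
        warnings.warn(f"{path} is indexed but no longer exists; removing it from the index")
        # Delete only the row for the path that just failed to open.
        conn.execute("DELETE FROM ncfiles WHERE ncfile = ?", (path,))
        conn.commit()
        return None
```

Callers would need to skip a `None` result and carry on with the remaining files, so one stale entry no longer aborts the whole query.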

aekiss commented 6 years ago

Ah, that's a good solution: only do it on an error. Just provide a warning, fix the index, and move on.

aekiss commented 6 years ago

How about a function

unindex_file(filepath)  # unindex a specific file path (only if it doesn't exist)

which is called with the specific missing path on failed file open attempts, and is also called by a function

clean_index(path)  # recursively unindex all nonexistent files in tree below path

that walks a directory tree and so can be used to clean up part or all of the DB?
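The two proposed functions could look roughly like this. It is a sketch under assumptions: the index is taken to be an SQLite database with a hypothetical table `ncfiles` and column `ncfile`, and the extra `conn` argument is not part of the proposal. One deliberate difference: a deleted directory cannot be walked on disk, so `clean_index` iterates over the indexed paths below `path` rather than over the directory tree.

```python
import os
import sqlite3


def unindex_file(conn, filepath):
    """Remove filepath from the index, but only if it no longer exists on disk.

    Returns True if a row was removed. Assumes a table `ncfiles(ncfile TEXT)`.
    """
    if os.path.exists(filepath):
        return False  # file is still there, leave the index alone
    cur = conn.execute("DELETE FROM ncfiles WHERE ncfile = ?", (filepath,))
    conn.commit()
    return cur.rowcount > 0


def clean_index(conn, path):
    """Unindex every nonexistent file indexed below path; return the removed paths."""
    # Trailing separator so that /a/b does not also match /a/bc.
    prefix = os.path.join(path, "")
    rows = conn.execute("SELECT ncfile FROM ncfiles").fetchall()
    removed = []
    for (ncfile,) in rows:
        if ncfile.startswith(prefix) and unindex_file(conn, ncfile):
            removed.append(ncfile)
    return removed
```

Calling `clean_index` on the top-level data directory would check the whole database, while passing a single run directory limits the existence checks to that run.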

aidanheerdegen commented 6 years ago

We definitely need some tests to run this against.

aidanheerdegen commented 6 years ago

James set up a stub to use nosetests but I much prefer py.test. Any objections to changing to py.test?

jmunroe commented 6 years ago

Py.test is just fine with me.

