Closed aekiss closed 5 years ago
We should definitely add some sort of checking function to ensure the integrity of the index. I'm not sure how often it would be called; that would depend a lot on how expensive it is.
In any case, we also need a function to remove entries from the index.
The file open statements here:
https://github.com/OceansAus/cosima-cookbook/blob/master/cosima_cookbook/netcdf_index.py#L352
and here
https://github.com/OceansAus/cosima-cookbook/blob/master/cosima_cookbook/netcdf_index.py#L352
should be wrapped in try/except,
and maybe the entries should be deleted from the index if the files no longer exist?
Ah, that's a good solution: only do it on an error. Just provide a warning, fix the index, and move on.
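A minimal sketch of the "warn, fix the index, move on" idea. It assumes the index is an SQLite database with a table named ncfiles holding one path per row in a column named ncfile; both names, the function name open_indexed_file, and the opener parameter (a stand-in for whatever call actually opens the file) are hypothetical and not taken from netcdf_index.py.

```python
import os
import sqlite3
import warnings


def open_indexed_file(conn, filepath, opener=open):
    """Open an indexed file; if it has vanished, warn, unindex it, return None.

    conn     -- sqlite3 connection to the index (assumed table: ncfiles(ncfile))
    opener   -- stand-in for the real open call (hypothetical parameter)
    """
    try:
        return opener(filepath)
    except OSError:
        if os.path.exists(filepath):
            # The file is still there, so this is some other problem
            # (permissions, corruption, ...). Leave the index alone.
            raise
        warnings.warn(f"{filepath} is indexed but missing; removing it from the index")
        with conn:  # commits the DELETE on success
            conn.execute("DELETE FROM ncfiles WHERE ncfile = ?", (filepath,))
        return None
```

Callers would need to cope with a None return, e.g. by skipping that file when assembling a dataset.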
how about a function
unindex_file(filepath) # unindex a specific file path (only if it doesn't exist)
which is called with the specific missing path on failed file open attempts
and is also called by a function
clean_index(path) # recursively unindex all nonexistent files in tree below path
that walks a directory tree and so can be used to clean up part or all of the DB?
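The two proposed functions could look roughly like this. It is a sketch under assumptions: the index is taken to be an SQLite table ncfiles with a column ncfile storing absolute paths (hypothetical names), and a conn argument is added to the proposed signatures so the example is self-contained.

```python
import os
import sqlite3


def unindex_file(conn, filepath):
    """Remove filepath from the index, but only if the file no longer exists.

    Returns True if at least one index row was deleted.
    """
    if os.path.exists(filepath):
        return False
    with conn:  # commits the DELETE on success
        cur = conn.execute("DELETE FROM ncfiles WHERE ncfile = ?", (filepath,))
    return cur.rowcount > 0


def clean_index(conn, path):
    """Unindex all nonexistent files that are indexed below directory `path`.

    Returns the list of paths that were removed from the index.
    """
    # Trailing separator so "/a/b" does not also match "/a/bc/...".
    prefix = os.path.join(os.path.abspath(path), "")
    rows = conn.execute("SELECT ncfile FROM ncfiles").fetchall()
    removed = []
    for (ncfile,) in rows:
        if ncfile.startswith(prefix) and unindex_file(conn, ncfile):
            removed.append(ncfile)
    return removed
```

One design note: this sketch iterates over the indexed paths rather than walking the filesystem, because deleted files and directories would not show up in a directory walk.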
We definitely need some tests to run this against.
James set up a stub to use nosetests, but I much prefer py.test. Any objections to changing to py.test?
Py.test is just fine with me.
-- JAMES MUNROE | ASSOCIATE PROFESSOR
Here's something Clothilde discovered at the tutorial yesterday:
yields
The directory /g/data3/hh5/tmp/cosima/mom-sis/mom-sis_jra-ryf/output021 doesn't exist but is indexed in the database. Apparently somebody removed the directory, but it has stayed in the database because build_index() only looks for unindexed run directories to add to the database. This scenario is probably rare, but it would nevertheless be good to have a clean_index function to remove nonexistent runs from the database. Currently the only way to fix it is to trash the DB and rebuild from scratch, which becomes prohibitively slow as the amount of data increases.
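For the removed-directory case specifically, a cheaper variant might check each parent directory once instead of every file. This is only a sketch: the table ncfiles, its column ncfile, and the function name remove_missing_dirs are assumptions for illustration, not the cookbook's actual schema or API.

```python
import os
import sqlite3


def remove_missing_dirs(conn):
    """Delete index rows whose parent directory no longer exists.

    Returns the sorted list of missing directories that were cleaned out.
    """
    files = [f for (f,) in conn.execute("SELECT DISTINCT ncfile FROM ncfiles")]
    # One filesystem check per directory rather than one per file.
    dirs = {os.path.dirname(f) for f in files}
    missing = {d for d in dirs if not os.path.isdir(d)}
    stale = [(f,) for f in files if os.path.dirname(f) in missing]
    with conn:  # commits the DELETEs on success
        conn.executemany("DELETE FROM ncfiles WHERE ncfile = ?", stale)
    return sorted(missing)
```

This avoids a full rebuild, but it would not catch individual files deleted from a directory that still exists.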