Hmm, most of the tests are passing when I run them locally. Have to take a look tomorrow.
I can reproduce these errors by upgrading to `sqlalchemy=1.4.1` (the current unstable conda env has 1.3.22). So it seems like there has been a change in behaviour.
Maybe there is something in here that rings a bell, @angus-g?
https://docs.sqlalchemy.org/en/14/changelog/migration_14.html
Sounds like it is a pretty big upgrade. Could just pin testing to 1.3 until it is sorted ...
Yeah, let's pin to 1.3 until we work out the changes needed for 1.4!
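For reference, a minimal sketch of the pin, assuming the test dependencies are declared in a conda environment file (the actual file name and channel setup in this repo may differ):

```yaml
# Hypothetical excerpt: keep sqlalchemy on the 1.3 series until the
# 1.4 migration changes are worked out.
dependencies:
  - sqlalchemy >=1.3,<1.4
```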
I need to add some tests, but wanted to get some feedback @angus-g. Is this a useful/good way to proceed?
> Is this a useful/good way to proceed?
I think so! Still maybe a few too many flags on `build_index`? I'm not sure if we need `update`, `prune` or `delete`: it seems like updating is probably something you always want to do (and/or it's covered by `force`), and I doubt anybody used the other two, plus they're better covered by dedicated functions imo.
> Still maybe a few too many flags on `build_index`? I'm not sure if we need `update`, `prune` or `delete`: it seems like updating is probably something you always want to do (and/or it's covered by `force`), and I doubt anybody used the other two, plus they're better covered by dedicated functions imo.
I agree `update` seems sort of redundant. Struggling to think of a use case where you'd call `build_index` with `update=False`. So I'd be up for removing it. As you say, not sure anyone uses it.

`prune` also seems like a no-brainer in order to keep the DB in a non-broken state. In a way `delete` is almost better as an option for `prune`. As in: yes, always `prune` by default, and that can either be set to mark files as missing or to delete their records. Not sure which is a better default.
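Not the actual cosima-cookbook API, just a hypothetical sketch of the trimmed signature being discussed (parameter names beyond `force` and `prune` are assumptions):

```python
# Hypothetical sketch: `update` is gone (indexing always updates),
# pruning always runs, and its behaviour is selected by one option.
def build_index(directories, session, force=False, prune="missing"):
    """Index experiments under `directories` into `session`.

    force : bool
        Re-index files even if they already appear in the database.
    prune : {"missing", "delete"}
        "missing" marks records for vanished files as not present;
        "delete" removes those records entirely.
    """
    ...
```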
I've updated one of the tests to test the `force` option. This fails because of the `UNIQUE` constraint:
```
>       cursor.execute(statement, parameters)
E       sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: ncfiles.experiment_id, ncfiles.ncfile
E       [SQL: INSERT INTO ncfiles (index_time, ncfile, present, experiment_id, time_start, time_end, frequency) VALUES (?, ?, ?, ?, ?, ?, ?)]
E       [parameters: ('2021-03-19 04:43:03.276985', 'output000/test.nc', 0, 1, None, None, None)]
E       (Background on this error at: http://sqlalche.me/e/13/gkpj)
```
I took a look but it wasn't obvious to me how to fix this. Do you have any hints @angus-g?
A sledgehammer approach could be to add the `UniqueMixin` like `Keyword` or `CFVariable`, and then update all the indexed files with `as_unique`. I think it'd be easier and less fiddly to just query for the file in the "update all variables to be unique" loop and delete existing matches though, since the `UniqueMixin` probably isn't quite set up for a top-level object like that.
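A minimal sketch of that query-and-delete idea, assuming `session` is an open SQLAlchemy session, `NCFile` is the mapped model, and the `experiment_id`/`ncfile` column names come from the constraint in the traceback (the `new_file` variable is illustrative, not the actual cosima-cookbook code):

```python
# Sketch: before re-adding a freshly indexed file, remove any existing
# record that would collide with the (experiment_id, ncfile) constraint.
existing = (
    session.query(NCFile)
    .filter(
        NCFile.experiment_id == new_file.experiment_id,
        NCFile.ncfile == new_file.ncfile,
    )
    .one_or_none()
)
if existing is not None:
    session.delete(existing)
session.add(new_file)
```

The same thing could be done in bulk with `Query.delete()` (`session.query(NCFile).filter(...).delete()`), which skips loading the matching objects first.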
Ugh. Now `build_index` is getting bigger and more unwieldy. Happy to hear if there is a more elegant way to do the deletion of existing `NCFile` records @angus-g.
Thanks for the review and sqlalchemy magics, @angus-g!
I agree. I thought about fixing https://github.com/COSIMA/cosima-cookbook/issues/238 as well, as I think it messes a bit with the final logic used in this change, but decided to leave that to another day.
> I agree `update` seems sort of redundant. Struggling to think of a use case where you'd call `build_index` with `update=False`. So I'd be up for removing it. As you say, not sure anyone uses it.
Oh, it was used lots. :) Especially after being part of the cosima-recipes tutorial about creating a custom db.
Closes #233

- Logic about which files to filter and prune is moved up to `build_index`.
- Added a `prune_files` function which is called from `prune_experiment` and `build_index` where required.
- Made a dedicated `find_files` function with an option to specify a pattern to match when globbing (see the sketch below).
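For illustration, a hypothetical sketch of what such a helper could look like; the real `find_files` in this PR may differ:

```python
from pathlib import Path

# Hypothetical sketch: walk a directory tree and return files matching
# a glob pattern, defaulting to netCDF files.
def find_files(directory, pattern="*.nc"):
    return sorted(p for p in Path(directory).rglob(pattern) if p.is_file())
```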