Rewrite test_database.py tests as a validation job

jsstevenson commented 1 month ago

Many of the cases in test_database.py query an existing database to check that certain rules of the model structure are being obeyed. This is helpful, but I'm not sure a pytest case is the best way to do it. In practice, I think we'd want a schedulable (and ad-hoc runnable) task or script that runs these checks against a Neo4j database and collects a list of deviations from the rules (and then saves or emails them). They're less helpful as tests because they aren't testing behavior (which, e.g., could have just been changed by a recent commit); they're testing state, which may or may not even be available.
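
A minimal sketch of what such a job could look like, assuming each rule can be written as a Cypher query that returns one row per offending node. The rule names, labels, and properties below are placeholders rather than the actual metakb model, and `run_query` is a hypothetical callable that would wrap a Neo4j session in a real job (it is injected here so the sketch runs without a database):

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Takes a Cypher string and returns rows as dicts. In a real job this would
# wrap a Neo4j session; it is injected so the sketch has no DB dependency.
QueryRunner = Callable[[str], Iterable[dict]]


@dataclass(frozen=True)
class Rule:
    name: str
    # Cypher returning one row per node that violates the rule
    violation_query: str


# Hypothetical rules: labels and properties are placeholders, not the real model.
RULES = [
    Rule("gene_has_label", "MATCH (g:Gene) WHERE g.label IS NULL RETURN g.id AS id"),
    Rule("statement_has_id", "MATCH (s:Statement) WHERE s.id IS NULL RETURN s.label AS label"),
]


def collect_deviations(run_query: QueryRunner, rules: List[Rule]) -> Dict[str, List[dict]]:
    """Run every rule and keep only the ones that returned offending rows."""
    report: Dict[str, List[dict]] = {}
    for rule in rules:
        rows = list(run_query(rule.violation_query))
        if rows:
            report[rule.name] = rows
    return report


def format_report(report: Dict[str, List[dict]]) -> str:
    """Render the deviations as plain text, suitable for saving or emailing."""
    if not report:
        return "No deviations found."
    lines = []
    for name, rows in report.items():
        lines.append(f"{name}: {len(rows)} deviation(s)")
        lines.extend(f"  {row}" for row in rows)
    return "\n".join(lines)
```

Because the job only reports state, it can be run on a schedule or by hand against whichever database is available, without tying the result to a particular commit.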

korikuzma commented 1 month ago

Is this similar to #261 ?

jsstevenson commented 1 month ago

@korikuzma I think these are sort of orthogonal questions. The tests in test_database are helpful for checking for schema drift over time in a production setting. As they're written, though, they aren't checking the correctness of the data upload methods, at least not in a particularly direct or robust way; they only check that nothing already in the database is shaped incorrectly. To check the upload methods, I think we would want to either a) mock DB calls and check correctness there, or b) create a test dataset, have a test module manage an upload run of it to a test database, then dump it out and check correctness of the dump. (I think the latter is better suited for CI/CD if anything, since Neo4j Desktop only lets you run one database at once.)
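
A rough illustration of option a), using `unittest.mock` from the standard library. `add_gene` is a hypothetical loader function written for this example, not one of the actual upload methods, and the record values are example data; the only assumption about Neo4j is that the transaction object exposes `run(query, **params)`:

```python
from unittest.mock import MagicMock


def add_gene(tx, gene: dict) -> None:
    """Hypothetical loader: merge a Gene node keyed on its id."""
    query = "MERGE (g:Gene {id: $id}) SET g.label = $label"
    tx.run(query, id=gene["id"], label=gene["label"])


def test_add_gene_sends_expected_query():
    # The mock stands in for a Neo4j transaction, so no database is needed.
    tx = MagicMock()
    add_gene(tx, {"id": "gene:example", "label": "EXAMPLE"})

    # Check that exactly one query was issued, with the expected shape.
    tx.run.assert_called_once()
    query = tx.run.call_args.args[0]
    params = tx.run.call_args.kwargs
    assert "MERGE (g:Gene" in query
    assert params == {"id": "gene:example", "label": "EXAMPLE"}
```

This tests behavior (what the upload method sends to the database) rather than state, so it can run in any environment, at the cost of not exercising a real Neo4j instance.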

cancervariants / metakb

Rewrite test_database.py tests as a validation job #345