cancervariants / fusor

Data classes and constructor tools supporting the VICC Gene Fusion Curation project
MIT License
0 stars 0 forks source link

build!: upgrade to pydantic v2 #135

Closed katiestahl closed 5 months ago

katiestahl commented 8 months ago

progress on #132

katiestahl commented 8 months ago

I'm also getting some failed tests. I thought aws dev ddb was being used, so that's the one I used to run tests.

FAILED tests/test_fusor.py::test_add_gene_descriptor - AssertionError: alt labels
FAILED tests/test_fusor.py::test_gene_element - AssertionError: len of extensions
FAILED tests/test_fusor.py::test_functional_domain - AssertionError: len of extensions
FAILED tests/test_fusor.py::test_regulatory_element - AssertionError: len of extensions

oh yeah, I meant to add another comment about this. I have never had these tests pass locally due to data issues. Was going to recruit @jsstevenson to help look.

I think a few actually do need fixing, looking into those still. Just wanted comments on other things in the meantime :)

korikuzma commented 8 months ago

@jsstevenson prod ddb is not updated. We’ve been testing on dev ddb. Dev and prod should point to the same data tho

korikuzma commented 8 months ago

@jsstevenson prod ddb is not updated. We’ve been testing on dev ddb. Dev and prod should point to the same data tho

Confirmed that prod ddb and dev ddb point to same data. Stick with dev for now until we update prod.

katiestahl commented 8 months ago

I was having trouble getting the AWS connection to work (wouldn't you need to use a previous version of gene-normalizer since the current one combines the metadata and concept tables?) so I tried with a local DB copy, freshly updated today.

It looks like the test fixtures are a bit ahead of where the prod table is, update-wise, but they're behind the current version of gene-normalizer/current data. if you:

  • remove all ChromosomeLocations
  • add Strand extensions to all gene descriptors
  • update one of the expected validation error messages

then all tests pass against the most recent gene-normalizer stuff. This commit shows the specific changes: 5bdcbb6. You can see specific data versioning here: https://gist.github.com/jsstevenson/82e8d1bb71ad1ca3af66810788c2445f

The change to that expected error message should definitely go through, since it's a Pydantic v2 thing. The others, though, are coming up because we're testing specifics about the Gene Normalizer, not about how FUSOR is using it. In a perfect world, we could just test that our API calls out to external services are correct, and not worry about the particulars of their responses in tests, but this project isn't set up well to support that without a lot of labor. In the meantime, IMO, I think we should drastically reduce the number of value checks that we perform on data from external services, focusing on specific, relevant, less-likely to change items that signal correct usage -- for example, the concept ID on a normalized gene, the populated existence of fields that you wouldn't find on a minimized gene descriptor, etc. These things could still change from time to time, but with fewer checks, it'd at least be much easier to pinpoint their locations, and it'd be less work to keep them up-to-date.

**alternatively, since we're using the prod DB as a shared source of truth for test purposes, we could just go back through the gene descriptor fixtures and make sure that their fields match whatever https://normalize.cancervariants.org/gene/ says they should be

added these changes. I think everything should be addressed and taken care of now. However, the tests still fail locally for me because the database I am pointing to does not have the same data as was expected in this commit: https://github.com/cancervariants/fusor/pull/108/files#diff-e52e4ddd58b7ef887ab03c04116e676f6280b824ab7469d5d3080e5cba4f2128

Bright side: the failures are not related to the changes in this PR bad side: failing tests :(.

how should I proceed here? Additionally, with changes that will eventually be made in gene-normalizer to combine the gene concept and source tables (https://github.com/cancervariants/gene-normalization/commit/76579daa7d67f101740c5e0dfaaae8093ab193ff) what special steps will I need to take as this gets merged? Should fusion-curation just point to this fusor version until we update fusor to work with those changes (or is this even necessary? I can't remember) I can see this causing issues with fusion-curation pointing to this fusor version but then being released into production and using production dbs... I think it would be helpful if we come up with a roadmap here so we (I) don't break anything while pushing this through the pipeline.

korikuzma commented 8 months ago

I've updated prod ddb to work with the merged tables.

If you used @jsstevenson 's changes for tests, it's because he used the latest version (as of 2 days ago) and the aws instances use old versions of the sources. I'd recommend sticking with what's in prod/dev ddb instances. They will be updated "soon" (tbd how soon)