PoonLab / covizu

Rapid analysis and visualization of coronavirus genome variation
https://filogeneti.ca/CoVizu/
MIT License

Re-run cluster analyses only for lineages that have changed #493

Closed ArtPoon closed 4 months ago

ArtPoon commented 10 months ago

We should not rebuild trees (clusters) for lineages that have not accumulated any new data since the last time the pipeline was run. This should save a lot of time, because I expect many large lineages to have no new circulating infections. It should be possible to use the local database (see #485) to determine which lineages have changed with each data update.

ArtPoon commented 7 months ago

We should be able to start working on this once #501 is merged.

ArtPoon commented 7 months ago

We can generate a list of lineages that have new records by comparing accession numbers against the local database while streaming the provision JSON file. However, we do not currently store clustering results in a database, so we cannot retrieve those results for lineages that have not been updated.
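A minimal sketch of the comparison described above, not the project's actual code. It assumes the feed is newline-delimited JSON and that the local database is SQLite; the table name `sequences`, the record keys `accession` and `lineage`, and the function `changed_lineages` are all hypothetical names chosen for illustration.

```python
import io
import json
import sqlite3


def changed_lineages(handle, conn):
    """Return the set of lineages with at least one accession missing from the database.

    handle: file-like object yielding one JSON record per line (assumed layout)
    conn: sqlite3 connection with a hypothetical `sequences` table
    """
    cur = conn.cursor()
    changed = set()
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the stream
        record = json.loads(line)
        # The key names below are assumptions, not the real feed schema.
        cur.execute(
            "SELECT 1 FROM sequences WHERE accession = ?",
            (record["accession"],),
        )
        if cur.fetchone() is None:
            changed.add(record["lineage"])
    return changed


# Toy data: EPI_3 is the only accession not yet stored locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (accession TEXT PRIMARY KEY, lineage TEXT)")
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?)",
    [("EPI_1", "B.1.1.7"), ("EPI_2", "B.1.351")],
)
feed = io.StringIO(
    '{"accession": "EPI_1", "lineage": "B.1.1.7"}\n'
    '{"accession": "EPI_3", "lineage": "B.1.1.7"}\n'
    '{"accession": "EPI_2", "lineage": "B.1.351"}\n'
)
print(sorted(changed_lineages(feed, conn)))  # prints ['B.1.1.7']
```

This sketch only detects newly added accessions. Records that were removed or reassigned to a different lineage would need an additional comparison.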

ArtPoon commented 7 months ago

GopiGugan commented 6 months ago

I've added three tables:

I've updated make_beadplots in batch_utils.py to run the clustering analysis only on new or modified lineages. I still need to run tests to verify correctness.
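The general idea of recomputing only changed lineages and reusing stored results for the rest could be sketched as follows. This is an illustration under stated assumptions, not the contents of make_beadplots: the `clusters` table, the function `update_clusters`, and the stand-in `toy_cluster` are hypothetical, and results are assumed to be JSON-serializable.

```python
import json
import sqlite3


def update_clusters(by_lineage, changed, conn, cluster_fn):
    """Recompute clusters for changed lineages; load stored results for the others.

    by_lineage: dict mapping lineage name to its list of records
    changed: set of lineage names with new or modified records
    conn: sqlite3 connection with a hypothetical `clusters` table
    cluster_fn: callable that performs the (expensive) clustering analysis
    """
    results = {}
    for lineage, records in by_lineage.items():
        row = conn.execute(
            "SELECT result FROM clusters WHERE lineage = ?", (lineage,)
        ).fetchone()
        if lineage in changed or row is None:
            # New or modified lineage, or nothing stored yet: recompute and store.
            result = cluster_fn(records)
            conn.execute(
                "INSERT OR REPLACE INTO clusters (lineage, result) VALUES (?, ?)",
                (lineage, json.dumps(result)),
            )
        else:
            # Unchanged lineage: reuse the stored result.
            result = json.loads(row[0])
        results[lineage] = result
    conn.commit()
    return results


# Toy data: B.1.351 already has a stored result and is unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clusters (lineage TEXT PRIMARY KEY, result TEXT)")
conn.execute("INSERT INTO clusters VALUES (?, ?)", ("B.1.351", json.dumps({"n": 1})))

calls = []  # records how often the stand-in analysis actually runs


def toy_cluster(records):
    calls.append(len(records))
    return {"n": len(records)}


by_lineage = {"B.1.1.7": ["EPI_1", "EPI_3"], "B.1.351": ["EPI_2"]}
results = update_clusters(by_lineage, {"B.1.1.7"}, conn, toy_cluster)
print(results)  # prints {'B.1.1.7': {'n': 2}, 'B.1.351': {'n': 1}}
print(calls)    # prints [2]
```

Falling back to recomputation when no stored result exists keeps the first run, and any newly designated lineage, correct without special handling.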

ArtPoon commented 6 months ago

Close after merging dev into master and running tests.