cidgoh / nf-ncov-voc

A Nextflow-wrapped workflow for generating the mutation profiles of SARS-CoV-2 genomes (Variants of Concern and Variants of Interest). The workflow is developed in collaboration with COVID-MVP (https://github.com/cidgoh/COVID-MVP), which can be used to visualize the mutation profiles and functional annotations.
MIT License

split update_index_and_logfile.py into 3 scripts, and use Dask #155

Closed miseminger closed 5 months ago

miseminger commented 6 months ago

New scripts are:

The old script, update_index_and_logfile.py, is obsolete now that these three scripts have been added and can be deleted.

Also important to note: the mutation index columns have changed since we last made an index. The new columns are: ['pos', 'mutation', 'hgvs_aa_mutation', 'hgvs_nt_mutation', 'gene', 'protein_name', 'alias', 'hgvs_alias', 'alias_protein', 'Pokay_annotation', 'lineages']
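
Since the column set has changed, an index made before this change can be spotted by its header. A minimal sketch, assuming the index is a tab-delimited text file with a header row; `check_index_header` is a hypothetical helper, not something in the repository:

```python
import csv

# Column list copied from the comment above.
EXPECTED_COLUMNS = [
    "pos", "mutation", "hgvs_aa_mutation", "hgvs_nt_mutation", "gene",
    "protein_name", "alias", "hgvs_alias", "alias_protein",
    "Pokay_annotation", "lineages",
]


def check_index_header(handle, delimiter="\t"):
    """Return (missing, unexpected) column names for an index file's header.

    `handle` is any open text file object. The tab delimiter is an
    assumption about the index format.
    """
    # Read only the first row; an empty file yields an empty header.
    header = next(csv.reader(handle, delimiter=delimiter), [])
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    unexpected = [col for col in header if col not in EXPECTED_COLUMNS]
    return missing, unexpected
```

If both returned lists are empty, the file has the new layout. Anything else suggests the index predates the change.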

miseminger commented 6 months ago

@miseminger tested with Dask version 2021.10.0. Will update to the current version and check that it still works.

miseminger commented 6 months ago

Please replace the minimal use of pandas with Dask in the bin/merge_indices.py and bin/merge_logfiles.py scripts, and test them on the latest Dask version, 2023.10.1-py11-ol9_cv1.

Latest Dask version (2024.2.1) depends on Pandas 2.2.1; can we update Pandas in environment.yml?