biocom-uib / vpf-tools

Virus Protein Family tools
BSD 3-Clause "New" or "Revised" License

Classification file names and failed runs #12

Closed LSHillary closed 2 years ago

LSHillary commented 3 years ago

The names of the classification files no longer match the names given in index.yaml. This is easily fixed by the user, but it would be good if the code could be updated, or if the issue could be highlighted in the README, so that it doesn't cause runs to fail after a few hours. Also, it would be great if vpf-class could build in some break-points so that you don't lose progress if the run fails between stages.

bielr commented 3 years ago

Just to be sure, are you referring to the index.yaml file provided in the tarball together with its companion files? Could you provide more details (file names and links)?

That break-point functionality is already implemented, but it requires the --work-dir parameter (otherwise the tool uses a temporary directory that is deleted between runs).

LSHillary commented 3 years ago

That's correct. In the index file, the files are listed as e.g. ./vpf_classification/2019_VPF_GenHost_REclassification.txt, whilst in the folder you get 2019_VPF_GenHost_REclassification_new.tsv. The uvig scores have also changed their file extensions from .txt to .tsv. Also, it looks like the host phylum classifier has been replaced by host domain.
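
For anyone who wants to catch this kind of mismatch before starting a long run, here is a minimal Python sketch of a pre-flight check. It is not part of vpf-tools. It assumes only that the data files are referenced in index.yaml as relative paths beginning with `./`, as in the example above; the function name `find_missing_files` is made up for illustration, and the script scans the text rather than parsing the YAML structure.

```python
import re
from pathlib import Path

# Assumption: data files appear in the index as relative paths starting
# with "./". The lookbehind avoids matching the tail of "../something".
PATH_PATTERN = re.compile(r"(?<![\w.])\./[^\s'\",\]}]+")


def find_missing_files(index_path):
    """Return the "./..." paths mentioned in the index that do not exist.

    Paths are resolved relative to the directory containing the index file.
    """
    index_path = Path(index_path)
    base_dir = index_path.parent
    text = index_path.read_text(encoding="utf-8")
    # De-duplicate and sort so the report is stable between runs.
    referenced = sorted(set(PATH_PATTERN.findall(text)))
    return [rel for rel in referenced if not (base_dir / rel).exists()]
```

Calling `find_missing_files("path/to/index.yaml")` returns an empty list when every referenced file is present. Any entries it does return are the names that need to be corrected in the index (or renamed on disk) before launching vpf-class.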

bielr commented 3 years ago

I've fixed the issue with the tarball; re-downloading it should resolve the problem.

Thanks for the heads up!