CATCH previously included data and corresponding Python modules for accessing NCBI genomes of all human-associated viruses, stored under `catch/datasets/`. This had some advantages, including (i) making it easier to work with the data for one-off analyses and (ii) storing segmented viral genomes and treating them correctly as genomes.
However, the data was difficult to keep up to date, mostly because the list of viruses had to be curated manually and miscellaneous issues handled by hand. (The Python modules were all computer-generated from the data.) The data has not been updated since October 2018. The `download:TAXID` input format, introduced into CATCH since then, automatically downloads the most recent viral sequences from NCBI, which partially reduces the benefit of distributing viral sequence datasets with CATCH. As a result, this PR removes those datasets.
In particular, the PR:
- Deletes the data and corresponding Python modules [04385ff], and deprecates the function that read from them [e96580e]
- Updates unit tests to reflect the deletion [51645e3]
- Updates the design scripts and README to no longer allow the datasets as input [170196a, ee1bd48]
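
For reviewers unfamiliar with the `download:TAXID` form mentioned above, here is a minimal sketch of how an input argument could be classified now that named datasets are no longer accepted. It is an illustration only, not CATCH's actual code: the helper name `parse_input_spec` and its return values are hypothetical, and it handles only the `download:TAXID` form described in this PR.

```python
def parse_input_spec(spec):
    """Classify an input argument as an NCBI download or a FASTA path.

    Hypothetical helper for illustration; not part of CATCH.
    """
    prefix = "download:"
    if spec.startswith(prefix):
        taxid = spec[len(prefix):]
        # Assumption: the taxonomy ID is a plain non-negative integer.
        if not taxid.isdigit():
            raise ValueError(f"expected a numeric taxonomy ID, got {taxid!r}")
        return ("download", int(taxid))
    # Anything else is treated as a path to a FASTA file.
    return ("fasta", spec)
```

Under this sketch, an argument such as `download:64320` would be mapped to a download request for taxonomy ID 64320, and any other argument would be treated as a FASTA file path.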