broadinstitute / catch

A package for designing compact and comprehensive capture probe sets.
MIT License

Remove viral datasets distributed with CATCH #52

Closed haydenm closed 1 year ago

haydenm commented 1 year ago

CATCH previously included data and corresponding Python modules for accessing NCBI genomes of all human-associated viruses. They were under catch/datasets/. This had some advantages, including (i) making it easier to work with the data for one-off analyses and (ii) storing segmented viral genomes and treating those correctly as genomes.

However, the data was difficult to keep up to date, mostly because the list of viruses had to be curated manually and miscellaneous issues had to be handled as they came up. (The Python modules were all computer-generated from the data.) The data has not been updated since October 2018. The download:TAXID input format, introduced into CATCH since then, automatically downloads the most recent viral sequences from NCBI, which partially reduces the benefit of distributing viral sequence datasets with CATCH. As a result, this PR removes those datasets.

In particular, the PR: