broadinstitute / catch

A package for designing compact and comprehensive capture probe sets.
MIT License

Incorrect fasta paths for some genomes in directories #17

Closed yesimon closed 6 years ago

yesimon commented 6 years ago

These human host genomes have virus.fasta as their FASTA path, but their actual path is virus/[0-9a-z]+.fasta:

alethinophid_reptarenavirus
amapari
ambe
bear_canyon
bwamba
cao_bang
caraparu
cupixi
hughes
human_picobirnavirus
keterah
lujo
marituba
oliveros
oriboca
parana
toros
yogue
zerdali

haydenm commented 6 years ago

Thanks for catching this! The cause was an inconsistency in the script that generates these datasets. When writing the data .fasta files, I decide that a genome is segmented if its sequences have a segment name; when writing the .py files, I decide that it is segmented if the number of segment names is >1. Genomes in these datasets have sequence(s) with a segment name but only one segment (e.g., all sequence(s) are labeled "segment X"), so I was treating them as segmented when writing the .fasta files but not when writing the .py files. I fixed the script so that it decides consistently whether a genome is segmented, and reran it for these 19 datasets.
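
To make the mismatch concrete, here is a minimal sketch of the two rules described above. It is not the actual generation script: the function names and the representation of segment labels as a list (with None for an unlabeled sequence) are assumptions made for illustration.

```python
# Hypothetical illustration; none of these names come from the catch codebase.

def is_segmented_by_label(segment_names):
    """Rule described for the .fasta files: segmented if any sequence has a segment name."""
    return any(name is not None for name in segment_names)


def is_segmented_by_count(segment_names):
    """Rule described for the .py files: segmented if there is more than one distinct segment name."""
    distinct = {name for name in segment_names if name is not None}
    return len(distinct) > 1


# Every sequence is labeled "segment X", so there is a label but only one segment.
labels = ["X", "X", "X"]
print(is_segmented_by_label(labels))  # True
print(is_segmented_by_count(labels))  # False
```

The two rules agree for unlabeled genomes and for genomes with several distinct segments, and disagree only in the single-label case, which is the case the affected datasets fall into. Using one shared predicate in both places is one way to rule out this kind of divergence.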