SBRG / pymodulon

Python package for analyzing and visualizing iModulons
MIT License

Make_prots_db doesn't work with NCBI files #67

Closed: sapoudel closed this issue 3 years ago

sapoudel commented 3 years ago

Make sure that make_prot_db works both with the output of make_prots and with CDS files downloaded directly from NCBI. Right now I get errors when I use a file downloaded directly from NCBI, because NCBI CDS files sometimes contain more than one entry for the same gene (e.g. pseudogenes). It would make sense to check for this within the make_prot_db function and either (a) use only the first fragment, or (b) join all fragments together. Use the MG1655 FASTA from NCBI as a test case.
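A minimal sketch of the proposed check, covering both option (a) and option (b). It works on FASTA text rather than file paths so it stays self-contained, and it assumes NCBI-style headers that carry a `[locus_tag=...]` tag. The helper names (`parse_fasta`, `locus_tag_key`, `collapse_duplicates`) are hypothetical and are not part of pymodulon's API; rewriting each header to its locus tag is also an assumption about what make_prot_db would accept.

```python
import re


def locus_tag_key(header):
    """Return the [locus_tag=...] value from a FASTA header.

    Assumes NCBI-style headers such as
    'lcl|NC_000913.3_cds_NP_414542.1_1 [gene=thrL] [locus_tag=b0001]'.
    Falls back to the first whitespace-delimited token if no tag is present.
    """
    match = re.search(r"\[locus_tag=([^\]]+)\]", header)
    if match:
        return match.group(1)
    parts = header.split()
    return parts[0] if parts else ""


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA text; headers exclude the '>'."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_duplicates(text, strategy="first", key=locus_tag_key):
    """Return FASTA text with exactly one record per gene key.

    strategy="first": keep only the first fragment seen for each gene (option a).
    strategy="join":  concatenate all fragments in file order (option b).
    Output headers are replaced by the gene key itself.
    """
    if strategy not in ("first", "join"):
        raise ValueError("strategy must be 'first' or 'join'")
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for header, seq in parse_fasta(text):
        gene = key(header)
        if gene not in merged:
            merged[gene] = seq
        elif strategy == "join":
            merged[gene] += seq
    return "".join(f">{gene}\n{seq}\n" for gene, seq in merged.items())
```

The key is the locus tag rather than the first header token because NCBI typically gives every CDS record its own `lcl|...` identifier with a running numeric suffix, so repeated fragments of one gene only become visible at the locus-tag level.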

sapoudel commented 3 years ago

The make_prots function now removes all duplicates and pseudogenes. Users may still run into issues with make_prot_db if they build the database from a home-made FASTA file with duplicate gene names, but plenty of existing tools and forum guides cover how to clean that up.
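For home-made FASTA files, a quick pre-flight check for duplicate gene names before calling make_prot_db might look like the sketch below. `find_duplicate_ids` is a hypothetical helper, not something pymodulon provides, and it assumes the gene name is the first whitespace-delimited token after `>`.

```python
from collections import Counter


def find_duplicate_ids(text):
    """Return a sorted list of record IDs that occur more than once in FASTA text.

    The ID is taken as the first whitespace-delimited token after '>'.
    """
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            ids.append(parts[0] if parts else "")
    counts = Counter(ids)
    return sorted(record_id for record_id, n in counts.items() if n > 1)
```

If the returned list is non-empty, the file needs cleaning before it is used to build the database.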