sapoudel closed this 3 years ago
The make_prots function now removes all duplicates and pseudogenes. make_prot_db might still run into issues if a user tries to build the db from a home-made FASTA file with duplicate gene names, but there are lots of tools out there to clean that up and lots of forums that teach you how to do it.
Make sure that make_prot_db works for both the output of make_prots and the CDS files downloaded from NCBI. Right now I get errors when I use a file downloaded directly from NCBI, because their CDS files sometimes contain more than one entry for the same gene (i.e. pseudogenes). I think it would make sense to check for this within the make_prot_db function, and either (a) use the first fragment only, or (b) join all fragments together. Use the MG1655 FASTA from NCBI as a test.
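For reference, here is a rough sketch of what options (a) and (b) could look like as a pre-processing step. It is not the code in make_prot_db. The helper names (parse_fasta, record_key, dedupe_records) are made up for illustration, and it assumes each header carries an NCBI-style `[locus_tag=...]` field, falling back to the first word of the header when it does not.

```python
import re

# Assumption: NCBI CDS headers contain a "[locus_tag=...]" field.
LOCUS_TAG = re.compile(r"\[locus_tag=([^\]]+)\]")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_key(header):
    """Return the locus tag if present, else the first word of the header."""
    match = LOCUS_TAG.search(header)
    return match.group(1) if match else header.split()[0]


def dedupe_records(records, mode="first"):
    """Collapse records that share a key.

    mode="first": keep only the first fragment seen for each key (option a).
    mode="join":  concatenate fragments in file order (option b).
    """
    if mode not in ("first", "join"):
        raise ValueError("mode must be 'first' or 'join'")
    merged = {}
    for header, seq in records:
        key = record_key(header)
        if key not in merged:
            merged[key] = seq
        elif mode == "join":
            merged[key] += seq
    return merged
```

Running something like `dedupe_records(parse_fasta(open(path).read()), mode="first")` on the downloaded file before building the db would give one sequence per locus tag. Whether joined fragments are meaningful for a pseudogene is a separate question, so option (a) is probably the safer default.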