carolzhou / multiPhATE

Throughput PhATE processing of draft or finished phage genomes

Need to modify getDBs.py? #19

Closed carolzhou closed 3 years ago

carolzhou commented 4 years ago

Check whether getDBs.py still works correctly, given possible changes to the files on NCBI's FTP server.

nikolasbasler commented 4 years ago

Hello. I'm not sure whether this belongs here, but I noticed something else about getDBs.py. The documentation states that the files containing the virus genomes should be concatenated into one file, which is then made into a BLAST database. However, getDBs.py creates a separate database for each of the two files (the same happens for the proteins). Also, one of the calls that should create a protein database attempts to build a nucl BLAST database instead, which leads to thousands of error messages. I attached a screenshot in which all of the above is visible.
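
For illustration, here is a minimal sketch of the concatenate-then-build behaviour that the documentation describes. It is not the code from getDBs.py. The function names, file names, and database names are hypothetical, and the only external tool assumed is BLAST+ `makeblastdb` with its `-in`, `-dbtype`, and `-out` options.

```python
import shutil
import subprocess
from pathlib import Path


def concatenate_fasta(parts, combined):
    """Concatenate several FASTA files into one, in the order given."""
    combined = Path(combined)
    with combined.open("wb") as out:
        for part in parts:
            data = Path(part).read_bytes()
            out.write(data)
            # Ensure the next file's first header starts on its own line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return combined


def makeblastdb_command(fasta, dbtype, out_name):
    """Build the makeblastdb argument list; dbtype must match the sequences."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError(f"dbtype must be 'nucl' or 'prot', got {dbtype!r}")
    return [
        "makeblastdb",
        "-in", str(fasta),
        "-dbtype", dbtype,
        "-out", str(out_name),
    ]


def build_database(parts, combined, dbtype, out_name):
    """Concatenate the parts, then build a single BLAST database from them."""
    fasta = concatenate_fasta(parts, combined)
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb was not found on PATH")
    subprocess.run(makeblastdb_command(fasta, dbtype, out_name), check=True)
```

With this layout, a call such as `build_database(["part1.faa", "part2.faa"], "all.faa", "prot", "virus_protein_db")` (hypothetical file names) would produce one protein database, and passing `"nucl"` for protein sequences becomes an explicit choice at the call site rather than something buried in a command string.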

[Attached screenshot: multiphate]

carolzhou commented 4 years ago

The database downloader script has been updated and is now called dbPrep_getDBs.py. I'm still having difficulty getting blastp in Conda to recognize the Swissprot and Refseq Protein databases, although the HMM search tools (e.g., phmmer, jackhmmer) recognize those databases just fine. I will post further updates as I get these glitches worked out.
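
One way to check whether the BLAST+ tools can see a given protein database is to ask `blastdbcmd` for its summary. The sketch below is only a diagnostic aid and does not identify the cause of the problem described above. The function names and the database name are hypothetical. It assumes BLAST+ `blastdbcmd` with its `-db`, `-dbtype`, and `-info` options, and the `BLASTDB` environment variable that BLAST+ uses as a database search path.

```python
import os
import shutil
import subprocess


def blastdb_info_command(db_name, dbtype="prot"):
    """Build the blastdbcmd argument list that prints a database summary."""
    return ["blastdbcmd", "-db", db_name, "-dbtype", dbtype, "-info"]


def database_visible(db_name, dbtype="prot", db_dir=None):
    """Return True if blastdbcmd can open the named database."""
    if shutil.which("blastdbcmd") is None:
        raise RuntimeError("blastdbcmd was not found on PATH")
    env = dict(os.environ)
    if db_dir is not None:
        # Point BLAST+ at the directory holding the database files.
        env["BLASTDB"] = str(db_dir)
    result = subprocess.run(
        blastdb_info_command(db_name, dbtype),
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

If `database_visible("swissprot", db_dir="/path/to/Databases")` (hypothetical name and path) returns False inside the Conda environment, the error text from running the same command by hand may help narrow things down.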

carolzhou commented 3 years ago

The dbPrep_getDBs.py script included in the multiPhATE2 distribution has been updated and tested, and it is working in my hands as of this writing (26 Jan 2021). I advise switching to multiPhATE2 at this time, as multiPhATE will soon be deprecated.