faylward / viralrecall

Detection of NCLDV signatures in 'omic data

HMMs database #20

Open sugar-sugar1 opened 1 year ago

sugar-sugar1 commented 1 year ago

Hello,

Will the HMM databases (cellular organisms and GVOG) be updated? If I want to build my own database, how should I configure it so that it is compatible with viralrecall? Also, how can I skip Prodigal and use a .faa file directly as input? (The genomes of some cellular organisms are too large.)

Thanks for your time and consideration!

faylward commented 1 year ago

There are currently no plans to update the GVOG database; it should still work well for Nucleocytoviricota.

There is no straightforward way to swap in a different database, because the score for each GVOG has been calibrated according to its prevalence in Nucleocytoviricota versus other viruses. If you want to do it anyway, you would have to change the path to the GVOG database in viralrecall.py and also alter the files in acc/ so that your new HMMs are present there.

Lastly, in the bin/ folder I have left a prodigal executable whose source code has been altered to allow for longer contigs. If you put that prodigal in your PATH, you should be good.
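For anyone attempting the two workarounds above, here is a minimal Python sketch. It is not part of viralrecall. `hmm_names` reads the `NAME` line of each profile in an HMMER3 text-format `.hmm` file. `missing_from_acc` lists custom HMMs that have no row in an acc/ table; the layout of that table (tab-separated, HMM name in the first column) is an assumption, so check it against the real files in acc/. This only checks that names are present and does not reproduce the calibrated scores mentioned above. `use_bundled_prodigal` prepends the repository's bin/ folder to `PATH` for the current Python process and any child processes it starts, then reports which `prodigal` now resolves. All function names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def hmm_names(hmm_path):
    """Return the NAME of every profile in an HMMER3 text-format .hmm file."""
    names = []
    with open(hmm_path) as fh:
        for line in fh:
            # Each profile header contains a line of the form "NAME  <name>"
            if line.startswith("NAME"):
                names.append(line.split()[1])
    return names


def missing_from_acc(hmm_path, acc_path):
    """List HMM names that have no row in an acc/ table.

    Assumption (hypothetical layout): the acc/ file is tab-separated with
    the HMM name in the first column; adapt the parsing to the real files.
    """
    with open(acc_path) as fh:
        known = {line.split("\t")[0].strip() for line in fh if line.strip()}
    return [name for name in hmm_names(hmm_path) if name not in known]


def use_bundled_prodigal(repo_dir):
    """Prepend <repo_dir>/bin to PATH and return the prodigal that now resolves."""
    bin_dir = str(Path(repo_dir) / "bin")
    # Only affects this process and the processes it launches afterwards
    os.environ["PATH"] = bin_dir + os.pathsep + os.environ.get("PATH", "")
    # shutil.which returns the first matching executable on PATH, or None
    return shutil.which("prodigal")
```

In a shell, the equivalent of `use_bundled_prodigal` is to export `PATH` with the bin/ folder first before running viralrecall. Running `which prodigal` afterwards confirms that the modified build is picked up instead of a system-wide install.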