JCVenterInstitute / VIGOR4

VIGOR4
GNU General Public License v3.0

A suggestion for a new database of Coronavirus #29

Open 544728460 opened 3 years ago

544728460 commented 3 years ago

Dear developers, Thanks so much for providing VIGOR4, which is very convenient software for virus annotation, especially for SARS-CoV-2. May I suggest adding a new coronavirus database? I have had some unsatisfactory annotation results for coronaviruses that are genetically somewhat distant from SARS-CoV-2. Alternatively, is it possible for users to build a local database on our own? Are there any instructions or scripts for setting up a local database?

pamedeo commented 3 years ago

I believe that older versions of this repository had other coronavirus databases. Offhand, I don't remember the names and, having left JCVI over two years ago, I don't have the old files at hand. But we did convert all the old VIGOR3 databases to VIGOR4 format. I believe the VIGOR_DB repository has at least most of the instructions on how to build a new VIGOR4 database from scratch. The key is to select the most diverse set of sequences (there is no use in having more of the same) with reliable annotation.

Paolo
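
As an illustration of the "most diverse set" advice above, here is a minimal Python sketch that greedily drops near-duplicate reference proteins from a FASTA file. It is not part of VIGOR4 or VIGOR_DB: the function names and the 0.95 similarity cutoff are hypothetical, and `difflib.SequenceMatcher` is only a rough stand-in for a proper alignment-based identity measure.

```python
from difflib import SequenceMatcher


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def select_diverse(records, max_similarity=0.95):
    """Greedily keep a record only if it is not too similar to any kept one.

    max_similarity is a hypothetical cutoff, not a value taken from VIGOR.
    """
    kept = []
    for header, seq in records:
        is_novel = all(
            SequenceMatcher(None, seq, kept_seq, autojunk=False).ratio() <= max_similarity
            for _, kept_seq in kept
        )
        if is_novel:
            kept.append((header, seq))
    return kept
```

Calling `select_diverse(read_fasta(open("refs.fasta").read()))` on a hypothetical `refs.fasta` returns the retained records in input order. The pairwise comparison is slow for long polyproteins, so a dedicated clustering tool is probably a better choice for a real reference set.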

544728460 commented 3 years ago

Thanks for your kind reply! But I still have no idea how to set up a local database. Could you go into more detail? What kind of files should I prepare, and which parameters in the configuration should I change for a new local database? I do have some CDS protein sequences in FASTA format that I want to use as references for my local database.

Also, the splice_form="e6654" part in the DB files is not clear to me. Could you explain this parameter? Is it indispensable? How do I obtain the specific number, as in "e6654"?

Sorry for asking so many questions, but the answers would help me a lot in setting up the local database for my current project. Thank you very much for your kind help!

544728460 commented 3 years ago

Dear all, I tried to build a local VIGOR4 database for the coronavirus RdRp protein. But when I ran VIGOR4 after constructing the simplified local database files, coronavirus_rdrp_db and coronavirus_rdrp_db.ini, it returned an empty result. This should not happen, because the full genome sequence I used as a test (HCoV-EMC-2012.fasta) was downloaded from NCBI and its RdRp protein sequence is included in coronavirus_rdrp_db. The files I used to build the local RdRp database and the test genome sequence (HCoV-EMC-2012.fasta) are attached. files.zip

Can anyone help me with this problem? I urgently need this for my current project. Thanks, all!

pamedeo commented 3 years ago

Herry, Unfortunately, I am unable to find your files. RdRp is a mature peptide: in order for VIGOR to detect it, you need to have the ORF1ab gene. Given that I do not have access to your database, I don't know if this is your issue or not.

Paolo
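
One quick way to check the point above is to scan the reference database deflines for an ORF1ab entry before running VIGOR4. The Python sketch below assumes the deflines carry `key="value"` attributes, as the `splice_form="e6654"` example in this thread suggests. The attribute name `gene` and both function names are assumptions made for illustration, not something confirmed in this thread.

```python
import re

# Matches key="value" pairs such as splice_form="e6654" on a FASTA defline.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_defline(defline):
    """Split a FASTA defline into (sequence id, dict of key="value" attributes)."""
    parts = defline.lstrip(">").split(None, 1)
    seq_id = parts[0] if parts else ""
    attrs = dict(ATTR_RE.findall(parts[1])) if len(parts) > 1 else {}
    return seq_id, attrs


def has_gene(fasta_text, gene_name):
    """Return True if any defline has gene="<gene_name>" (case-insensitive).

    The attribute name "gene" is an assumption about the database format.
    """
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            _, attrs = parse_defline(line)
            if attrs.get("gene", "").lower() == gene_name.lower():
                return True
    return False
```

For example, `has_gene(open("coronavirus_rdrp_db").read(), "ORF1ab")` returning `False` would suggest that the database holds only the mature peptide and not the parent gene.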

singhindresh commented 3 years ago

Thanks Paolo.

Harry, I'm getting "file not found" for files.zip. Can you attach the file again so that I can review it and help you?

Thanks Indresh

544728460 commented 3 years ago

Sorry that the files could not be found. Maybe the network was interrupted last time. I have uploaded the files again. Thanks so much to both of you!

"coronaviridae_db" and "coronaviridae_db.ini" are the files I wanted to use to build a whole Coronaviridae database, but it still failed for proteins like RdRp. files.zip

Can someone help me with this? Thanks to all!