cmmr / virmap

GNU Affero General Public License v3.0

how to set $taxaJson parameter #4

Open kyle-lk opened 5 years ago

kyle-lk commented 5 years ago

Hi Matt, I need some help understanding your pipeline. I don't know how to set the $taxaJson parameter. Can you give me an example?

Thanks

torptube commented 5 years ago

Hi Kyle,

The --taxaJson parameter is a bit of a misnomer. The serialized data structure is no longer JSON. I think in this version it's using Sereal. The structure is just a hash of hashes of the NCBI taxa dump. The top-level keys are "parents", "names", "ranks", and "children". These correspond to the same columns in the NCBI taxa dump.
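As a rough illustration of that shape, here is a minimal Python sketch that builds the four top-level tables from rows in the NCBI `nodes.dmp` and `names.dmp` format. It is not VirMAP's actual constructor: the function names, the use of tax ID strings as keys, the choice of scientific names only, and the layout of "children" as a nested hash derived by inverting "parents" are all assumptions made for illustration. The real file is serialized with Sereal from Perl, which is not shown here.

```python
from collections import defaultdict

def parse_dmp_line(line):
    """Split one NCBI .dmp row; fields are separated by "\t|\t" and rows end with "\t|"."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")

def build_taxa(nodes_lines, names_lines):
    """Build a hash of hashes with the keys parents, names, ranks, children (assumed layout)."""
    taxa = {"parents": {}, "names": {}, "ranks": {}, "children": defaultdict(dict)}
    for line in nodes_lines:
        fields = parse_dmp_line(line)
        tax_id, parent_id, rank = fields[0], fields[1], fields[2]
        taxa["parents"][tax_id] = parent_id
        taxa["ranks"][tax_id] = rank
        if tax_id != parent_id:  # the root node (1) lists itself as its parent
            taxa["children"][parent_id][tax_id] = 1
    for line in names_lines:
        tax_id, name, _unique_name, name_class = parse_dmp_line(line)[:4]
        if name_class == "scientific name":  # assumption: keep one name per taxon
            taxa["names"][tax_id] = name
    taxa["children"] = dict(taxa["children"])
    return taxa

# Tiny hand-written sample rows in the .dmp layout (truncated to the first fields).
NODES = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10239\t|\t1\t|\tsuperkingdom\t|\n",
]
NAMES = [
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "10239\t|\tViruses\t|\t\t|\tscientific name\t|\n",
    "10239\t|\tVira\t|\t\t|\tsynonym\t|\n",
]

taxa = build_taxa(NODES, NAMES)
print(sorted(taxa.keys()))
```

With real data, the two line lists would come from opening `nodes.dmp` and `names.dmp` from the NCBI taxdump archive. The resulting structure would then still need to be written out in the Sereal format that the pipeline reads.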

Hopefully this helps. I don't have anywhere to upload the version that I am using internally, since it's a rather large file, and I haven't finished writing the database construction instructions.

Cheers, Matt

kyle-lk commented 5 years ago

Thanks, Matt,

So, do I need to write a script to convert NCBI's taxonomy dump files? Or can I just add "parents", "names", "ranks" and "children" directly to the first line of the gi_taxid_prot.dmp file? I have only just started working in bioinformatics, and this is the second pipeline I have studied, so there is a lot I still don't understand.

torptube commented 4 years ago

A database constructor has been written and posted.