Open kyle-lk opened 5 years ago
Hi Kyle,
The --taxaJson parameter is a bit of a misnomer: the serialized data structure is no longer JSON. I think this version uses Sereal. The structure is just a hash of hashes built from the NCBI taxa dump. The top-level keys are "parents", "names", "ranks", and "children", which correspond to the same columns in the NCBI taxa dump.
Hopefully this helps. I don't have anywhere to upload the version that I am using internally, since it's a rather large file, and I am not done writing the database construction instructions.
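As a rough illustration of the layout described above, here is a minimal Python sketch that builds the four top-level keys from rows in the NCBI nodes.dmp and names.dmp format. It is not the pipeline's own constructor. The function names, the sample rows, the choice to keep only "scientific name" entries, and the shape of the "children" entries (parent ID mapped to a hash of child IDs) are all assumptions made for illustration. The result would still need to be serialized on the Perl side (for example with Sereal::Encoder) before the pipeline could read it.

```python
from collections import defaultdict


def split_dmp_line(line):
    """Split one NCBI .dmp row; fields are separated by TAB|TAB and rows end with TAB|."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def build_taxa(nodes_lines, names_lines):
    """Build a hash of hashes with the keys parents, names, ranks, children.

    Assumption: children maps parent ID -> {child ID: 1}; the real layout may differ.
    """
    taxa = {"parents": {}, "names": {}, "ranks": {}, "children": defaultdict(dict)}

    # nodes.dmp columns: tax_id, parent tax_id, rank, ...
    for line in nodes_lines:
        fields = split_dmp_line(line)
        tax_id, parent_id, rank = fields[0], fields[1], fields[2]
        taxa["parents"][tax_id] = parent_id
        taxa["ranks"][tax_id] = rank
        if tax_id != parent_id:  # the root is its own parent; skip the self-link
            taxa["children"][parent_id][tax_id] = 1

    # names.dmp columns: tax_id, name_txt, unique name, name class
    for line in names_lines:
        fields = split_dmp_line(line)
        if fields[3] == "scientific name":  # assumption: one name per taxon
            taxa["names"][fields[0]] = fields[1]

    taxa["children"] = dict(taxa["children"])
    return taxa


# Tiny illustrative sample in the same format as the real dump files.
SAMPLE_NODES = [
    "1\t|\t1\t|\tno rank\t|\n",
    "131567\t|\t1\t|\tno rank\t|\n",
    "2\t|\t131567\t|\tsuperkingdom\t|\n",
]
SAMPLE_NAMES = [
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "1\t|\tall\t|\t\t|\tsynonym\t|\n",
    "131567\t|\tcellular organisms\t|\t\t|\tscientific name\t|\n",
    "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n",
]

taxa = build_taxa(SAMPLE_NODES, SAMPLE_NAMES)
```

With the real files, the two arguments would be open file handles for nodes.dmp and names.dmp rather than the sample lists.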
Cheers, Matt
Thanks, Matt,
So, do I need to write a script to convert NCBI's taxonomy dump files? Or can I just add "parents", "names", "ranks", and "children" directly to the first line of the gi_taxid_prot.dmp file? I have only just started with bioinformatics, and this is the second pipeline I have studied, so there is still a lot I don't understand.
A database constructor has been written and posted.
Hi Matt, I need some help understanding your pipeline. I don't know how to set the $taxaJson parameter. Can you give me an example?
Thanks