Closed bradfordcondon closed 5 years ago
As far as I can tell, the GenBank assembly is the same data as the WGS accession, just with different accession numbers. I am not sure why they are separate entities. I suspect that WGS is older, and assembly was added on with some new functionality, based on skimming this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702866/
So - I agree with you that WGS should be a Dbxref of the GenBank assembly (if possible).
I don't think we can assume that GenBank and RefSeq are identical, so these probably aren't dbxrefs.
OK, I agree, and that makes sense. But that also means I don't know what to do with them, because I don't think I can tell which one refers to the accession being imported and which is a link to another accession.
After some rooting around, we always find that the UIDs for the RefSeq and GenBank accessions just redirect you to the parent UID. There's no difference between them. As such, I think all of them DO qualify as xrefs, paradoxically.
These are all now xrefs as of #199.
We MIGHT shoot ourselves in the foot if the assembly references a different, distinct assembly, but I haven't seen a case where that's actually what is going on.
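To make the idea in the comments above concrete, here is a rough sketch of treating every accession other than the one being imported as an xref. It is written in Python for brevity, not in the module's PHP, and everything in it is an assumption for illustration: the function name `split_accessions`, the dictionary keys (`genbank`, `refseq`, `wgs`), and the placeholder accession values are hypothetical and do not reflect the real parser output. Only GCA_000188095.3 comes from this thread.

```python
def split_accessions(accessions, imported):
    """Separate the accession being imported from the rest.

    accessions: dict mapping a source database name to an accession string
                (hypothetical shape, not the real parser structure).
    imported:   the accession the user asked to import.

    Returns (imported, dbxrefs), where dbxrefs holds every other accession,
    on the assumption that they all resolve to the same parent UID.
    """
    dbxrefs = {db: acc for db, acc in accessions.items() if acc != imported}
    return imported, dbxrefs


# Hypothetical input: only the GenBank accession is taken from the thread,
# the other two values are placeholders.
accessions = {
    "genbank": "GCA_000188095.3",
    "refseq": "REFSEQ_PLACEHOLDER",
    "wgs": "WGS_PLACEHOLDER",
}

primary, dbxrefs = split_accessions(accessions, "GCA_000188095.3")
```

This only holds while the linked accessions really are the same record under another name. If an assembly ever references a distinct assembly, that entry would need a separate analysis instead of an xref.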
child of #192
The assembly parser currently has the following in $info['accessions']['assembly']:
In #192 we had a case where importing GCA_000188095.3 resulted in all 4 of these being imported with different values... but all of them point back to the same UID!
Which are accessions? Which are linked records, and therefore separate analyses that need to be created? WGS: I don't think that's an assembly at all, as it's in the Nucleotide database. We don't have a "dbxref" section in the assembly parser. We should, and that's where it should go.
so for GCA_000188095.3