Closed mpoelchau closed 5 years ago
Short answer: no, you've just got a knack for finding edge cases.
So I don't know what to do in cases where the accessions returned are non-unique, unless we can find a way to make them unique.
For project 412476, we expect 3 biosamples, 1 organism, and 1 publication from the preview.
Upon running the importer:
Inserting record into Chado: bioproject: 412476
[site http://default] [TRIPAL ERROR] [TRIPAL_EUTILS] Unable to find UID for biosample:SAMD00093090
So it failed to load the first biosample because the accession was non-unique.
With project 167477, we expect 2 biosamples, an assembly, an organism, and a pub.
Calling: tripal_eutils_create_records(bioproject, 167477, 1)
INFO (TRIPAL_EUTILS): Inserting record into Chado: bioproject: 167477
INFO (TRIPAL_EUTILS): Inserting record into Chado: biosample: 2434893
INFO (TRIPAL_EUTILS): Inserting record into Chado: biosample: 2649412
[site http://default] [TRIPAL ERROR] [TRIPAL_EUTILS] Unable to find UID for assembly:GCA_000648675
So in both cases, we get an error because we can't identify that accession uniquely.
If we search for the assembly:
https://www.ncbi.nlm.nih.gov/assembly/?term=GCA_000648675
In this case, there are "anomalous results". Presumably we can add filter parameters to the query and we'd get a single result.
For the biosample:
https://www.ncbi.nlm.nih.gov/biosample/?term=SAMD00093090
Interestingly, there is only 1 result here via the GUI. How about the API?
GET /entrez/eutils/esearch.fcgi/?db=biosample&retmode=xml&term=SAMD00093090 HTTP/1.0
Via the API we get two results:
<Id>7714098</Id>
<Id>7714100</Id>
So there are two samples: SAMD00093521 and SAMD00093090. Cool: 93521 is a pool of 93090 (female) and another sample, the male one. However, NEITHER XML FILE includes this relationship in a machine-readable way. 93521 describes it in the text only. 93090 doesn't even include it in the text! How does the server even know to return 93521 if I search with 93090? It must have the information stored somewhere!
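To make the failure mode concrete, here is a minimal Python sketch of the uniqueness check, not the importer's actual code. The XML is a trimmed, hand-written stand-in for the esearch response (only the two `<Id>` values come from the output above), and the names `extract_uids` and `resolve_unique_uid` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for an esearch response. Only the two <Id> values are
# taken from the thread; the rest of the envelope is an assumption.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>7714098</Id>
    <Id>7714100</Id>
  </IdList>
</eSearchResult>"""


def extract_uids(xml_text):
    """Return every UID listed under <IdList> in an esearch XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]


def resolve_unique_uid(xml_text, accession):
    """Return the single UID for an accession, or raise if it is ambiguous."""
    uids = extract_uids(xml_text)
    if len(uids) != 1:
        raise LookupError(
            f"Expected exactly one UID for {accession}, got {len(uids)}: {uids}"
        )
    return uids[0]
```

With the two-ID response above, `resolve_unique_uid(SAMPLE_RESPONSE, "SAMD00093090")` raises, which mirrors the "Unable to find UID" error: the lookup only succeeds when the accession maps to exactly one record.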
OK, NCBI does indeed let us specify the search field, just not on a per-parameter basis. That's just what we need.
`$provider->addParam('field', 'accession');`
So SAMD00093090 now returns a single result.
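For anyone who wants to reproduce this outside the module, the request with the extra parameter can be sketched in Python as follows. This is an illustration under assumptions rather than what the module sends: the host name is the public E-utilities endpoint (the thread only shows the request path), and `build_esearch_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed public E-utilities endpoint; the thread only shows the path.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, field=None):
    """Build an esearch URL, optionally limiting the whole term to one field."""
    params = {"db": db, "retmode": "xml", "term": term}
    if field is not None:
        # 'field' applies to the entire search term, not to individual parts.
        params["field"] = field
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The unrestricted query from the thread, and the field-restricted variant.
unrestricted = build_esearch_url("biosample", "SAMD00093090")
restricted = build_esearch_url("biosample", "SAMD00093090", field="accession")
```

Fetching `unrestricted` should correspond to the two-ID response seen earlier, while `restricted` corresponds to the query that returned a single result.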
As for the multiple assemblies, we need to add filters in a similar manner.
I imported bioproject 412476, and selected "create linked records". I then successfully published the Project record. The Biological Sample record, however, was not imported. Publications and organism were also not imported.
I tried again with bioproject 167477. This time one of the 2 biosamples was also imported, but not much metadata came with it, just the accession number as the name. One empty publication record was imported. No organism.
Am I doing things in the wrong order?