bradfordcondon closed this issue 5 years ago
ACCESSION TYPE | VALUE | CREATED |
---|---|---|
Assembly | 6127518 | YES. This is the RefSeq UID. |
Assembly | 6049248 | YES. This is the GenBank UID. |
Assembly | GCF_000188095.2 | YES. This is the RefSeq accession. |
Assembly | AEQM02 | This is the WGS accession. I think maybe it should be in its own section (i.e. just a dbxref, not a linked record). |
Organism | 132113 | YES, but it's silent. Let's log it. |
Bioprojects | 61101 | YES |
Bioprojects | 70395 | YES |
Biosamples | 2953787 | NO, but should be. |
Calling: tripal_eutils_create_records(assembly, GCA_000188095.3, 1)
INFO (TRIPAL_EUTILS): Inserting record into Chado: assembly: 1571891
INFO (TRIPAL_EUTILS): Inserting record into Chado: bioproject: 61101
INFO (TRIPAL_EUTILS): Inserting record into Chado: pubmed: 25908251
INFO (TRIPAL_EUTILS): Inserting record into Chado: pubmed: 9023104
INFO (TRIPAL_EUTILS): Inserting record into Chado: bioproject: 70395
INFO (TRIPAL_EUTILS): Inserting record into Chado: pubmed: 21482769
4 assemblies vs 2 assemblies: GCF_000188095.2 IS 6049248. AEQM02 IS 6127518. Therefore, this is a formatter fix.
UIDs for assemblies: 1571891, 6048924, 6127518. If you follow the GUI linkout, each of those goes to GCF_000188095.2! The actual WGS record is https://www.ncbi.nlm.nih.gov/nuccore/AEQM00000000.2/, which... well, I don't know how we're supposed to get that from AEQM02 anyway. https://www.ncbi.nlm.nih.gov/nuccore?term=AEQM02 gives us 1038 results.
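For what it's worth, the two IDs in this comment look related by a simple string rule: AEQM02 reads as a 4-letter prefix plus a 2-digit version, and the master record is the prefix padded with zeros plus that version as a suffix. A minimal sketch under that assumption (the helper name `wgs_master_accession` is hypothetical, not part of tripal_eutils, and newer 6-letter WGS prefixes are not handled):

```python
import re

# Assumption: a legacy WGS project accession is a 4-letter prefix followed
# by a 2-digit version, e.g. AEQM02. Other formats are rejected.
_WGS_PROJECT = re.compile(r"^([A-Z]{4})(\d{2})$")


def wgs_master_accession(project: str) -> str:
    """Hypothetical helper: map a WGS project ID to its master record ID."""
    match = _WGS_PROJECT.match(project.strip().upper())
    if match is None:
        raise ValueError(f"not a 4-letter WGS project accession: {project!r}")
    prefix, version = match.groups()
    # Master record: prefix + eight zeros, with the project version used
    # as the sequence version suffix.
    return f"{prefix}{'0' * 8}.{int(version)}"


print(wgs_master_accession("AEQM02"))  # AEQM00000000.2
```

That would give a single nuccore accession to link to, instead of the term search that returns every contig in the project.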
Probably related to the biosample being linked via project.
no problems.
I'm going to make a child issue of this for assembly. Basically it's misleading that these are listed as additional linked records. One is a WGS xref which goes to nucleotide; I think we want that to go in analysis_dbref. One is the input accession. The remaining keys are refseqUID, genbankUID, and the RefSeq accession. These are kind of all the same analysis, so this should be figured out in a separate issue, since it's quite complicated and I want to restructure the assembly XML parser to be easier to work with for this.
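To make the distinction concrete, here is a rough sketch of how the assembly keys could be sorted into "plain dbxref" versus "another identifier for the same assembly". The patterns and the names `classify_assembly_key` and `split_xrefs` are assumptions based only on the accessions in this thread; this is not how the tripal_eutils parser currently works.

```python
import re

# Hypothetical categories. Patterns are guesses from the values seen in
# this issue, checked in order.
PATTERNS = [
    ("refseq_accession", re.compile(r"^GCF_\d{9}\.\d+$")),
    ("genbank_accession", re.compile(r"^GCA_\d{9}\.\d+$")),
    ("wgs_project", re.compile(r"^[A-Z]{4}\d{2}$")),
    ("uid", re.compile(r"^\d+$")),
]


def classify_assembly_key(value: str) -> str:
    """Return the first matching category label, or 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.match(value):
            return label
    return "unknown"


def split_xrefs(values):
    """Separate WGS dbxrefs from identifiers of the same assembly."""
    grouped = {"dbxref": [], "same_assembly": [], "unknown": []}
    for value in values:
        label = classify_assembly_key(value)
        if label == "wgs_project":
            grouped["dbxref"].append(value)
        elif label == "unknown":
            grouped["unknown"].append(value)
        else:
            grouped["same_assembly"].append(value)
    return grouped


print(split_xrefs(["6127518", "6049248", "GCF_000188095.2", "AEQM02"]))
```

With the four values from the table, only AEQM02 lands under `dbxref`; the two UIDs and the RefSeq accession are grouped as the same assembly rather than shown as separate linked records.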
Thanks for testing, @mpoelchau. You should find the problem with the biomaterial not being created resolved. The issue with the analysis records is a confusing bag of stuff, so I made #194 to figure it out.
I'm confused because your log message above states that publication records are being imported, but they're not in the NCBI XML afaik, and the preview display doesn't list them. That said, PMID 25908251 is a bumblebee paper. Those pubs don't show up in the Tripal content on the droplet if I publish publications.
Or wait, is that because we decided not to import pubs? Sorry, I need to look back at our comment history.
We did decide to import pubs. https://github.com/NAL-i5K/tripal_eutils/issues/141
You are right that they aren't in the XML for the assembly. They get imported and linked in the Project.
Is the expected behavior that pubs associated with the project wouldn't import because you are importing via the analysis? My thinking was that secondary records like pubs are still created but not linked to primary records.
Organisms and pubs get imported even when in a secondary record because these are kind of just "decorators". At least that was my thinking. Do you not want it to work this way?
I think it's fine to import them, but the admin user needs to know that they're being imported. You can't tell from the preview. Not sure if it's sufficient to just display it in the log message - even if it's documented that to get the full picture, you need to be viewing the drush log, an admin user could still be unwittingly importing pubs without realizing it if they choose not to read the documentation (guilty as charged).
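As an illustration of the kind of summary an admin could be shown, here is a small sketch that groups the `Inserting record into Chado` log lines from earlier in this thread by record type. It only parses log text after the fact and assumes the line format stays as shown above; the function name `summarize_inserts` is hypothetical and this is not an existing tripal_eutils feature.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   INFO (TRIPAL_EUTILS): Inserting record into Chado: pubmed: 25908251
LOG_LINE = re.compile(r"Inserting record into Chado: (\w+): (\S+)")


def summarize_inserts(log_text: str) -> dict:
    """Group inserted record IDs by record type, in log order."""
    summary = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            record_type, record_id = match.groups()
            summary[record_type].append(record_id)
    return dict(summary)
```

Run against the log in this issue, this would report one assembly, two bioprojects, and three pubmed records, which makes the otherwise silent pub imports visible at a glance.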
this is an assembly