Open gtsueng opened 4 weeks ago
@DylanWelzel It looks like the parser already has methods for pulling the species/infectiousAgent info and the author info, but they don't appear to be working.
@DylanWelzel from what I've seen on Staging, the standardization and delineation pipelines are working well for SRA; however, after standardization and delineation, the original term remains in the species field even if a standardized version has been moved to the infectiousAgent field. This makes it look like the term is duplicated across infectiousAgent and species.
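In case it helps with debugging, here is a minimal sketch of the kind of cleanup step that would avoid the apparent duplication. It is not the actual pipeline code. It assumes each record is a dict whose species and infectiousAgent values are lists of dicts with a "name" key, and that the standardized entry keeps the source term under an "originalName" key; the function name `drop_moved_species` and that key are assumptions made for illustration.

```python
def drop_moved_species(record):
    """Return a copy of record without species entries that also appear in infectiousAgent.

    Assumed (hypothetical) shape: record["species"] and record["infectiousAgent"]
    are lists of dicts with a "name" key and an optional "originalName" key.
    """
    def terms(entry):
        # Collect every name-like value on the entry, normalized for comparison.
        return {
            str(entry[key]).strip().lower()
            for key in ("name", "originalName")
            if entry.get(key)
        }

    # Every term (standardized or original) that now lives in infectiousAgent.
    agent_terms = set()
    for agent in record.get("infectiousAgent", []):
        agent_terms |= terms(agent)

    # Keep only species entries that share no term with infectiousAgent.
    kept = [s for s in record.get("species", []) if not (terms(s) & agent_terms)]

    cleaned = dict(record)
    if kept:
        cleaned["species"] = kept
    else:
        # Drop the field entirely rather than leaving an empty list behind.
        cleaned.pop("species", None)
    return cleaned
```

The comparison is case-insensitive on purpose, since the original term and the standardized term may differ only in capitalization.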
Part of the problem may stem from the formatting of the ingest to the species field. It appears that NCBI SRA species info is formatted as "name":
Here are a few examples to facilitate the investigation of the issue:
The NCBI SRA fix is live on the staging API. The links above no longer include the original term in the species field.
Looks good, please push the updates to Production. I am marking this issue as 'pending close out' and will close it in a week if there are no further concerns.
Issue Name
Investigate propagation of missing SRA metadata from Experiment-level records
Issue Description
Currently, SRA seems to be parsed from Study-level records. These records are missing key metadata fields that are desirable for inclusion in the NDE, including species/infectiousAgent and author information. This information can instead be found in the nested/associated Experiment-level records in SRA.
To do: Determine if there is a way to propagate 'author' and 'species/infectiousAgent' metadata from the Experiment-level record up to the Study-level record, as SRA records in the NDE currently appear to be missing this information.
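To make the request concrete, here is a rough sketch of what the propagation could look like once the Experiment-level records for a study are in hand. It is only an illustration, not a description of how the parser works. It assumes study and experiment records are dicts, that 'author' and 'species' are lists of dicts with a "name" key, and that the experiments have already been fetched and parsed; the function name `propagate_from_experiments` is hypothetical.

```python
def propagate_from_experiments(study, experiments, fields=("author", "species")):
    """Fill fields missing on a Study-level record from its Experiment-level records.

    Assumed (hypothetical) shape: each field is a list of dicts with a "name" key.
    Values already present on the study are left untouched.
    """
    merged = dict(study)
    for field in fields:
        if merged.get(field):
            continue  # keep what the Study-level record already provides

        seen = set()
        values = []
        for experiment in experiments:
            for item in experiment.get(field, []):
                # De-duplicate across experiments by normalized name.
                key = str(item.get("name", "")).strip().lower()
                if key and key not in seen:
                    seen.add(key)
                    values.append(item)

        if values:
            merged[field] = values
    return merged


# Usage with placeholder values (not real SRA metadata):
study = {"identifier": "SRP253552"}
experiments = [
    {"species": [{"name": "Homo sapiens"}], "author": [{"name": "Example Author"}]},
    {"species": [{"name": "homo sapiens"}]},
]
result = propagate_from_experiments(study, experiments)
```

Taking the union across experiments means a study with mixed samples would end up with several species entries, which may or may not be the desired behavior.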
Issue Example
Example record in NDE missing 'species'/'infectiousAgent' and 'author' information: https://data.niaid.nih.gov/resources?id=ncbi_sra_srp253552
Example Study-level record in SRA (also missing the crucial fields): https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP253552
Example Experiment-level record containing the relevant fields: https://www.ncbi.nlm.nih.gov/sra/SRX7964236[accn]
Related WBS task
For internal use only. Assignee, please select the status of this issue
Status Description
No response