NIAID-Data-Ecosystem / nde-crawlers

Harvesting infrastructure to collect and standardize dataset and computational tool metadata
Apache License 2.0

[Parser Fix]: DDE DefinedTerms `curatedBy` fix #124

Closed gtsueng closed 4 months ago

gtsueng commented 4 months ago

Issue Name

DDE DefinedTerms curatedBy fix

Issue Description

The records ingested via the DDE are generally manually curated, so it is strange that fields where the expected type is DefinedTerm are being treated as if they were augmented. We should only consider them augmented if the value for the field is non-URI text.

Suggested fix: have a separate handler for URIs.
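A minimal sketch of what such a separate URI handler might look like. Everything here is an assumption for illustration, not the actual nde-crawlers code: the helper names `is_uri` and `split_defined_term_values` are hypothetical, and so are the field tuple (only the fields named in the example below), the output shape of the DefinedTerm dict, and the sample URI. It also assumes field values arrive as strings or lists of strings. URI values are kept as curated terms with no `curatedBy` attached, and only non-URI text is set aside for augmentation.

```python
from urllib.parse import urlparse

# Hypothetical field list: only the fields named in this issue's example.
DEFINED_TERM_FIELDS = ("infectiousAgent", "healthCondition", "species")


def is_uri(value):
    """Return True when value looks like an absolute http(s) URI."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_defined_term_values(record):
    """Separate curated URI values from free text that still needs augmentation.

    Returns (curated, needs_augmentation), each a dict keyed by field name.
    """
    curated, needs_augmentation = {}, {}
    for field in DEFINED_TERM_FIELDS:
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            if is_uri(value):
                # Already curated at the source, so no curatedBy is attached.
                curated.setdefault(field, []).append(
                    {"@type": "DefinedTerm", "url": value.strip()}
                )
            else:
                # Non-URI text is the only case treated as a candidate
                # for augmentation (and for a curatedBy credit).
                needs_augmentation.setdefault(field, []).append(value)
    return curated, needs_augmentation


# Illustrative record: one URI value and one free-text value.
record = {
    "species": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
    "healthCondition": ["tuberculosis"],
}
curated, pending = split_defined_term_values(record)
```

With this split, a downstream augmentation step would only ever see the `pending` values, so it could not claim credit for terms that were supplied as URIs.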

Fields that are potentially affected (i.e. - could potentially have URI values for conversion to DefinedTerm object):

Issue Example

TB Portals is a resource catalog that was added via the DDE: https://discovery.biothings.io/dataset/9142024b72770a67 As seen in that record, the fields infectiousAgent, healthCondition, and species already have ontology URI values, so we should not consider them augmented fields, and they should NOT be considered curatedBy PubTator or BioThings. They were curated to begin with.

However, they currently display in the ecosystem (staging for this example, but most records from NIAID SysBio/DDE in production have the same issue) as being curatedBy PubTator or BioThings. Here is the same record as it appears in data-staging.niaid.nih.gov: https://data-staging.niaid.nih.gov/resources?id=DDE_9142024b72770a67

Related WBS task

For internal use only. Assignee, please select the status of this issue

Status Description

No response

jal347 commented 4 months ago

I made a new release on staging. It should be fixed.

gtsueng commented 4 months ago

The variableMeasured field still appears to be just a list of URLs. Are we not able to pull in the name of the term?

gtsueng commented 4 months ago

Per discussion on 2024.02.21, PubTator is still greedily assuming credit for these fields for links ingested via the DDE. This behavior needs to be changed.

gtsueng commented 4 months ago

Awesome! It looks like variableMeasured is pulling and displaying terms now! Can we do this for keywords? See https://data-staging.niaid.nih.gov/resources?id=DDE_9142024b72770a67

jal347 commented 4 months ago

Thanks for catching that. I'll get that done today.

gtsueng commented 3 months ago

The improvements are now available on staging.