GenomicsStandardsConsortium / mixs

Minimum Information about any (X) Sequence (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

HCR terms are duplicated but have 'hcr' in the slot name #664

Open mslarae13 opened 11 months ago

mslarae13 commented 11 months ago

Describe the bug

Extensions should NOT have their own slots/terms for every metadata field. For example, when measuring the temperature of the sample, you should use the term 'temp', not make an extension-specific term like hcr_term.

And with the expansion of LinkML, if there is a need to say "this is specific to this extension in this way," you use slot usage (LinkML's slot_usage).
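
A minimal sketch of the slot_usage idea, assuming plain Python dictionaries shaped like a LinkML schema (`slots`, `classes`, and per-class `slots` and `slot_usage`). The class name `HydrocarbonExample`, the `required: True` refinement, and the `effective_slot` helper are hypothetical. The helper is a simplified stand-in for how LinkML tooling combines a shared slot with a class's slot_usage, not LinkML's own code.

```python
from copy import deepcopy

# Hypothetical schema fragment in the shape of LinkML YAML, held as Python dicts.
SCHEMA = {
    "slots": {
        # One shared slot, defined once for every extension.
        "temp": {
            "title": "temperature",
            "description": "Temperature of the sample at the time of sampling.",
        },
    },
    "classes": {
        # Illustrative (hypothetical) extension class: it reuses `temp`
        # instead of declaring its own `hcr_temp` slot.
        "HydrocarbonExample": {
            "slots": ["temp"],
            "slot_usage": {
                # Hypothetical extension-specific refinement.
                "temp": {"required": True},
            },
        },
    },
}


def effective_slot(schema: dict, class_name: str, slot_name: str) -> dict:
    """Overlay a class's slot_usage on the shared slot definition (simplified)."""
    cls = schema["classes"][class_name]
    if slot_name not in cls.get("slots", []):
        raise KeyError(f"{class_name} does not use slot {slot_name!r}")
    merged = deepcopy(schema["slots"][slot_name])  # never mutate the shared slot
    merged.update(cls.get("slot_usage", {}).get(slot_name, {}))
    return merged


if __name__ == "__main__":
    print(effective_slot(SCHEMA, "HydrocarbonExample", "temp"))
```

The shared `temp` definition stays untouched; only the extension's view of it gains `required: True`, so no second temperature slot is needed.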

Expected behavior

Slot usage should be implemented, and slots should be evaluated to see whether they are repeated.
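
One way to start that evaluation is a name-based screen. The sketch below uses a hypothetical `find_prefixed_duplicates` helper and an illustrative slot list; it flags slots whose name becomes an existing slot name once an extension prefix such as `hcr_` is stripped.

```python
def find_prefixed_duplicates(slot_names: list, extension_prefixes: list) -> list:
    """Return (prefixed_slot, base_slot) pairs where removing an extension
    prefix such as "hcr_" leaves the name of a slot that already exists."""
    existing = set(slot_names)
    pairs = []
    for name in sorted(existing):
        for prefix in extension_prefixes:
            stem = f"{prefix}_"
            base = name[len(stem):]
            if name.startswith(stem) and base in existing:
                pairs.append((name, base))
    return pairs


if __name__ == "__main__":
    # Illustrative (hypothetical) slot list, not the full MIxS inventory.
    slots = ["temp", "hcr_temp", "hcr_pressure", "samp_size"]
    print(find_prefixed_duplicates(slots, ["hcr"]))
```

A hit is only a candidate for review: two slots can share a stem and still mean different things, so each pair still needs a human check of the definitions.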

mslarae13 commented 11 months ago

Not sure if this is a bug, but it seemed right? It's not specific to one slot; I just referenced one slot as an example.

turbomam commented 11 months ago

@mslarae13 I think you meant _temp, not _term, in your initial comment:

> not make an extension-specific term like hcr_temp.

cmungall commented 11 months ago

I agree in principle across all terms, but it needs to be clear whether a property refers to a sample or some aspect of the environment in which the sample was collected. These may not be the same.

However, this should not be done by prefixing with the extension or something specific like hcr; rather, it should be done through consistent use of prefixes like sample_ and environmental_.
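
To illustrate the convention described in this comment, here is a small sketch that reads the subject of a slot from its prefix. The prefixes `sample_` and `environmental_` come from the comment; the example slot names and the `slot_context` helper are hypothetical.

```python
# Prefixes named in the comment, mapped to the entity the slot describes.
CONTEXT_PREFIXES = {
    "sample_": "sample",
    "environmental_": "environment",
}


def slot_context(slot_name: str) -> str:
    """Return which entity a slot describes, judged only by its prefix."""
    for prefix, context in CONTEXT_PREFIXES.items():
        if slot_name.startswith(prefix):
            return context
    # Extension prefixes such as "hcr_" say nothing about sample vs. environment.
    return "unspecified"


if __name__ == "__main__":
    # Hypothetical slot names used only to exercise the helper.
    for name in ["sample_temp", "environmental_temp", "hcr_temp"]:
        print(name, "->", slot_context(name))
```

Under this convention the prefix answers "sample or environment?", which an extension prefix like `hcr_` cannot.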

mslarae13 commented 11 months ago

Other slots to review for redundancy

samp_transport_cond vs samp_transport_temp


only1chunts commented 11 months ago

> Other slots to review for redundancy
>
> samp_transport_cond vs samp_transport_temp

Can we keep one ticket per bug or suggested update? I think this ticket started out as "remove the duplicate 'HCR_temp' and replace it with 'temp' in the relevant extensions". I believe that's a good call and should be done. I've added the CIG review label.

turbomam commented 11 months ago

we can always make grouping issues, where the first comment contains a checklist like

  • wake up
  • drink favorite morning beverage
  • solve world's problems

Then each of those items will have a bullseye-like button to the right that converts it into an individual issue.

implementation:

- [ ] wake up
- [ ] drink favorite morning beverage
- [ ] solve world's problems
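
The task-list syntax above can also be produced or read programmatically. This is a minimal sketch, assuming only GitHub's `- [ ]` / `- [x]` Markdown task-list syntax; it does not call any GitHub API, and converting an item into an issue still happens through the button in the web interface. The helper names are hypothetical.

```python
import re

# GitHub task-list item: "- [ ] text" (open) or "- [x] text" (done).
TASK_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*\S)\s*$")


def task_list(items: list) -> str:
    """Render items as an unchecked Markdown task list."""
    return "\n".join(f"- [ ] {item}" for item in items)


def open_tasks(body: str) -> list:
    """Return the text of every unchecked task-list item in a Markdown body."""
    tasks = []
    for line in body.splitlines():
        match = TASK_RE.match(line)
        if match and match.group(1) == " ":
            tasks.append(match.group(2))
    return tasks


if __name__ == "__main__":
    body = task_list(["wake up", "drink favorite morning beverage", "solve world's problems"])
    print(body)
    print(open_tasks(body))
```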

lschriml commented 11 months ago

Regarding hcr_temp, the definition is: "Original temperature of the hydrocarbon resource."

Temperature (temp): "Temperature of the sample at the time of sampling."

These should not be combined.

-- Lynn M. Schriml, Ph.D. Associate Professor

Institute for Genome Sciences, University of Maryland School of Medicine, Department of Epidemiology and Public Health, 670 W. Baltimore St., HSFIII, Room 3061, Baltimore, MD 21201. P: 410-706-6776 | F: 410-706-6756

turbomam commented 11 months ago

@lschriml thanks for the feedback on this particular issue. Can you please help address the more general issue raised by @cmungall above?

turbomam commented 11 months ago

In my humble opinion, worrying about backwards compatibility with hcr_temp may not be justified. It does not appear anywhere in LBL's July 2023 SQL dump of NCBI BioSample, either as an attribute_name or as a harmonized_name.

I think this is an equivalent search through the NCBI BioSample web interface, but I am not an expert on that: https://www.ncbi.nlm.nih.gov/biosample/?term=hcr_temp%5BAttribute%5D

Are there other databases that I should be looking in?
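
Both checks in this comment can be scripted. The sketch below assumes the `[Attribute]` search form from the link above for the web query, and a hypothetical table `attribute(attribute_name, harmonized_name)` standing in for the SQL dump, whose real layout is not described here. It builds the search URL and counts matching rows in an in-memory SQLite database filled with toy rows.

```python
import sqlite3
from urllib.parse import quote

BIOSAMPLE_SEARCH = "https://www.ncbi.nlm.nih.gov/biosample/?term="


def biosample_search_url(attribute_name: str) -> str:
    """Build a BioSample web-search URL restricted to the [Attribute] field."""
    # quote() percent-encodes "[" and "]" as %5B and %5D.
    return BIOSAMPLE_SEARCH + quote(f"{attribute_name}[Attribute]")


def count_attribute_uses(conn: sqlite3.Connection, name: str) -> int:
    """Count rows using `name` as attribute_name or harmonized_name.

    The table and column names are hypothetical stand-ins for the dump.
    """
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM attribute"
        " WHERE attribute_name = ? OR harmonized_name = ?",
        (name, name),
    ).fetchone()
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attribute (attribute_name TEXT, harmonized_name TEXT)")
    # Toy rows for illustration only; not taken from the real dump.
    conn.executemany(
        "INSERT INTO attribute VALUES (?, ?)",
        [("temperature", "temp"), ("temp", "temp")],
    )
    print(count_attribute_uses(conn, "hcr_temp"))
    print(biosample_search_url("hcr_temp"))
```

A zero count in public sources does not rule out private use of a term, which is the concern raised in the reply that follows.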

lschriml commented 11 months ago

Replying to @cmungall: To date, we have not included this type of prefix. Perhaps this could be noted in the sections?

lschriml commented 11 months ago

Replying to @turbomam: This term is for hydrocarbon sampling and is used by the petroleum industry, so it may not be in public databases.

turbomam commented 11 months ago

@mslarae13 I forgot to say that I've created some tools that might simplify researching an issue like this. I haven't advertised them much because they're not totally ready for prime time.

  1. download mixs_derived_class_term_schemasheet.tsv
  2. open it in a spreadsheet editor
  3. remove the schema-sheets-specific rows 2-4
  4. turn on your spreadsheet's auto-filtering mode
  5. filter the keywords column (column S) on the text "temperature"
  6. you will get a report that should, in theory, include all slots whose titles include the word "temperature" or a synonym
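
The spreadsheet steps above can be approximated in a short script. This is a sketch under the assumptions stated in the steps: the file is tab-separated, rows 2-4 are schema-sheets directive rows, and the keywords are in column S (the 19th column). The function name is hypothetical, and the match is a case-insensitive substring test, roughly like a spreadsheet "contains" filter.

```python
import csv
import io

KEYWORDS_COL = 18  # spreadsheet column S is the 19th column (0-based index 18)
DIRECTIVE_ROWS = {2, 3, 4}  # schema-sheets-specific rows to skip


def rows_matching_keyword(tsv_text: str, keyword: str) -> list:
    """Return data rows whose keywords column contains `keyword`."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    matches = []
    # Records are numbered like spreadsheet rows; row 1 is the header.
    for row_no, row in enumerate(reader, start=1):
        if row_no == 1 or row_no in DIRECTIVE_ROWS:
            continue
        if len(row) > KEYWORDS_COL and keyword.lower() in row[KEYWORDS_COL].lower():
            matches.append(row)
    return matches

# Usage (assuming the TSV has been downloaded to the working directory):
# with open("mixs_derived_class_term_schemasheet.tsv", newline="", encoding="utf-8") as fh:
#     hits = rows_matching_keyword(fh.read(), "temperature")
```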

One part of "not completely ready for prime time" is that I added the keywords with a text-mining/human curation approach. I have suggested that we should have a discussion about ongoing keyword maintenance.

turbomam commented 11 months ago

I think most or all of those slots should have a see_also pointing to temp and a comment that explains how they differ from temp.

And I think we should follow that pattern for all other clusters of similar slots.
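
The see_also-plus-comment rule could be checked mechanically. A minimal sketch, assuming slot definitions held as Python dicts with LinkML-style `see_also` and `comments` lists; in a real LinkML schema `see_also` holds URIs or CURIEs, so plain slot names here are a simplification. The helper name and the example slots are hypothetical.

```python
def missing_cross_refs(slots: dict, cluster: list, anchor: str = "temp") -> list:
    """List (slot, problem) pairs for cluster members that lack a see_also
    to the anchor slot or a comment explaining how they differ from it."""
    problems = []
    for name in cluster:
        if name == anchor:
            continue  # the anchor does not need to point at itself
        slot = slots.get(name, {})
        if anchor not in slot.get("see_also", []):
            problems.append((name, f"no see_also to {anchor}"))
        if not slot.get("comments"):
            problems.append((name, "no comment explaining the difference"))
    return problems


if __name__ == "__main__":
    # Hypothetical definitions; only hcr_temp follows the proposed pattern.
    slots = {
        "temp": {},
        "hcr_temp": {
            "see_also": ["temp"],
            "comments": ["Original temperature of the hydrocarbon resource, not the sample."],
        },
        "samp_store_temp": {},
    }
    for problem in missing_cross_refs(slots, ["temp", "hcr_temp", "samp_store_temp"]):
        print(problem)
```

The same function applies to any other cluster by passing a different `anchor`.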

turbomam commented 11 months ago

And I think they should all follow the same validation pattern unless there is a traceable reason for a difference.
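
A sketch of a consistency check for validation patterns within a cluster, assuming each slot may carry a LinkML-style `pattern` string (a regular expression). Slots whose pattern differs from the cluster's most common one are reported so that each difference can be traced to a reason or removed. Patterns are compared as exact strings, so two equivalent regexes written differently count as different, and a slot with no pattern counts as the value `None`. The helper name is hypothetical.

```python
from collections import Counter


def pattern_outliers(slots: dict, cluster: list) -> list:
    """Return cluster slots whose `pattern` differs from the most common one."""
    patterns = {name: slots.get(name, {}).get("pattern") for name in cluster}
    if not patterns:
        return []
    # most_common(1) picks the highest count; ties go to the first pattern seen.
    majority, _ = Counter(patterns.values()).most_common(1)[0]
    return sorted(name for name, p in patterns.items() if p != majority)
```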