EBISPOT / DUO

Ontology for consent codes and data use requirements

What does -[XX] mean #28

Closed jimmyhli closed 4 years ago

jimmyhli commented 5 years ago

I notice that some of the fields in

https://github.com/EBISPOT/DUO/blob/master/src/ontology/duo.csv

have a shorthand that includes -[XX]. What does it mean? Thanks

mcourtot commented 5 years ago

It indicates some additional information to be provided in the shorthand. For example, DUO:0000025 has the shorthand TS-[XX] and the label "time limit on use", with the definition "This requirement indicates that use is approved for a specific number of months."

The [XX] is meant to indicate the number of months, e.g., TS-[6]. This notation originates in the original Dyke et al. paper.
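To make the notation concrete, here is a minimal sketch of splitting a shorthand such as TS-[6] into its abbreviation and its bracketed modifier. The function name `parse_shorthand`, the `Shorthand` type, and the regular expression are assumptions made for illustration; they are not part of DUO, and DUO does not define a grammar for the shorthand.

```python
import re
from typing import NamedTuple, Optional


class Shorthand(NamedTuple):
    """Hypothetical container for a parsed shorthand (not a DUO type)."""
    abbreviation: str
    modifier: Optional[str]


# Assumed pattern: uppercase abbreviation, optionally followed by -[modifier].
_SHORTHAND_RE = re.compile(r"^([A-Z]+)(?:-\[([^\]]*)\])?$")


def parse_shorthand(text: str) -> Shorthand:
    """Split e.g. 'TS-[6]' into ('TS', '6'); the modifier is kept as raw text."""
    match = _SHORTHAND_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised shorthand: {text!r}")
    return Shorthand(match.group(1), match.group(2))


print(parse_shorthand("TS-[6]"))      # modifier is the string "6"
print(parse_shorthand("TS-[XX]"))     # the unfilled placeholder from duo.csv
print(parse_shorthand("GS-[US-TX]"))  # modifier may itself contain a hyphen
```

The modifier is deliberately left as an uninterpreted string, since what it means depends on the code it is attached to.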

That being said, I'm not sure this is helpful anymore. Firstly, there was no guideline on how to use them in practice (e.g., for geo location, what should the possible values of XX be?), and even for time the group thought using ISO would be better (see here). Secondly, this is not documented, as you pointed out. Thirdly, we are working on a schema representation based on DUO codes to capture those 'modifiers' - see here.

I'll flag this with the group at our next call, thanks for the feedback!

jimmyhli commented 5 years ago

Thanks for looking into this Melanie.

So I went to read the Dyke et al. paper you linked, and I guess the -[XX] here does not necessarily mean the number of months. From this table:

[image: table from the Dyke et al. paper]

It looks like, at least in the context of this article, -[XX] refers to the number of months only for [TS] time limit on use; it means a geographic region for [GS] geographic restriction, and a specific date for publication moratorium, etc. Anyway, these are the only -[XX] I see in the csv file, unlike the paper, which also has an -[XX] that seems to refer to a specific disease.

I would be interested in learning more about how the modifiers are structured. Thanks again for looking into these.

mcourtot commented 5 years ago

Hi @haoyuanli - you're totally right, maybe I wasn't very clear. The [XX] in the shorthand is meant to encode the 'modifier' - time in months for TS, geo location for GS, etc. The point I was making above is that even though we know the [XX] in GS-[XX] is for a geo location, there is no indication of the format of that information. For example, if I want to say 'restricted for use in Texas' - how do I note that? GS-[Texas], GS-[US-TX], something else?

In any case, for DUO this doesn't matter much, as users would use the DUO code for GS (DUO:0000022) when annotating their dataset, and then add the info about the geo location as a modifier. This is the work we are doing under the schemablock ticket I referred to - how do we format those in practice. E.g., for geo location we will probably defer to using a geo location schema block when this is formalised, which will allow for cross compatibility with other GA4GH products (including researcher IDs and search).

Also for time, in practice a duration in months is hard for resources to maintain, as it implies a 'start date' was also captured (one needs to know 6 months starting from when). The proposal is therefore to indicate the end date directly instead (and this could be encoded in ISO 8601).
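A small sketch of why the duration form is awkward: turning "6 months" into something checkable requires a start date, whereas an end date can be stored once as an ISO 8601 string. The helper name `end_date_from_months` is hypothetical, and clamping to the last day of a shorter month is an assumption of this sketch, not something the thread or DUO specifies.

```python
import calendar
from datetime import date


def end_date_from_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption: if the target month is shorter than the start day,
    the result is clamped to the last day of that month.
    """
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# A duration like TS-[6] is only meaningful together with a start date.
end = end_date_from_months(date(2020, 1, 15), 6)
print(end.isoformat())  # 2020-07-15

# Month-end start dates show the ambiguity a duration introduces.
print(end_date_from_months(date(2019, 8, 31), 6).isoformat())  # 2020-02-29
```

Storing the resulting ISO 8601 end date directly avoids both the need to record the start date and the choice of month-end rule.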

Given those, I am thinking that it is not very helpful to keep the [XX] notation for the shorthand - I would instead expect resources to display the code + modifier based on schemablock format.
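As an illustration of "code + modifier", the sketch below annotates a dataset with DUO codes and keeps the modifier as a separate, explicitly typed value. The class `DataUseCondition` and its field names are hypothetical; the schemablock format mentioned above is not formalised in this thread, so this is not that format. The value "US-TX" is just one of the candidate spellings raised earlier, and the end date is an arbitrary example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DataUseCondition:
    """Hypothetical record: a DUO code plus an optional modifier."""
    code: str                             # DUO term identifier
    modifier_type: Optional[str] = None   # assumed label for the kind of modifier
    modifier_value: Optional[str] = None  # value as text, format agreed elsewhere


conditions = [
    # GS, geographic restriction: the region format is still an open question.
    DataUseCondition("DUO:0000022", "geographic_region", "US-TX"),
    # TS, time limit on use: an ISO 8601 end date instead of a month count.
    DataUseCondition("DUO:0000025", "end_date", "2020-07-15"),
]

print(json.dumps([asdict(c) for c in conditions], indent=2))
```

Keeping the modifier outside the code means the shorthand never has to carry a free-text [XX] value.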

For disease, we always intended to use a disease ontology from the start - being explicit about the modifier - so we didn't include it in the shorthand. IMO this adds to the confusion and strengthens the position above that we should just remove them altogether.

Does that help? Happy to have a chat otherwise, it may be easier to have a quick call than using the GH comments :)

jimmyhli commented 5 years ago

Yeah, that clarifies a lot of things. Thanks a lot Melanie. I don't think I have more questions for now, but I will certainly let you know if any come up, and I will be happy to chat more over the phone or by any other means you think is more convenient.