BioSchemas / specifications

Issue tracker, technical wiki, and example markup
https://bioschemas.org
54 stars 52 forks source link

Protein Entity ShEx #203

Closed AlasdairGray closed 4 years ago

AlasdairGray commented 6 years ago

@ljgarcia I've just been reviewing the ShEx you have provided for the Protein profile.

I have made a change on the shex branch that will ensure that validation performed against the Recommended and Optional shapes will enforce the constraints of the lower marginality levels, i.e. the Recommended Shape also enforces the Minimum Shape constraints. I have also added a comment at the start of the file to state that it is enforcing the constraints of version 0.4.

I have some questions about the choice of marginalities in the shapes compared to the profile, in particular why you let some be 0 or more rather than 1 or more. With 0 or more the constraint becomes optional, but if we are saying the property is required for the marginality level, then surely we need to use 1 or more. An example of this is image.

The other related question is the decision to use one or more for name (and other similar properties where the marginality is one), when the declared marginality in the profile is one.

ljgarcia commented 6 years ago

Hi @AlasdairGray

Thanks for updating the shape.

About image * when checking recommended, yes, we can change it to +. We cannot do the same for associatedWith as it is recommended as far as the protein it is indeed associated to a diseaase but that is not the case for every protein. Some proteins cannot comply to it.

Regarding identifier, name, description and url, it should be one... maybe I just did not find the way to state it in ShEX when I was playing with. I do not remember now. Please feel free to fix it in the branch you created for it.

ljgarcia commented 6 years ago

Oh, with isTranslatedFrom might happen the same thing as with isAssociatedWith, when, e.g., it comes from metagenomics samples, and some other few cases. Recommended here means "if you have it, do provide it, is important!". Does it make sense?

AlasdairGray commented 6 years ago

The issue here is how is someone interpreting the specification meant to know which properties in the recommended section are required, so as to be compliant with the recommended part, and which are on a best effort basis. At the moment both these situations are captured with MANY.

ljgarcia commented 6 years ago

I am not sure I follow it. What would you suggest to make it easier for adoption?

AlasdairGray commented 6 years ago

We will need to update the profiles so that we can distinguish the cases. This will mean changing the use of ONE and MANY to using terms such as:

AlasdairGray commented 6 years ago

I've created an updated version of the Protein ShEx on the branch shex-protein.

This not only fixes the cardinality issues but also several other issues that I spotted in the ShEx. For example, to enforce that something is of a specific type, say ProteinAnnotation, you need to create a shape for it. Also some terms were wrong or included the wrong types.

I've tried to verify the Protein example against this, but there are now so many alternatives that the validator is whirring away without coming to a conclusion (at least I think that is what is happening).

AlasdairGray commented 6 years ago

I've tried to verify the Protein example against this, but there are now so many alternatives that the validator is whirring away without coming to a conclusion (at least I think that is what is happening)

Turned out that I had given it the wrong resource to validate against.

ljgarcia commented 6 years ago

We will need to update the profiles so that we can distinguish the cases. This will mean changing the use of ONE and MANY to using terms such as:

ONE ZERO or ONE ONE or MORE ZERO or MORE

Zero or one, is not that optional with cardinality ONE? and zero or more would be optional with cardinality MANY, would not it? One or more would be then minimum with cardinality MANY.

If you consider some of the cardinalities for the Protein spec should be changed, maybe you could use a PR. That would be easier than changing cardinalities at this point, would not it?

AlasdairGray commented 6 years ago

Zero or one, is not that optional with cardinality ONE? and zero or more would be optional with cardinality MANY, would not it? One or more would be then minimum with cardinality MANY.

From what you said previously it sounded like some of the recommended properties were somehow optional if you were validating at the recommended level. Using the zero forms of the cardinality would allow this to be captured.

ljgarcia commented 6 years ago

Ok, but, would it be reflected only on the shapes or also on the spec? I am not sure how easy people will follow the use of a recommended property with cardinality zero or more. I do not know what the best solution here is so I will go with what the community decides. Even if it means moving those recommended but not enforced properties to optional.

AlasdairGray commented 6 years ago

If we are to allow a recommended property to not be present, we would need to reflect this in the spec, otherwise those writing the shape and more importantly those writing descriptions against the spec will not know that they are recommended but not required.

I would probably prefer the zero or more options in the spec as that will truely say that if this property is relevant to your resource then you are recommended to provide it.

ljgarcia commented 6 years ago

If that is the way to go, then there are two options. If you want to, you can go ahead and make those changes on the spec and name me as a reviewer. Or you can let me know whenever the environment is ready for those new cardinalities (I am thinking on the spreadsheets and the goweb process). I would suggest in any case to announce it to the community. Does it make sense to you?

ljgarcia commented 5 years ago

@AlasdairGray after reading the discussion, this is on your court right now.