Open ptsefton opened 1 year ago
Bioschemas has Specifications (classes) which are Profiles (and have additional property-related constraints) and Types (which do not). Hence to resolve to the appropriate specification on the Bioschemas website, you would include that as part of the base url (eg- 'https://bioschemas.org/types/' or 'https://bioschemas.org/profiles/') to resolve to the desired specification. To my knowledge, the bioschemas site does not resolve properties, which made it tricky to generate the JSON file you referenced.
The schema (JSON) files for Bioschemas were generated using (and registered to) the Data Discovery Engine (DDE), hence the URLs that point there should resolve regardless of whether the term is a property or a class. That said, the tool has limitations in that multiple classes are not allowed in the same namespace. Hence, only the latest version of a draft and the latest version of a release are available from the DDE site. This is also why there are multiple namespaces, resulting in multiple URLs on the DDE (e.g. /bioschemastypes/ for released Types, /bioschemas/ for released Profiles and another set for drafts).
Thanks for getting back to me @gtsueng
For our current purposes I am not concerned with how to identify profiles, just Types (AKA Classes) and Properties.
My understanding is that Types in schema.org are essentially the same as Classes - the Schema.org schema defines the schema.org Types using rdfs:Class definitions, while Properties are called the same thing on both the Schema.org websites and in the schema definition (rdf:Property). In order for bioschemas to work in a linked data context and for people to use these schemas in the way they use Schema.org it has to be clear what the URL is that identifies each class and property. Ideally (IMO) these URLs should resolve to something useful, as they do in Schema.org but not all linked-data communities care about that -- some are happy to have URLs resolve to RDF documents or technical stuff.
You are correct that it is tricky to generate JSON-LD with these schemas, as it is not clear what implementers are supposed to do to generate bioschemas documents. But it is not possible to implement systems that create bioschemas markup without knowing the answer. How are your Types (Classes) and Properties identified using URLs? Is it as per the schema, which resolves to the DDE documentation, or as per the bioschemas website, which has Type (Class) documentation?
Since all the projects I work on also use the DDE, I usually resolve any classes or properties for those projects using the DDE.
For interop it's important that everyone does it the same way - otherwise there is no way to tell that two documents are using the same terms, this is fundamental to linked-data and the schema.org approach on which bioschemas is based.
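To make the interop point concrete, here is a minimal sketch of how the same short term ends up as two different identifiers when two producers use different contexts. It only handles a flat context with `@vocab` and direct term mappings (a real JSON-LD processor does much more), and `expand_term`, `context_a` and `context_b` are hypothetical names for illustration. The two `input` URLs are the ones mentioned in this thread.

```python
def expand_term(term, context):
    """Expand a short term to a full IRI using a flat JSON-LD-style context.

    Simplified on purpose: supports only direct term mappings and @vocab.
    """
    if term != "@vocab" and term in context:
        return context[term]
    vocab = context.get("@vocab")
    return vocab + term if vocab else term


# Two hypothetical producers describing the same property with different URIs.
context_a = {
    "@vocab": "https://schema.org/",
    "input": "https://bioschemas.org/input",
}
context_b = {
    "@vocab": "https://schema.org/",
    "input": "https://discovery.biothings.io/view/bioschemas/input",
}

iri_a = expand_term("input", context_a)
iri_b = expand_term("input", context_b)

# A consumer comparing expanded IRIs cannot tell these are the same term.
same_term = iri_a == iri_b
print(iri_a)
print(iri_b)
print("same term:", same_term)
```

Terms that fall through to `@vocab` (e.g. `name`) still match across both documents; only the terms with diverging mappings break.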
So far, I've just used https://bioschemas.org/$type, knowing that these URIs don't resolve to machine-readable data (RDF or JSON-LD) and assuming they eventually will go under the schema.org namespace.
This ticket tells me Bioschemas should decide on a policy for URIs and take corresponding actions, such as redirecting canonical URIs to the DDE.
I'm in favour of a simple canonical namespace like bioschemas.org, without paths for types/profiles/drafts/properties/etc, because the latter is more complicated to manage (you need to declare multiple namespaces every time and remember which one you need each time).
Applications can know exactly what a type is (including whether it's stable or draft) by resolving its URI. If one needs to refer to a given version of a type, we might have a canonical URI that always points to the latest (stable?) version, plus versioned URIs, e.g. bioschemas.org/ComputationalWorkflow -> bioschemas.org/ComputationalWorkflow/2.0, with bioschemas.org/ComputationalWorkflow/1.0 existing too.
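As a rough sketch of that policy, assuming a made-up registry of published versions: the unversioned path resolves to the latest version and versioned paths resolve to themselves. `VERSIONS`, `version_key` and `resolve` are hypothetical names, and the version list is illustrative, not actual Bioschemas release data.

```python
# Hypothetical registry of published versions (illustrative only).
VERSIONS = {"ComputationalWorkflow": ["1.0", "2.0"]}
BASE = "https://bioschemas.org/"


def version_key(version):
    """Sort key so that '10.0' ranks above '2.0'."""
    return tuple(int(part) for part in version.split("."))


def resolve(path):
    """Map a canonical or versioned path to a versioned URI, or None if unknown."""
    parts = path.strip("/").split("/")
    known = VERSIONS.get(parts[0])
    if known is None:
        return None
    if len(parts) == 1:
        # Canonical URI: point at the latest published version.
        version = max(known, key=version_key)
    elif len(parts) == 2 and parts[1] in known:
        version = parts[1]
    else:
        return None
    return f"{BASE}{parts[0]}/{version}"


print(resolve("ComputationalWorkflow"))
print(resolve("ComputationalWorkflow/1.0"))
```

In a deployment this mapping would typically be served as HTTP redirects rather than computed by the client.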
Obviously, I'm not saying anything new, similar policies have been applied for years in ontology and linked data projects.
The use of https://discovery.biothings.io/ns/bioschemas/ as a temporary namespace seems to be a new thing due to how the DDE editor works, and should not be how Bioschemas' profiles are published. For one thing, this namespace is not controlled by the Bioschemas community, but by biothings.io. Secondly, if a PID is to be established it should be by redirection from a PURL service, not by leading directly into the UI of whatever service works today.
For compatibility with schema.org I would also have expected https://bioschemas.org/input etc. for the properties, but in reality these property links don't work, only for the types.
Some of the types' HTML pages (but not ComputationalWorkflow) do have id=property HTML attributes, so for instance https://bioschemas.org/BioSample#custodian works as you would expect, going to the right row in the table.
Types and properties should not be versioned in their PID, because a 1.0 ComputationalWorkflow is semantically also a 2.0 ComputationalWorkflow - however a profile from conformsTo would show the version of conformance.
Some properties are shared by multiple types. For instance BioSample extends BioChemEntity, which before it was merged into schema.org proper would have had properties like https://bioschemas.org/BioChemEntity#associatedDisease (now http://schema.org/associatedDisease) -- but few would probably set up their @context correctly, as there is no common JSON-LD context for Bioschemas. So anyone using the non-settled types will invariably do it in many different ways, as it's not yet documented which URIs they have.
For compatibility with schema.org I would also have expected https://bioschemas.org/input etc. for the properties, but in reality these property links don't work, only for the types.
I can't see the problem: if Bioschemas needs to add a new property, it can adopt the https://bioschemas.org/<propertyName> pattern, like the classes, and things can be set up so that either data about the property or HTML about it is returned (as usual, via content negotiation).
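For readers less familiar with content negotiation, a simplified sketch of the server-side choice: the same property URI returns HTML or machine-readable data depending on the `Accept` header. This is not a full implementation of HTTP's rules (it ignores partial wildcards like `text/*`), and `SUPPORTED` and `negotiate` are hypothetical names; the list of formats is an assumption about what a site might offer.

```python
# Formats a hypothetical term page could be served in, in order of server preference.
SUPPORTED = ["text/html", "application/ld+json", "text/turtle"]


def negotiate(accept_header, supported=SUPPORTED):
    """Return the supported media type the client prefers most, or None."""
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by order of appearance.
        candidates.append((-quality, position, media_type))

    for neg_quality, _, media_type in sorted(candidates):
        if neg_quality == 0:
            continue  # q=0 means "not acceptable"
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return supported[0]
    return None


print(negotiate("application/ld+json"))
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"))
```

A browser would land on the HTML documentation, while a linked-data client asking for `application/ld+json` would get the term definition from the same URI.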
Types and properties should not be versioned in their PID, because a 1.0 ComputationalWorkflow is semantically also a 2.0 ComputationalWorkflow - however a profile from conformsTo would show the version of conformance.
I'm not sure what PID is. Apart from that, ComputationalWorkflow v2 might not be completely semantically equivalent to ComputationalWorkflow v1 at a formal specification level (not even considering subsumption), since you might have inconsistent specifications or one more general than the other, roughly as happens for a class or function name in a Java or Python library. Certainly, we should have a short name like ComputationalWorkflow, which shouldn't include the version, and also bioschemas.org/2.0/ComputationalWorkflow/ is better than bioschemas.org/ComputationalWorkflow/2.0, contrary to what I initially wrote.
So is there someone at Bioschemas who can make a determination on this?
I would also wish for https://bioschemas.org/{propertyName} to work, but the current structure does not have a page per property like at schema.org, so it would have to redirect to https://schema.org/TypeThatFirstIntroducedIt#{propertyName} or we make such pages.
'first type that introduced it' might not be ambiguous (it could be based on the creation date), but defining a property description to feed a URL is a cleaner path and such a description could be more informative than landing on some usage example.
By the way, how many new bioschemas properties do we have? I don't remember very many.
Hi all, let's take the example of the https://bioschemas.org/input property. This was introduced because it's not (yet) part of the Schema.org spec. The issue is that it cannot be de-referenced.
I would go for creating a page on the Bioschemas website for each of these "dead" links. I think it would be less confusing than introducing another namespace.
Would that be ok ?
@albangaignard as an outsider that makes perfect sense to me -- we could then add these terms to our RO-Crate context and change them if/when the terms are added to schema.org. It would also be helpful if the documentation made it clear how to refer to the terms, as it is clear that some bioschemas community members are using different URIs for the terms including in the which defeats the purpose of using Linked Data.
BTW, also as an outsider, though, the properties input and output in particular ring alarm bells for me; these are semantically the same as, or very close to, object and result on http://schema.org:CreateAction, as used here: https://www.researchobject.org/workflow-run-crate/profiles/process_run_crate. Not my project, but do you really need new terms?
@ptsefton I thought (my recollection, could be wrong) I had replied via the Slack integration but my reply is not here at all, so adding it now. Thanks for bringing this up. We are aware of the namespace issues for new types and properties. We are in the process of getting help to move this forward. I will share the strategy once defined so we can also get feedback.
As for the input/output, I am moving your comment to a new discussion, dedicated to those two properties.
@ptsefton New discussion for input/output vs object/result at https://github.com/BioSchemas/specifications/issues/655
Trying to revive this so we can close it - are we agreed that the new https://bioschemas.org/properties/input etc. solves this already from a namespace perspective? Then we can implement https://github.com/ResearchObject/ro-crate/issues/300 using that in the context.
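For illustration, a sketch of what a context using that namespace could look like. This is not an official Bioschemas or RO-Crate context: `build_context` is a hypothetical helper, `output` is assumed to follow the same `/properties/` pattern as `input`, and the `@vocab` default to schema.org is also an assumption.

```python
import json

# Namespace for properties as proposed above; types keep the bare bioschemas.org path.
BIOSCHEMAS_PROPERTIES = "https://bioschemas.org/properties/"
BIOSCHEMAS_TYPES = "https://bioschemas.org/"


def build_context(property_names, type_names):
    """Build a flat JSON-LD context mapping short names to Bioschemas URIs."""
    context = {"@vocab": "https://schema.org/"}  # assumed default vocabulary
    for name in type_names:
        context[name] = BIOSCHEMAS_TYPES + name
    for name in property_names:
        context[name] = BIOSCHEMAS_PROPERTIES + name
    return context


context = build_context(["input", "output"], ["ComputationalWorkflow"])

# A minimal, made-up document using the context.
document = {
    "@context": context,
    "@type": "ComputationalWorkflow",
    "name": "Example workflow",
    "input": [],
    "output": [],
}
print(json.dumps(document, indent=2))
```

If the terms are later adopted by schema.org, only the mappings in the context would need to change, not the documents' property names.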
Like @stain I'd like to get a resolution for this -- anything happening?
In the bioschemas schema the @context has:
And Classes and properties are defined below. Eg:
Which would mean that the URL to use for ComputationalWorkflow should be https://discovery.biothings.io/view/bioschemas/ComputationalWorkflow, right? This does in fact resolve to some documentation, as does the property https://discovery.biothings.io/view/bioschemas/input, albeit not with a very good description of input.
BUT this URL also resolves to some documentation: https://bioschemas.org/ComputationalWorkflow, which says that the canonical URL for ComputationalWorkflow is https://bioschemas.org/ComputationalWorkflow, though https://bioschemas.org/input doesn't resolve.
(I came here from the RO-Crate project and Crate-O trying to sort out a bug we had related to this -- which arose from what I think is an error in our default context, where input and output are linked to the ComputationalWorkflow page. But a colleague, @alex-ip, working with the Bioschemas Schema definition (linked above) has been using https://discovery.biothings.io/view/bioschemas/input.)
Is there a standard context for bioschemas that includes all these terms, as there is for schema.org and RO-Crate?
My colleague @alex-ip found this example: https://bio.tools/api/blast?format=jsonld
This is using the following @context definitions based, I assume, on the assumption that the canonical URL for the schema is bioschemas.org:
But as noted above http://bioschemas.org/input does not resolve.
So, what are the IDs of these Classes and Properties?