Closed ProfTuan closed 1 year ago
Obvious prefix conflict with a different ontology already in OBO Foundry https://obofoundry.org/ontology/mco
We apologize for the oversight. Changes have been made to address the conflict with the identifier.
@UTH-Tuan thank you for your submission! It seems like MCRO is quite outside the usual scope of the OBO Foundry, but I have not done a review.
Maybe LOV is another venue you may be interested in.
In any case, if you still want to try to get accepted into OBO Foundry, we first have to ensure that your ontology passes minimum metadata requirements: https://obofoundry.org/obo-nor.github.io/dashboard/mcro/dashboard.html
You can ignore the "usages" red flag.
Only after all other issues are sorted will someone open your ontology and review it.
Hi @matentzn, thank you for the response. Yes we are still interested in being accepted in OBO Foundry. We made some adjustments to the ontology to address the flags raised by the OBO dashboard. Please let us know if we missed anything. Thanks again.
@UTH-Tuan thank you, looks good:
https://obofoundry.org/obo-nor.github.io/dashboard/mcro/dashboard.html
We will assign a reviewer shortly
There's a warning for the version IRI http://sbmi.uth.edu/ontology/mcro/1.0.0, it appears this page goes to a 404. Does the dashboard not make a ping to make sure that this IRI actually resolves to an ontology?
Yes you are right, thank you @cthoyt, this is a requirement by the Versioning Principle - Dashboard does not currently check for this. Made an issue. @UTH-Tuan a human reviewer will tell you this as well when they are assigned - can you ensure that your version IRI resolves to the correct file as per principles?
@matentzn and @cthoyt Thank you for pointing this out. We initially presumed we handled this, but misinterpreted the implementation of this important principle. The updated correction has been pushed to the repository.
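The resolution check raised above (does the version IRI return something rather than a 404?) could be sketched roughly as follows. This is only an illustration and not how the OBO Dashboard works: the function name `version_iri_resolves`, the injectable `opener` parameter, and the choice of a HEAD request are all assumptions made for this example. A successful response only shows that something is served at the IRI; it does not confirm that the file is the correct ontology version.

```python
import urllib.error
import urllib.request


def version_iri_resolves(iri, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request to `iri` succeeds (hypothetical helper).

    `opener` is injectable so the check can be exercised without network access.
    """
    request = urllib.request.Request(iri, method="HEAD")
    try:
        response = opener(request, timeout=timeout)
    except OSError:
        # URLError and HTTPError (e.g. a 404) are OSError subclasses.
        return False
    try:
        return 200 <= response.status < 400
    finally:
        response.close()


# Example call (needs network access, so it is left commented out):
# version_iri_resolves("http://sbmi.uth.edu/ontology/mcro/1.0.0")
```

A check along these lines would only be a first filter; a human reviewer would still need to confirm that the served file matches the versioning principle.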
@UTH-Tuan we have assigned @shawntanzk to this ontology as your primary reviewer. The review process will likely take a while; I expect a final recommendation (either for or against admission) not before mid-June, as your case is a bit unusual for us (from a research domain perspective). Hope that is ok!
Hi @UTH-Tuan thanks for your submission, here is my initial review of the ontology. Do note that more might come later.
MCRO seems to me like an application ontology that has very particular needs, as there are terms that should belong in a reference ontology instead (e.g. False Discovery Rate, F1 Score, Area Under the Curve -> STATO; License information, reference information -> IAO).
note: this is not an issue per se, just a preface to some comments below.
Concerns:
Graphic (local) is equivalent to graph (IAO). Firstly, this doesn't sound right (graphic is way broader than a graph by the IAO definition), but even assuming graphic in MCRO is used in the way that graph is used in IAO, is there a reason to have that term?
Currently, the subclasses of performance metric information are a bit tricky: things like F1 scores or false discovery rate are not documentation but rather actual data. For example, F1 scores are calculable scores that have a method of calculation etc., which means that an F1 score isn't just part of a document; rather, the documentation captures the statistic. I can understand that this might be intended as a term for documentation of the F1 score, for example, but the definition suggests that it refers to the score itself ("The F1 score is the harmonic mean of the precision and recall."). This makes it hard to interoperate with other ontologies that might use these scores not as a subclass of documentation, but as the actual scores themselves. This is especially clear with a term like accuracy, which is also present in STATO (STATO:0000415). However, if this is highly specific for an application, it might not be too huge an issue.
Namespace: currently all term IRIs are given as http://sbmi.uth.edu/ontology/mcro#Name. This is not allowed under OBO guidelines (https://obofoundry.org/id-policy.html): “All OBO term IDs are CURIEs (prefixed identifiers) of the form IDSPACE:LOCALID”. Using such IDs would prevent issues if changes need to happen (e.g. if a label changes). It would also prevent mistakes with the ID spaces being used (e.g. mcro:XXXXXXX = http://sbmi.uth.edu/ontology/mcro_XXXXXXX would prevent mistakes with https vs http). Also, the way the ontology currently names terms mixes kebab case and underscores, which can get very confusing.
Deprecated terms have no annotations on why they are deprecated and what should be used instead.
Minor/cosmetic issues:
Definitions don’t always seem to be a definition per se - eg evaluation_data_information def starts with: "Evaluation datasets should include datasets that are publicly available for third-party use.” and even reading the rest, I’m not quite sure.
Multiple definition: eg Out_of_Scope_Use_case, use_case_information - there should only be one definition in each term.
There are terms that have no or lacking annotations (importantly no definition): eg Precision-Recall_Curve, False_Ommission_Rate
@shawntanzk @matentzn some of these checks should be incorporated in the OBO Dashboard, seems like they can be automated and reduce curation burden
@shawntanzk Thanks for the initial feedback. We can address most of these within the next day or two with an updated version if that's ok.
Thanks, please do ping me here with the replies when the updated version is ready :)
@shawntanzk, we have released an updated version that addresses some of the initial feedback (redefined "graph", assigned proper IRIs, definitions, etc.). If by any chance there is a need for further updates, we are open to making further adjustments.
Just for some background information, the project was started when we noticed that NIH had mentioned an interest in model card reports for AI-based machine learning models in their Bridge2AI initiative. So while the ontology resource can be used for application purposes, it serves as a way to standardize and formalize model card reporting documents. The performance metric subclasses mentioned in the comments are actually document sections, not the actual metrics. We could import STATO later and link the metric subsections (e.g. "accuracy section" > is about > stato:accuracy). Overall the ontology just represents report documents.
Please let us know if there's any additional issues and we'll gladly address them.
@UTH-Tuan Thanks for the reply. Just a heads up, I'm on leave for a couple of weeks so this might take a while for me to get to, sorry for the delay.
Comments:
Changes made: The changes made are sensible and make the ontology more interoperable and clear, especially narrowing terms down to be specifically stated as "information section". The changes made to ID as http://sbmi.uth.edu/ontology/mcro#mcro_XXXXXX is also a good step - though I would recommend using 7 digits instead of 6 to be more conformant to OBO ontologies.
Deprecation: I do not see any more deprecated terms in the ontology, which is fine for now as there doesn't seem to be a release yet and the ontology doesn't seem to be used externally at all. However, in future, obsoletion should be done instead of term removal.
Release version control: Could you comment on how MCRO does its releases? This is important under versions of ontology in https://obofoundry.org/id-policy.html. Please also see https://obofoundry.org/principles/fp-004-versioning.html. Currently, I do not see releases in the GitHub repo.
Intersection probably wrongly expressed: for example, Model Parameter Section has the SubClassOf axiom:
'has part' some
('Dataset Information Section' and 'Input Format Information Section' and 'Model Architecture Information Section' and 'Output Format Information Section')
In OWL, that means a single part must be an instance of all four section classes at once. I'm guessing you mean:
'has part' some 'Dataset Information Section'
'has part' some 'Input Format Information Section'
'has part' some 'Model Architecture Information Section'
'has part' some 'Output Format Information Section'
There are a few similar cases that I'm guessing all should be changed accordingly
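The difference between the two forms can be illustrated with a small closed-world toy model. This is only a sketch and not how an OWL reasoner works (a reasoner infers under the open-world assumption rather than checking listed data); the `Individual` class and both function names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field

SECTIONS = [
    "Dataset Information Section",
    "Input Format Information Section",
    "Model Architecture Information Section",
    "Output Format Information Section",
]


@dataclass
class Individual:
    """Toy individual: a set of asserted types and a list of direct parts."""
    types: frozenset
    parts: list = field(default_factory=list)


def has_part_some_intersection(ind, classes):
    """'has part' some (A and B and ...): one single part must have every type."""
    return any(set(classes) <= part.types for part in ind.parts)


def has_part_some_each(ind, classes):
    """('has part' some A), ('has part' some B), ...: each type needs some part."""
    return all(any(c in part.types for part in ind.parts) for c in classes)


# A Model Parameter Section with four separate parts, one per section type.
model_parameter_section = Individual(
    types=frozenset({"Model Parameter Section"}),
    parts=[Individual(types=frozenset({name})) for name in SECTIONS],
)

print(has_part_some_each(model_parameter_section, SECTIONS))
print(has_part_some_intersection(model_parameter_section, SECTIONS))
```

With four distinct parts, the separate restrictions are met, while the intersection form is not, because no single part carries all four types.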
Unsats: Currently there are two unsats (under owl:Nothing) - they are SWO terms. These need to be resolved. Ideally, there should also be automated QC checks to ensure there are no unsats in releases.
Use case: I'm trying to understand what your use case of the ontology is - looking at the use of data properties, are you building an ontology that is used as a data model, perhaps for validation? If so, have you considered technologies like SHACL instead that might allow more complex shapes to be built for validation purposes?
Hi @UTH-Tuan - I have brought this up for discussion with the foundry committee and here are some comments:
Scope: Currently MCRO seems out of scope for OBO Foundry. While we do understand that there is a need to unify documentation parts, including things like information sections in model cards, the way that MCRO handles it, using data properties to restrict the input, seems to be more of a data model specific to its use case. A couple of key points here: 1) If the use case is to unify these information sections, the terms might instead belong in IAO rather than a new ontology. 2) If these terms were to be reused, many of the restrictions are too specific for general use, and using JSON, SHACL, or some constraint language might be more appropriate than embedding them in the ontology. 3) OBO Foundry is more focused on the "reality" side and not so much on the "information" side of things (e.g. interested more in F-score statistics than documentation about F-score statistics).
Use case: We do, however, understand that you might have reasons for MCRO to be included in the OBO Foundry; we would hence invite you to write back to us on the use cases of this ontology, and how you think it would be useful to the community.
Hi @shawntanzk, I am in the midst of addressing your points including a description of the use case. I will have a response by later this week.
Awaiting your response, @UTH-Tuan!
I apologize for the delay. We may need an extra week, due to scheduling, to add the changes to the artifact.
No problem, take your time!
Still awaiting your response.
Sorry for the late response. This fell off my radar.
The changes made to ID as http://sbmi.uth.edu/ontology/mcro#mcro_XXXXXX is also a good step - though I would recommend using 7 digits instead of 6 to be more conformant to OBO ontologies
We have adjusted this to 7 digits to be in conformity with OBO expectations.
intersection probably wrongly expressed
We have changed intersection concern to what was suggested by Dr. Tan
Could you comment on how MCRO does its releases? This is important under versions of ontology in https://obofoundry.org/id-policy.html Please also see https://obofoundry.org/principles/fp-004-versioning.html Currently, I do not see releases in the github repo.
We have uploaded the latest version in the release section. As far as versioning goes, if I am not mistaken we do not have a current PURL. My assumption is that once we attain a PURL identifier we will adhere to the versioning policy of OBO.
use case I'm trying to understand what your use case of the ontology is - looking at the use of data properties, are you building an ontology that is used as a data model, perhaps for validation? If so, have you considered technologies like SHACL instead that might allow more complex shapes to be built for validation purposes? ..... Scope ... use case
I think looking at the data properties is misleading. We are not trying to validate data so much as to provide a template for model card reports, using an ontology-based framework to formalize these types of documents for biomedical informatics research that utilizes AI-based machine learning. I also feel that the use case question might be a bit unfair. Without naming specific projects, there are one or two ontologies currently in OBO that are clearly application ontologies.
We do have a paper that might clarify the scope and use case:
Amith, M.T., Cui, L., Zhi, D. et al. Toward a standard formal semantic representation of the model card report. BMC Bioinformatics 23 (Suppl 6), 281 (2022). https://doi.org/10.1186/s12859-022-04797-6
unsats Currently there are two unsats (under owl:Nothing) - they are SWO terms. These need to be resolved. Ideally, there should also be automated QC checks to ensure there are no unsats in releases.
This has been fixed within our ontology. This issue was a result of conflicting terms between IAO and SWO. See this: https://github.com/allysonlister/swo/issues/56
Thanks for the reply, we will bring this up in the next call again, but in general the changes made are good and have fixed my main concerns. The following is a response to the reply (more for use with the OFOC call - I'm not 100% sure I can make it for the next call as I am in the process of moving countries at the moment)
We have adjusted this to 7 digits to be in conformity with OBO expectations.
I've seen the change in the ontology and it looks good, thanks.
We have changed intersection concern to what was suggested by Dr. Tan
The change is sensible and fixes my concerns.
We have uploaded the latest version in the release section. As far as versioning goes, if I am not mistaken we do not have a current PURL. My assumption is that once we attain a PURL identifier we will adhere to the versioning policy of OBO.
I see that you have uploaded the current version as a release in your GitHub. I also note that there is an owl:versionInfo on the ontology metadata. I would just like to confirm that with each iteration of the ontology, you will be creating a GitHub release. PURL identifiers are just a redirect, and hence the versioning should be implemented regardless of them, but as mentioned above, I see that you have done a GitHub release now.
We are not trying to validate data so much as to provide a template for model card reports, using an ontology-based framework to formalize these types of documents for biomedical informatics research that utilizes AI-based machine learning. I also feel that the use case question might be a bit unfair. Without naming specific projects, there are one or two ontologies currently in OBO that are clearly application ontologies.
We do have a paper that might clarify the scope and use case:
Amith, M.T., Cui, L., Zhi, D. et al. Toward a standard formal semantic representation of the model card report. BMC Bioinformatics 23 (Suppl 6), 281 (2022). https://doi.org/10.1186/s12859-022-04797-6
Thanks for the reply and the link to the paper. We will bring this up in the next call and will let you know if there are other concerns about this.
This has been fixed within our ontology. This issue was a result of conflicting terms between IAO and SWO. See this: https://github.com/allysonlister/swo/issues/56
Thanks for the reply, I see that this has been fixed.
Hi @ProfTuan
Thanks for the changes, the replies, and your patience. We have discussed this and are happy to provisionally accept this ontology into the OBO Foundry. There are a few things we need you to do first, though:
1) Purls We need you to change the IDs of all your terms to the following format: http://purl.obolibrary.org/obo/mcro_XXXXXXX
2) Versioning We need you to confirm that you will continue versioning (e.g. the old versions will be available if people want to use those), I assume you are doing this through GitHub releases?
Once these changes have been made and a new release is done, let us know here and we will inform you of the following steps.
Thanks
(http://purl.obolibrary.org/obo/MCRO_XXXXXXX, not http://purl.obolibrary.org/obo/mcro_XXXXXXX, with capital letters in the ontology part)
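As a rough illustration of the requested format, the sketch below expands a CURIE into an OBO PURL with the ontology prefix upper-cased. The function name `curie_to_purl`, the strict seven-digit pattern (taken from the reviewer's earlier recommendation), and the local ID `0000001` are assumptions for the example, not actual MCRO identifiers or tooling.

```python
import re

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# Assumed shape for this sketch: alphabetic prefix, colon, seven-digit local ID.
CURIE_PATTERN = re.compile(r"^([A-Za-z]+):(\d{7})$")


def curie_to_purl(curie):
    """Expand e.g. 'mcro:0000001' to an OBO PURL with an upper-case prefix."""
    match = CURIE_PATTERN.match(curie)
    if match is None:
        raise ValueError(f"not a PREFIX:NNNNNNN identifier: {curie!r}")
    prefix, local_id = match.groups()
    return f"{OBO_PURL_BASE}{prefix.upper()}_{local_id}"


print(curie_to_purl("mcro:0000001"))
# prints http://purl.obolibrary.org/obo/MCRO_0000001
```

Normalizing the prefix in one place avoids the lower-case versus upper-case mismatch pointed out above.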
@shawntanzk and @matentzn
We want to acknowledge your response and thank you for the inclusion of our work into OBO. We have a paper (software) that we were waiting on for rejection or acceptance status. So we didn't want to break anything till the review process is over.
In response to 2
Versioning We need you to confirm that you will continue versioning (e.g. the old versions will be available if people want to use those), I assume you are doing this through GitHub releases?
Absolutely. Following the same protocol as other OBO Foundry ontologies, we intend to archive old versions through the GitHub Releases tab.
Purls We need you to change the IDs of all your terms to the following format: http://purl.obolibrary.org/obo/mcro_XXXXXXX
(http://purl.obolibrary.org/obo/MCRO_XXXXXXX, not http://purl.obolibrary.org/obo/mcro_XXXXXXX, with capital letters in the ontology part)
We are working on those right now since the review for our aforementioned paper is completed.
The IDs of the terms have been changed to PURLs.
We didn't update the ontology version IRI yet. The reason is that we have one work-in-progress project that uses the current IRI. Which brings me to my question: what is the next step?
@ProfTuan Thanks for making the changes, I see that all is ready to go now. Here are a few things you need to handle to finalize MCRO's entry into OBO Foundry:
If you have any questions, please feel free to message here, pinging me.
Thank you
Is there a specific branch to make the pull request?
You can make a branch/fork and a PR :) thanks
Thanks @shawntanzk we're working on it.
Do we have a dashboard check for naming conventions specifically capitalization?
See https://obofoundry.org/principles/fp-012-naming-conventions.html
MCRO naming conventions are different which is jarring when looking at two ontologies together:
As @shawntanzk mentioned, this employs fairly different modeling paradigms from the rest of OBO. This is not necessarily bad, but I just want to note this here as a precedent.
For example, Model Card Report has axioms:
'has part' only 'Consideration Information Section'
'has part' only 'Model Detail Section'
'has part' only 'Model Parameter Section'
'has part' only 'Quantative Analysis Section'
'is about' some algorithm
This means that every model card report has a Consideration Information Section.
a Consideration Information Section has axioms
'document part'
'has part' some 'Ethical Consideration Section'
'has part' some 'Limitation Information Section'
'has part' some 'Trade-off Information Section'
'has part' some 'Use Case Information Section'
'has part' some 'User Information Section'
documentation some rdfs:Literal
overview some rdfs:Literal
so we are saying that every model card has a consideration section and that section necessarily has all these parts.
I don't think this is an accurate representation of reality. See for example:
It looks also like you are modeling things like documentation as data properties. This is quite common for many semantic web vocabularies but is a bit of a mismatch from the rest of OBO.
I'm not really sure OWL and an ontology is the right formalism for what it is you want to do.
None of my comments should be interpreted as a blocker for accepting this ontology. But I will note that OWL works best for a certain style of ontology. This product looks like a great schema.org style vocabulary and I think everyone would be better served modeling as an extension of schema.org using schema.org style modeling, with loose mappings to OBO.
The OBO Ops team may want to find a different axis from reference vs application ontology here to better categorize semantic web style ontologies.
UPDATE: apologies, part of the specifics of my analysis is wrong, as I read "only" as "some". @balhoff's analysis is correct, however.
I get what you're saying, but I think what ontologies could do and what we are trying to aim for is standardizing the structure of model card reports where some reports could potentially neglect pieces of information that should be shared.
@ProfTuan there are logical issues with using "only" restrictions in this way. The Model Card Report axioms will result in all of its parts having all these types at once. I would assume that a given section instance is not a Consideration Information Section and a Quantitative Analysis Section at the same time. These types will also apply to all the subparts of the parts of the report. In general it is inadvisable to use a transitive property in an "only" restriction (especially when the property has a much broader range than the usage in the restriction).
Something you could do instead would be:
'Model Card Report' SubClassOf ('has part' only ('Consideration Information Section' or 'Model Detail Section' or 'Model Parameter Section' or 'Quantitative Analysis Section'))
But again I wouldn't use 'has part' there due to its transitivity. A non-transitive subproperty of 'has part' would be appropriate.
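The three situations described above can be mimicked with a closed-world toy check. This is a sketch only: an OWL reasoner would infer the extra types (or report an inconsistency if the classes were disjoint) rather than return False, and the part names, the `direct_parts` and `types` tables, and the helper functions are all hypothetical.

```python
TOP_LEVEL = {
    "Consideration Information Section",
    "Model Detail Section",
    "Model Parameter Section",
    "Quantitative Analysis Section",
}

# Hypothetical report: two top-level sections, one of which has a subsection.
direct_parts = {
    "report": ["considerations", "details"],
    "considerations": ["ethics"],
}
types = {
    "considerations": {"Consideration Information Section"},
    "details": {"Model Detail Section"},
    "ethics": {"Ethical Consideration Section"},
}


def transitive_parts(whole):
    """All parts reachable from `whole`, treating the part relation as transitive."""
    found = []
    stack = list(direct_parts.get(whole, []))
    while stack:
        part = stack.pop()
        if part not in found:
            found.append(part)
            stack.extend(direct_parts.get(part, []))
    return found


def separate_only_axioms_hold(parts):
    """One 'only' axiom per class: every part must belong to every class."""
    return all(TOP_LEVEL <= types[p] for p in parts)


def union_only_axiom_holds(parts):
    """A single 'only' over a union: every part must belong to at least one class."""
    return all(bool(types[p] & TOP_LEVEL) for p in parts)


direct = direct_parts["report"]
print(separate_only_axioms_hold(direct))                    # separate axioms
print(union_only_axiom_holds(direct))                       # union, direct parts only
print(union_only_axiom_holds(transitive_parts("report")))   # union, transitive parts
```

In this toy data the separate axioms are not met, the union form is met when only direct parts are considered, and it is no longer met once the subsection is pulled in by transitivity, which is the motivation for a non-transitive subproperty.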
Thank you for the input, @balhoff. I'll look into suggestion.
I do want to add that representation does take a cue from IAO. If you notice "document" or "document part" from IAO uses 'has part' and 'part of' (i.e., document-> has part -> document part). I presume the representational structure is no different than how IAO is abstracting document and document part. "Model Card Report" is subtype of report which is also a subtype of document. The other sections of the model card report are subtype of document part.
@ProfTuan I do think 'has part' is correct for relating these items. It just becomes a problem when using it in the "only" axioms.
Apologies, part of my analysis was wrong as I read "only" as "some". @balhoff's analysis on the use of "only" is correct; there are multiple traps here.
My comments about Consideration Information Section still stand though:
'document part'
'has part' some 'Ethical Consideration Section'
'has part' some 'Limitation Information Section'
'has part' some 'Trade-off Information Section'
'has part' some 'Use Case Information Section'
'has part' some 'User Information Section'
documentation some rdfs:Literal
overview some rdfs:Literal
You are saying that every Consideration Information Section must have all these sections; this is not true for most model cards on, e.g., Hugging Face.
Title
Model Card Report Ontology
Short Description
representational ontology for model card reports
Description
Model card reports are documents detailing transparent metadata information relating to machine learning models. Similar to what we have with drug labels and nutritional labels, the goal of model cards is to communicate relevant information on all aspects of a machine learning model that has undergone any experimentation. However, these important reports of the machine learning models are presented in static documents. This work encodes the structure of model card reports and aligns them to standard OBO Foundry ontologies to help formalize and enrich these documents. The end result is a computable model of the model card that can be used to standardize reporting and be integrated in future software tooling (searching and indexing, etc.).
Identifier Space
mcro
License
CC-BY 3.0
Domain
information technology
Source Code Repository
https://github.com/UTHealth-Ontology/MCRO
Homepage
https://github.com/UTHealth-Ontology/MCRO
Issue Tracker
https://github.com/UTHealth-Ontology/MCRO/issues
Ontology Download Link
https://raw.githubusercontent.com/UTHealth-Ontology/MCRO/main/mcro.owl
Contact Name
Tuan Amith
Contact Email
muhammad.amith@unt.edu
Contact GitHub Username
ProfTuan
Contact ORCID Identifier
0000-0003-4333-1857
Formats
Dependencies
Related
No response
Usages
No response
Intended Use Cases and/or Related Projects
No response
Data Sources
No response
Additional comments or remarks
No response
OBO Foundry Pre-registration Checklist
dc:license
annotation, serialised in RDF/XML.