Request for new ontology MCRO

ProfTuan commented 2 years ago

Title

Model Card Report Ontology

Short Description

representational ontology for model card reports

Description

Model card reports are documents that provide transparent metadata about machine learning models. Similar to drug labels and nutritional labels, the goal of model cards is to communicate relevant information on all aspects of a machine learning model that has undergone any experimentation. However, these important reports are presented as static documents. This work encodes the structure of model card reports and aligns it to standard OBO Foundry ontologies to help formalize and enrich these documents. The end result is a computable model of the model card that can be used to standardize reporting and be integrated into future software tooling (searching and indexing, etc.).

Identifier Space

mcro

License

CC-BY 3.0

Domain

information technology

Contact Name

Tuan Amith

Contact Email

muhammad.amith@unt.edu

Contact GitHub Username

ProfTuan

Contact ORCID Identifier

0000-0003-4333-1857

Formats

[X] OWL RDF/XML (.owl)
[ ] OBO (.obo)
[ ] OBO Graph JSON (.json)

Dependencies

iao
swo
prov-o
skos

Usages

No response

Intended Use Cases and/or Related Projects

No response

Data Sources

No response

Additional comments or remarks

No response

OBO Foundry Pre-registration Checklist

[X] I have read and understood the registration process instructions and the registration checklist.
[X] There is no other ontology in the OBO Foundry which would be an appropriate place for my terms. If there were, I have contacted the editors, and we decided in mutual agreement that a separate ontology is more appropriate.
[X] My ontology has a specific release file with a version IRI and a dc:license annotation, serialised in RDF/XML.
[X] I understand that term definitions, while not mandatory, are key to understanding the intentions of a term especially when the ontology is used in curation. I made sure that a reasonable majority of terms in my ontology have definitions, using the IAO:0000115 property.
[X] For every term in my ontology, I checked whether another OBO Foundry ontology has one with the same meaning. If so, I re-used that term directly (not by cross-reference, by directly using the IRI).
[X] For all relationship properties (Object and Data Property), I checked whether the Relation Ontology (RO) includes an appropriate one. I understand that aligning with RO is an essential part of the overall alignment between OBO ontologies!
[X] For the selection of appropriate annotation properties, I looked at OMO first. I understand that aligning ontology metadata and term-level metadata is essential for cross-integration of OBO ontologies.
[X] If I was not sure about the meaning of any of the checkboxes above, I have consulted with a member of the OBO Foundry for advice, e.g., through the obo-discuss Google Group.
[X] The requested ID space does not conflict with another ID space found in other registries such as the Bioregistry and BioPortal, see here for a complete list.

cthoyt commented 2 years ago

Obvious prefix conflict with a different ontology already in OBO Foundry https://obofoundry.org/ontology/mco

ProfTuan commented 2 years ago

We apologize for the oversight. Changes have been made to address the conflict with the identifier.

matentzn commented 2 years ago

@UTH-Tuan thank you for your submission! It seems like MCRO is quite outside the usual scope of the OBO Foundry, but I have not done a review.

Maybe LOV is another venue you may be interested in.

In any case, if you still want to try to get accepted into OBO Foundry, we first have to ensure that your ontology passes minimum metadata requirements: https://obofoundry.org/obo-nor.github.io/dashboard/mcro/dashboard.html

You can ignore the "usages" red flag.

Only after all other issues are sorted will someone open your ontology and review it.

ProfTuan commented 2 years ago

Hi @matentzn, thank you for the response. Yes we are still interested in being accepted in OBO Foundry. We made some adjustments to the ontology to address the flags raised by the OBO dashboard. Please let us know if we missed anything. Thanks again.

matentzn commented 2 years ago

@UTH-Tuan thank you, looks good:

https://obofoundry.org/obo-nor.github.io/dashboard/mcro/dashboard.html

We will assign a reviewer shortly

cthoyt commented 2 years ago

There's a warning for the version IRI http://sbmi.uth.edu/ontology/mcro/1.0.0, it appears this page goes to a 404. Does the dashboard not make a ping to make sure that this IRI actually resolves to an ontology?

matentzn commented 2 years ago

Yes you are right, thank you @cthoyt, this is a requirement by the Versioning Principle - Dashboard does not currently check for this. Made an issue. @UTH-Tuan a human reviewer will tell you this as well when they are assigned - can you ensure that your version IRI resolves to the correct file as per principles?

ProfTuan commented 2 years ago

@matentzn and @cthoyt Thank you for pointing this out. We initially presumed we handled this, but misinterpreted the implementation of this important principle. The updated correction has been pushed to the repository.

matentzn commented 2 years ago

@UTH-Tuan we have assigned @shawntanzk to this ontology as your primary reviewer. The review process will likely take a while; I expect a final recommendation (either for or against admission) not before mid-June, as your case is a bit unusual for us (from a research domain perspective). Hope that is ok!

shawntanzk commented 2 years ago

Hi @UTH-Tuan thanks for your submission, here is my initial review of the ontology. Do note that more might come later.

MCRO seems to me like an application ontology that has very particular needs, as there are terms that should belong in a reference ontology instead (e.g. False Discovery Rate, F1 Score, Area Under the Curve -> STATO; License information, reference information -> IAO)

note: this is not an issue per se, just a preface to some comments below.

Concerns:

Graphic (local) is equivalent to graph (IAO) - this firstly doesn't sound right (graphic is way broader than a graph by IAO definition), but even assuming graphic in MCRO is used in the way that graph is used in IAO, is there a reason to have that term?

Currently, the subclasses of performance metric information are a bit tricky: things like F1 scores or false discovery rate are not documentation but actual data. For example, F1 scores are calculable scores that have a method of calculation etc., which means that they aren't just part of a document; rather, the documentation captures the statistics. I can understand that this might be intended as a term for documentation of the F1 score, for example, but the definition suggests that it refers to the score itself (The F1 score is the harmonic mean of the precision and recall.). This makes it hard to interoperate with other ontologies that might use these scores not as a subclass of documentation, but as the actual scores themselves. This is especially clear with a term like accuracy, which is also present in STATO (STATO:0000415). However, if this is highly specific to an application, it might not be too huge an issue.

Namespace: currently all term IRIs are given as http://sbmi.uth.edu/ontology/mcro#Name. This is not allowed under OBO guidelines (https://obofoundry.org/id-policy.html): “All OBO term IDs are CURIEs (prefixed identifiers) of the form IDSPACE:LOCALID”. Following this would prevent issues if changes need to happen (e.g. if a label changes). It would also prevent mistakes with the ID spaces being used (e.g. mcro:XXXXXXX = http://sbmi.uth.edu/ontology/mcro_XXXXXXX would prevent mistakes with https vs http). Also, the current term names mix kebab-case and underscores, which can get very confusing.

Deprecated terms have no annotations on why they are deprecated and what should be used instead.

Minor/cosmetic issues:

Definitions don’t always seem to be a definition per se - eg evaluation_data_information def starts with: "Evaluation datasets should include datasets that are publicly available for third-party use.” and even reading the rest, I’m not quite sure.

Multiple definitions: e.g. Out_of_Scope_Use_case, use_case_information - there should only be one definition for each term.

There are terms that have no or lacking annotations (importantly no definition): eg Precision-Recall_Curve, False_Ommission_Rate
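To illustrate the namespace concern raised in this review, here is a minimal Python sketch that flags term IRIs whose local name is a label rather than a numeric ID. The regular expression and the helper names are assumptions made for illustration, and the third IRI in the sample list is hypothetical; this is not an official OBO check.

```python
import re

# Assumption: a conforming local name looks like PREFIX_digits (e.g. mcro_0000001).
NUMERIC_LOCAL_ID = re.compile(r"^[A-Za-z]+_\d+$")


def local_name(iri: str) -> str:
    """Return the part of an IRI after the last '#' or '/'."""
    return re.split(r"[#/]", iri)[-1]


def label_based_iris(iris):
    """Return the IRIs whose local name is a label rather than PREFIX_digits."""
    return [iri for iri in iris if not NUMERIC_LOCAL_ID.match(local_name(iri))]


terms = [
    "http://sbmi.uth.edu/ontology/mcro#Precision-Recall_Curve",
    "http://sbmi.uth.edu/ontology/mcro#evaluation_data_information",
    "http://sbmi.uth.edu/ontology/mcro#mcro_0000001",  # hypothetical numeric ID
]

# Prints the two label-based IRIs; the numeric one passes the check.
print(label_based_iris(terms))
```

A numeric local ID stays stable when a label is corrected or renamed, which is the motivation given in the review.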

cthoyt commented 2 years ago

@shawntanzk @matentzn some of these checks should be incorporated in the OBO Dashboard, seems like they can be automated and reduce curation burden

ProfTuan commented 2 years ago

@shawntanzk Thanks for the initial feedback. We can address most of these within the next day or two with an updated version if that's ok.

shawntanzk commented 2 years ago

Thanks, please do ping me here with the replies when the updated version is ready :)

ProfTuan commented 2 years ago

@shawntanzk, we have released an updated version that addresses some of the initial feedback (redefined "graph", assigned proper iri, definitions, etc). If by any chance there is a need for further updates we are open to making further adjustments.

Just for some background information, the project was started when we noticed that NIH had mentioned an interest in model card reports for AI-based machine learning models in their bridge2AI initiative. So while the ontology resource can be used for application purposes, it serves as a way to standardize and formalize model card reporting documents. Regarding the comments about the performance metric subclasses: these are actually document sections, not the actual metrics. We could import STATO later and link the metric subsections (e.g. "accuracy section" > is about > stato:accuracy). Overall the ontology just represents report documents.

Please let us know if there's any additional issues and we'll gladly address them.

shawntanzk commented 2 years ago

@UTH-Tuan Thanks for the reply. Just a heads up, I'm on leave for a couple of weeks so this might take a while for me to get to, sorry for the delay.

shawntanzk commented 2 years ago

Comments:

Changes made: The changes are sensible and make the ontology more interoperable and clear, especially narrowing terms down to be specifically stated as "information section". The change of IDs to http://sbmi.uth.edu/ontology/mcro#mcro_XXXXXX is also a good step, though I would recommend using 7 digits instead of 6 to conform better to OBO ontologies.

Deprecation: I do not see any more deprecated terms in the ontology, which is fine for now as there doesn't seem to be a release yet and the ontology doesn't seem to be used externally at all. However, in future, obsoletion should be done instead of term removal.

Release version control: Could you comment on how MCRO does its releases? This is important under versions of ontology in https://obofoundry.org/id-policy.html (please also see https://obofoundry.org/principles/fp-004-versioning.html). Currently, I do not see releases in the GitHub repo.

Intersection probably wrongly expressed: For example, Model Parameter Section has the subclass_of axiom:

'has part' some 
('Dataset Information Section' and 'Input Format Information Section' and 'Model Architecture Information Section' and 'Output Format Information Section')

In OWL, that would mean that a single part fulfils all of these section types at once. I'm guessing you mean:

'has part' some 'Dataset Information Section'
'has part' some 'Input Format Information Section'
'has part' some 'Model Architecture Information Section'
'has part' some 'Output Format Information Section'

There are a few similar cases that I'm guessing all should be changed accordingly
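The difference between the two forms above can be checked on a toy example. The Python sketch below is not OWL reasoning: it evaluates both readings against one small, hand-built interpretation, and the individual names (`params`, `p1` to `p4`) are made up for illustration.

```python
# Toy finite interpretation: each individual has a set of class names, and
# has_part maps a whole to its parts. All individual names are illustrative.
types = {
    "params": {"Model Parameter Section"},
    "p1": {"Dataset Information Section"},
    "p2": {"Input Format Information Section"},
    "p3": {"Model Architecture Information Section"},
    "p4": {"Output Format Information Section"},
}
has_part = {"params": {"p1", "p2", "p3", "p4"}}

SECTIONS = [
    "Dataset Information Section",
    "Input Format Information Section",
    "Model Architecture Information Section",
    "Output Format Information Section",
]


def some_part_of_all(whole, classes):
    """'has part' some (C1 and C2 and ...): a single part is in every class."""
    return any(set(classes) <= types[p] for p in has_part.get(whole, ()))


def some_part_of_each(whole, classes):
    """('has part' some C1), ('has part' some C2), ...: one part per class."""
    return all(
        any(c in types[p] for p in has_part.get(whole, ())) for c in classes
    )


print(some_part_of_all("params", SECTIONS))   # False: no part is all four types
print(some_part_of_each("params", SECTIONS))  # True: each type has its own part
```

A section with four distinct subsections satisfies the separate restrictions but not the intersection form, which is why the intersection is probably not what was intended.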

Unsats: Currently there are two unsats (under owl:Nothing) - they are SWO terms. These need to be resolved. Ideally, there should also be automated QC checks to ensure there are no unsats in releases.

Use case: I'm trying to understand what your use case of the ontology is - looking at the use of data properties, are you building an ontology that is used as a data model, perhaps for validation? If so, have you considered technologies like SHACL instead that might allow more complex shapes to be built for validation purposes?

shawntanzk commented 2 years ago

Hi @UTH-Tuan - I have brought this up for discussion with the foundry committee and here are some comments:

Scope: Currently MCRO seems out of scope for OBO Foundry. While we do understand that there is a need to unify documentation parts, including things like information sections in model cards, the way that MCRO handles it, using data properties to restrict the input, seems to be more of a data model specific to their use case. A couple of key points here:
1) If the use case was to unify these information sections, the terms might instead belong in IAO rather than in a new ontology.
2) If these terms were to be reused, many of the restrictions are too specific for general use, and using JSON, SHACL, or some constraint language might be more appropriate than embedding them in the ontology.
3) OBO Foundry is more focused on the "reality" side and not so much on the "information" side of things (e.g. interested more in F-score statistics than in documentation about F-score statistics).

Use case: We do however understand that you might have reasons for MCRO to be included in the OBO Foundry. We would hence invite you to write back to us on the use cases of this ontology, and how you think it would be useful to the community.

ProfTuan commented 2 years ago

Hi @shawntanzk, I am in the midst of addressing your points including a description of the use case. I will have a response by later this week.

nlharris commented 2 years ago

Awaiting your response, @UTH-Tuan!

ProfTuan commented 2 years ago

I apologize for the delay. We may need an extra week, due to scheduling, to add the changes to the artifact.

nlharris commented 2 years ago

No problem, take your time!

addiehl commented 1 year ago

Still awaiting your response.

ProfTuan commented 1 year ago

Sorry for the late response. This fell off my radar.

ProfTuan commented 1 year ago

The changes made to ID as http://sbmi.uth.edu/ontology/mcro#mcro_XXXXXX is also a good step - though I would recommend using 7 digits instead of 6 to be more conformant to OBO ontologies

We have adjusted this to 7 digits to be in conformity with OBO expectations.

intersection probably wrongly expressed

We have addressed the intersection concern as suggested by Dr. Tan.

Could you comment on how MCRO does its releases? This is important under versions of ontology in https://obofoundry.org/id-policy.html Please also see https://obofoundry.org/principles/fp-004-versioning.html Currently, I do not see releases in the github repo.

We have uploaded the latest version in the release section. As far as versioning goes, if I am not mistaken, we do not currently have a PURL. My assumption is that once we obtain a PURL identifier, we will adhere to the OBO versioning policy.

use case I'm trying to understand what your use case of the ontology is - looking at the use of data properties, are you building an ontology that is used as a data model, perhaps for validation? If so, have you considered technologies like SHACL instead that might allow more complex shapes to be built for validation purposes? ..... Scope ... use case

I think looking at the data properties is misleading. We are not trying to validate data so much as to provide a template for model card reports, using an ontology-based framework to formalize these types of documents for biomedical informatics research that utilizes AI-based machine learning. I also feel that the use case question might be a bit unfair. Without naming specific projects, there are one or two ontologies currently in OBO that are clearly application ontologies.

We do have a paper that might clarify the scope and use case:

Amith, M.T., Cui, L., Zhi, D. et al. Toward a standard formal semantic representation of the model card report. BMC Bioinformatics 23 (Suppl 6), 281 (2022). https://doi.org/10.1186/s12859-022-04797-6

unsats Currently there are two unsats (under owl:Nothing) - they are SWO terms. These need to be resolved. Ideally, there should also be automated QC checks to ensure there are no unsats in releases.

This has been fixed within our ontology. This issue was a result of conflicting terms between IAO and SWO. See this: https://github.com/allysonlister/swo/issues/56

shawntanzk commented 1 year ago

Thanks for the reply, we will bring this up in the next call again, but in general the changes made are good and have fixed my main concerns. The following is a response to the reply (more for use with the OFOC call - I'm not 100% sure I can make it for the next call as I am in the process of moving countries at the moment)

We have adjusted this to 7 digits to be in conformity with OBO expectations.

I've seen the change in the ontology and it looks good, thanks.

We have addressed the intersection concern as suggested by Dr. Tan.

The change is sensible and fixes my concerns.

We have uploaded the latest version in the release section. As far as versioning goes, if I am not mistaken, we do not currently have a PURL. My assumption is that once we obtain a PURL identifier, we will adhere to the OBO versioning policy.

I see that you have uploaded the current version as a release in your GitHub. I also note that there is an owl:versionInfo on the ontology metadata. I would just like to confirm that with each iteration of the ontology, you will be creating a GitHub release. PURL identifiers are just a redirect, and hence versioning should be implemented regardless of them; but as mentioned above, I see that you have done a GitHub release now.

We are not trying to validate data so much as to provide a template for model card reports, using an ontology-based framework to formalize these types of documents for biomedical informatics research that utilizes AI-based machine learning. I also feel that the use case question might be a bit unfair. Without naming specific projects, there are one or two ontologies currently in OBO that are clearly application ontologies.

We do have a paper that might clarify the scope and use case:

Amith, M.T., Cui, L., Zhi, D. et al. Toward a standard formal semantic representation of the model card report. BMC Bioinformatics 23 (Suppl 6), 281 (2022). https://doi.org/10.1186/s12859-022-04797-6

Thanks for the reply and the link to the paper. We will bring this up in the next call and will let you know if there are other concerns about this.

This has been fixed within our ontology. This issue was a result of conflicting terms between IAO and SWO. See this: https://github.com/allysonlister/swo/issues/56

Thanks for the reply, I see that this has been fixed.

shawntanzk commented 1 year ago

Hi @ProfTuan

Thanks for the changes, the replies, and your patience. We have discussed this and are happy to provisionally accept this ontology into the OBO Foundry. A few things we need you to do first, though:

1) PURLs: We need you to change the IDs of all your terms to the following format: http://purl.obolibrary.org/obo/mcro_XXXXXXX

2) Versioning: We need you to confirm that you will continue versioning (e.g. the old versions will be available if people want to use those). I assume you are doing this through GitHub releases?

Once these changes have been made and a new release is done, let us know here and we will inform you of the following steps.

Thanks

matentzn commented 1 year ago

(http://purl.obolibrary.org/obo/MCRO_XXXXXXX, not http://purl.obolibrary.org/obo/mcro_XXXXXXX, with capital letters in the ontology part)
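As a rough illustration of the requested ID change, the following Python sketch rewrites an old-style term IRI into the PURL form with an upper-case ontology prefix. The function name is hypothetical, and left-padding shorter numeric IDs with zeros is an assumption made here, not something the thread prescribes.

```python
import re

# Old-style MCRO term IRIs, as described earlier in the thread.
OLD_IRI = re.compile(r"^http://sbmi\.uth\.edu/ontology/mcro#mcro_(\d+)$")
OBO_BASE = "http://purl.obolibrary.org/obo/"


def to_obo_purl(old_iri: str, prefix: str = "MCRO", width: int = 7) -> str:
    """Rewrite an old-style MCRO term IRI as an OBO PURL.

    Assumption: the numeric part is kept and left-padded with zeros to `width`.
    """
    match = OLD_IRI.match(old_iri)
    if match is None:
        raise ValueError(f"unexpected term IRI: {old_iri!r}")
    digits = match.group(1)
    if len(digits) > width:
        raise ValueError(f"local ID longer than {width} digits: {digits}")
    # The ontology prefix is upper-cased, as pointed out in the comment above.
    return f"{OBO_BASE}{prefix.upper()}_{digits.zfill(width)}"


print(to_obo_purl("http://sbmi.uth.edu/ontology/mcro#mcro_0000027"))
# http://purl.obolibrary.org/obo/MCRO_0000027
```

In practice the rewrite would be applied to the ontology file with an ontology-aware tool rather than by string substitution; the sketch only shows the target shape of the identifiers.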

ProfTuan commented 1 year ago

@shawntanzk and @matentzn

We want to acknowledge your response and thank you for the inclusion of our work into OBO. We have a paper (software) that we were waiting on for rejection or acceptance status. So we didn't want to break anything till the review process is over.

In response to 2

Versioning We need you to confirm that you will continue versioning (e.g. the old versions will be available if people want to use those), I assume you are doing this through GitHub releases?

Absolutely. Following the same protocol as other OBO Foundry ontologies, we intend to archive old versions through the GitHub releases tab.

Purls We need you to change the IDs of all your terms to the following format: http://purl.obolibrary.org/obo/mcro_XXXXXXX

(http://purl.obolibrary.org/obo/MCRO_XXXXXXX, not http://purl.obolibrary.org/obo/mcro_XXXXXXX, with capital letters in the ontology part)

We are working on those right now since the review for our aforementioned paper is completed.

ProfTuan commented 1 year ago

The IDs of the terms have been changed to PURLs.

We didn't update the ontology version IRI yet. The reason is that we have one work-in-progress project that uses the current IRI. Which brings me to my question: what is the next step?

shawntanzk commented 1 year ago

@ProfTuan Thanks for making the changes, I see that all is ready to go now. Here are a few things you need to handle to finalize MCRO's entry into OBO Foundry:

Make a pull request analogous to this one: https://github.com/OBOFoundry/OBOFoundry.github.io/pull/1673/files
Create a file like https://github.com/OBOFoundry/purl.obolibrary.org/blob/master/config/aism.yml and make a pull request

If you have any questions, please feel free to message here, pinging me.

Thank you

ProfTuan commented 1 year ago

Is there a specific branch to make the pull request?

shawntanzk commented 1 year ago

You can make a branch/fork and a PR :) thanks

ProfTuan commented 1 year ago

Thanks @shawntanzk we're working on it.

cmungall commented 1 year ago

Do we have a dashboard check for naming conventions, specifically capitalization?

See https://obofoundry.org/principles/fp-012-naming-conventions.html

MCRO naming conventions are different which is jarring when looking at two ontologies together:

[] IAO:0000030 ! information content entity
- [i] IAO:0000314 ! document part
  - [i] obo:MCRO_0000027 ! Performance Metric Information Section
    - [i] obo:MCRO_0000002 ! Accuracy Information Section

cmungall commented 1 year ago

As @shawntanzk mentioned, this employs fairly different modeling paradigms from the rest of OBO. This is not necessarily bad, but I just want to note this here as a precedent.

For example, Model Card Report has axioms:

'has part' only 'Consideration Information Section'
'has part' only 'Model Detail Section'
'has part' only 'Model Parameter Section'
'has part' only 'Quantative Analysis Section'
'is about' some algorithm

This means that every model card report has a Consideration Information Section.

a Consideration Information Section has axioms

'document part'
'has part' some 'Ethical Consideration Section'
'has part' some 'Limitation Information Section'
'has part' some 'Trade-off Information Section'
'has part' some 'Use Case Information Section'
'has part' some 'User Information Section'
documentation some rdfs:Literal
overview some rdfs:Literal

so we are saying that every model card has a consideration section and that section necessarily has all these parts.

I don't think this is an accurate representation of reality. See for example:

https://huggingface.co/gpt2

It looks also like you are modeling things like documentation as data properties. This is quite common for many semantic web vocabularies but is a bit of a mismatch from the rest of OBO.

I'm not really sure OWL and an ontology is the right formalism for what it is you want to do.

None of my comments should be interpreted as a blocker for accepting this ontology. But I will note that OWL works best for a certain style of ontology. This product looks like a great schema.org style vocabulary and I think everyone would be better served modeling as an extension of schema.org using schema.org style modeling, with loose mappings to OBO.

The OBO Ops team may want to find a different axis from reference vs application ontology here to better categorize semantic web style ontologies.

UPDATE apologies part of the specifics of my analysis is wrong as I read "only" as "some". @balhoff's analysis is correct however

ProfTuan commented 1 year ago

I get what you're saying, but I think what ontologies could do and what we are trying to aim for is standardizing the structure of model card reports where some reports could potentially neglect pieces of information that should be shared.

balhoff commented 1 year ago

@ProfTuan there are logical issues with using only restrictions in this way. The Model Card Report axioms will result in all of its parts having all these types at once. I would assume that a given section instance is not a Consideration Information Section and a Quantitative Analysis Section at the same time. These types will also apply to all the subparts of the parts of the report. In general it is inadvisable to use a transitive property in an only restriction (especially when the property has a much broader range than the usage in the restriction).

Something you could do instead would be:

'Model Card Report' SubClassOf ('has part' only ('Consideration Information Section' or 'Model Detail Section' or 'Model Parameter Section' or 'Quantitative Analysis Section'))

But again I wouldn't use has part there due to its transitivity. A non-transitive subproperty of has part would be appropriate.
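The points above about `only` restrictions can be illustrated on a small hand-built example. The Python sketch below is an assumption-laden toy, not a reasoner: it only checks whether a given set of parts satisfies each form of the axiom, whereas an OWL reasoner would instead infer the extra types. Individual names (`s1`, `s2`, `s1a`) are made up.

```python
# Toy interpretation; individual names are illustrative only.
types = {
    "report": {"Model Card Report"},
    "s1": {"Consideration Information Section"},
    "s2": {"Model Detail Section"},
    "s1a": {"Ethical Consideration Section"},  # a sub-part of s1
}
direct_part = {"report": {"s1", "s2"}, "s1": {"s1a"}}

SECTION_CLASSES = [
    "Consideration Information Section",
    "Model Detail Section",
    "Model Parameter Section",
    "Quantitative Analysis Section",
]


def all_parts(whole):
    """Parts reachable through direct_part, i.e. a transitive 'has part'."""
    seen, stack = set(), list(direct_part.get(whole, ()))
    while stack:
        part = stack.pop()
        if part not in seen:
            seen.add(part)
            stack.extend(direct_part.get(part, ()))
    return seen


def only(parts, allowed):
    """'only (C1 or C2 ...)': every part belongs to at least one allowed class."""
    return all(types[p] & set(allowed) for p in parts)


direct = direct_part["report"]
# Four separate 'only' axioms: every part would have to be in every class.
print(all(only(direct, [c]) for c in SECTION_CLASSES))  # False
# One 'only' axiom over the union, applied to direct parts.
print(only(direct, SECTION_CLASSES))                     # True
# Same union axiom, but with a transitive 'has part' the sub-part s1a counts too.
print(only(all_parts("report"), SECTION_CLASSES))        # False
```

The last line shows why a non-transitive subproperty is suggested: with a transitive property, sub-parts such as an Ethical Consideration Section also fall under the restriction.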

ProfTuan commented 1 year ago

Thank you for the input, @balhoff. I'll look into the suggestion.

I do want to add that the representation does take a cue from IAO. If you notice, "document" or "document part" from IAO uses 'has part' and 'part of' (i.e., document -> has part -> document part). I presume the representational structure is no different from how IAO is abstracting document and document part. "Model Card Report" is a subtype of report, which is also a subtype of document. The other sections of the model card report are subtypes of document part.

balhoff commented 1 year ago

@ProfTuan I do think 'has part' is correct for relating these items. It just becomes a problem when using it in the only axioms.

cmungall commented 1 year ago

apologies, part of my analysis was wrong as I read only as some. @balhoff's analysis on the use of only is correct, there are multiple traps here.

My comments about Consideration Information Section still stand though:

'document part'
'has part' some 'Ethical Consideration Section'
'has part' some 'Limitation Information Section'
'has part' some 'Trade-off Information Section'
'has part' some 'Use Case Information Section'
'has part' some 'User Information Section'
documentation some rdfs:Literal
overview some rdfs:Literal

You are saying that every Consideration Information Section must have all these sections; this is not true for most model cards on e.g. Hugging Face.
