ML-Schema / core

📚 CORE ontology of ML-Schema and mapping to other machine learning vocabularies and ontologies (DMOP, Exposé, OntoDM, and MEX)
http://purl.org/mls

ML-Schema with Neosemantics #26

Open · Isha5 opened this issue 2 years ago

Isha5 commented 2 years ago

Hi everyone, I find ML-Schema very interesting for better interpretability of machine learning projects. I am trying to use this schema for my ML models available on GitLab and to query some information about these models.

I tried looking around, but there are no blogs etc. on how to achieve this. I would very much appreciate it if someone knows of any sources on how to achieve this.
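For context, the kind of workflow I have in mind is roughly the following (just a sketch, assuming a Neo4j instance with the Neosemantics (n10s) plugin installed, the official neo4j Python driver, and a Turtle description of a model reachable by URL; the connection details and URL are placeholders):

```python
# Rough sketch only: load an ML-Schema-based Turtle description of a model
# into Neo4j with the Neosemantics (n10s) plugin, then inspect it.
# Connection details and the Turtle URL are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # n10s needs a uniqueness constraint on Resource.uri
    # (Neo4j 5 syntax; Neo4j 4 uses "ON ... ASSERT ... IS UNIQUE")
    session.run(
        "CREATE CONSTRAINT n10s_unique_uri IF NOT EXISTS "
        "FOR (r:Resource) REQUIRE r.uri IS UNIQUE"
    )
    # initialise the Neosemantics graph configuration with defaults
    session.run("CALL n10s.graphconfig.init()")
    # import the Turtle file describing the model/run in ML-Schema terms
    session.run(
        "CALL n10s.rdf.import.fetch($url, 'Turtle')",
        url="https://example.org/my-model-run.ttl",  # placeholder
    )
    # quick check: list a few imported resources
    for record in session.run("MATCH (n:Resource) RETURN n.uri AS uri LIMIT 10"):
        print(record["uri"])

driver.close()
```

Neosemantics would only be the loading step; the querying would then be plain Cypher over the imported resources.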

joaquinvanschoren commented 2 years ago

There are no blogs about this as far as I know. I believe the paper gives some guidance: http://www.semantic-web-journal.net/content/ml-schema-interchangeable-format-description-machine-learning-experiments-0

A practical implementation to map models stored on OpenML to ML-Schema can be found here: https://github.com/ML-Schema/openml-rdf (not a lot of documentation there, though).

That's all I know...

Isha5 commented 2 years ago

Thank you @joaquinvanschoren

I read that we could produce .ttl (Turtle) files from some apps like this.

Out of curiosity: on this GitHub page you have published WEKA ML model provenance in a Turtle file. Could you please share which tool you used for that?

joaquinvanschoren commented 2 years ago

Probably best to ask Tommaso @mommi84

Isha5 commented 2 years ago

Hi Tommaso @mommi84, it would be great if you could share some info on this.

I read that we could produce .ttl (Turtle) files from some apps like this.

Out of curiosity: on this GitHub page you have published WEKA logistic regression ML model provenance in a Turtle file. Could you please share which tool you used for that?

Isha5 commented 2 years ago

@joaquinvanschoren @mommi84 @agnieszkalawrynowicz this is for my academic project. It would be helpful if you could let me know whether I can map the ML models available in my GitLab to ML-Schema. The README says we can load ML-Schema into Protégé and edit the schema. Could you point me to any resources on how to edit it?

Thanks in advance, Professor @joaquinvanschoren and the team :)

diegoesteves commented 2 years ago

Hi @Isha5, there is no straightforward way to do so.

This is related to the long-standing trade-off between ML frameworks and plain ML source code. If you use the former, you'll probably get that structured information for free (it's usually a standard feature in any decent framework). If you want to map ML metadata generated by scripts outside such frameworks, there is currently no automatic way of doing so.

In the past I have explored a few different methods (all of them open source):

Option 1: create your own ML framework with techniques such as interfaces/annotations/reflection. It adds an extra layer, but the code is cleaner this way. However, at the time the coverage wasn't great (it worked only for toy examples, for a number of different reasons; read the paper if you want to know more). 2016, MEX-Interfaces: https://dl.acm.org/doi/10.1145/2993318.2993320

Option 2: create a library that implements a logging mechanism to export the metadata directly into a pre-defined format (e.g., MEX, OntoDM, Exposé). It works decently, but at the cost of needlessly inflating the source code (from a purely ML point of view); a rough sketch of what such a logger could look like follows the list of options below. 2017, LOG4MEX: A Library to Export Machine Learning Experiment: http://jens-lehmann.org/files/2017/wi_log4mex.pdf

Option 3: create a REST API that receives the ML (input/output) parameters and exports the metadata file. Cleaner, and my preferred option so far. Still, it requires adapting your source code to communicate with this web interface. 2017, An Interoperable Service for the Provenance of Machine Learning Experiments: https://www.researchgate.net/profile/Diego-Esteves/publication/319051027_An_interoperable_service_for_the_provenance_of_machine_learning_experiments/links/59c17ca3a6fdcc69b92bc467/An-interoperable-service-for-the-provenance-of-machine-learning-experiments.pdf

Options 1, 2, and 3: https://github.com/mexplatform

Option 4.1: use an ML framework designed for that, if possible (e.g., OpenML, https://www.openml.org/). Positive: you solve your problem. Negative: interoperability issues (source-code-wise).

Option 4.2: explore sequence-to-sequence methods to generate the metadata automatically, without the need to inflate/adapt your source code. This could be a great research topic.
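To make option 2 a bit more concrete, here is a hypothetical, stripped-down logger (not LOG4MEX itself) that collects run metadata in plain Python and serializes it with rdflib against the ML-Schema vocabulary. The mls class and property names are how I remember the ontology, so double-check them against http://purl.org/mls before relying on this sketch:

```python
# Hypothetical "option 2"-style logger (not LOG4MEX): collect run metadata
# in Python and serialize it as ML-Schema Turtle with rdflib.
# The mls class/property names below are my reading of ML-Schema --
# verify them against http://purl.org/mls before relying on this.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS, XSD

MLS = Namespace("http://www.w3.org/ns/mls#")
EX = Namespace("http://example.org/run/")  # placeholder namespace for the run

def log_run(run_id, implementation, hyperparams, dataset_uri, measures):
    g = Graph()
    g.bind("mls", MLS)

    run = EX[f"run{run_id}"]
    impl = EX[f"implementation{run_id}"]
    g.add((run, RDF.type, MLS.Run))
    g.add((impl, RDF.type, MLS.Implementation))
    g.add((impl, RDFS.label, Literal(implementation)))
    g.add((run, MLS.executes, impl))
    g.add((run, MLS.hasInput, URIRef(dataset_uri)))

    # hyperparameter settings used in this run
    for name, value in hyperparams.items():
        setting = EX[f"run{run_id}/hp/{name}"]
        g.add((setting, RDF.type, MLS.HyperParameterSetting))
        g.add((setting, RDFS.label, Literal(name)))
        g.add((setting, MLS.hasValue, Literal(value)))
        g.add((run, MLS.hasInput, setting))

    # evaluation results produced by this run
    for name, value in measures.items():
        evaluation = EX[f"run{run_id}/eval/{name}"]
        g.add((evaluation, RDF.type, MLS.ModelEvaluation))
        g.add((evaluation, RDFS.label, Literal(name)))
        g.add((evaluation, MLS.hasValue, Literal(value, datatype=XSD.double)))
        g.add((run, MLS.hasOutput, evaluation))

    g.serialize(destination=f"run{run_id}.ttl", format="turtle")

# Example call with made-up values:
log_run(
    run_id=1,
    implementation="weka.classifiers.functions.Logistic",
    hyperparams={"ridge": 1e-8, "maxIts": -1},
    dataset_uri="http://example.org/dataset/iris",
    measures={"predictive_accuracy": 0.96},
)
```

The downside is exactly what I described above: every training script now has to carry these logging calls around, which is the "inflation" cost of option 2.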

Just consider that there are many data platforms available (including robust ones like Databricks, GCP, Azure, etc.) for which storing/exporting ML features and provenance is a very fundamental built-in capability. So check whether investing a lot of time recreating something from scratch makes sense for you.

Best, Diego.

mommi84 commented 2 years ago

> Out of curiosity: on this GitHub page you have published WEKA logistic regression ML model provenance in a Turtle file. Could you please share which tool you used for that?

It was 6 years ago so I am just guessing, but I think the examples were manually created in Protégé as a proof of concept. As @diegoesteves pointed out, OpenML does provide a way to publish your Weka experiments (see https://docs.openml.org/Weka/) and export them to JSON, XML, or RDF.

However, the RDF export of individual experiments is not complete and does not include all metadata (see for instance https://github.com/ML-Schema/openml-rdf/blob/master/examples/Run/476635.rdf).
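If the RDF export is missing metadata you need, the same information is still reachable through the OpenML API itself; something along these lines should work with the openml Python package (attribute names are from memory, so treat it as a sketch; 476635 is the same run as in the example above):

```python
# Sketch: pull the metadata for an OpenML run directly via the Python API,
# as a complement to the (incomplete) per-run RDF export.
# Attribute names reflect my recollection of the openml package -- check its docs.
import openml

run = openml.runs.get_run(476635)          # same run as in the RDF example above
print(run.flow_name)                        # the implementation (e.g. a Weka flow)
print(run.parameter_settings)               # hyperparameter settings of the run
print(run.evaluations)                      # evaluation measures computed server-side

task = openml.tasks.get_task(run.task_id)   # the task the run solved
dataset = task.get_dataset()                # and its input dataset
print(dataset.name)
```

From there, mapping the fetched fields onto ML-Schema terms (Run, Implementation, HyperParameterSetting, ModelEvaluation, ...) would be a manual step, essentially what openml-rdf automates for the parts it covers.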