ML-Schema / core

📚 CORE ontology of ML-Schema and mapping to other machine learning vocabularies and ontologies (DMOP, Exposé, OntoDM, and MEX)
http://purl.org/mls

Propose top-level concepts #2

Open joaquinvanschoren opened 8 years ago

joaquinvanschoren commented 8 years ago

We need to start with a list of top-level concepts. Please add your suggestions.

agnieszkalawrynowicz commented 8 years ago

- DM-Task
- DM-Algorithm
- DM-Operator / DM-AlgorithmImplementation
- DM-Operation / DM-AlgorithmExecution
- DM-Data
- DM-Hypothesis / DM-Generalization (with subclasses: DM-Model, DM-PatternSet)
- DM-Workflow <-- specification
- DM-Process <-- execution
- DM-Experiment <-- something that resembles a bundle in PROV, i.e. prov:Bundle

[image: coredmopconcepts]

joaquinvanschoren commented 8 years ago

A few years ago I made an effort to 'reconcile' the different ML ontologies, and I drew this figure of the top-level concepts in use at the time. Maybe it is useful: http://www.yumpu.com/en/document/view/17755780/expose-overview-and-use-cases

joaquinvanschoren commented 8 years ago

Regarding naming: do we really need prefixes such as DM-Algorithm? This creates a naming issue in itself: is it a data mining / machine learning / modelling algorithm? If we do use them, let's use ML because we are building an ML Schema :)

agnieszkalawrynowicz commented 8 years ago

I was thinking more in terms of particular concepts (their meaning and not their names). Surely, we could get rid of the 'DM-' prefix or alternatively use 'ML-'. The former might be less problematic.

panovp commented 8 years ago

I agree with Joaquin: since we are trying to describe a schema for ML, we should avoid using prefixes in the terms that will be defined in the schema. That is what the namespace is for; there is no need to duplicate it in the terms themselves.

panovp commented 8 years ago

Here are the things we defined as "core" entities in OntoDM to discuss in the call. fig2.pdf

diegoesteves commented 8 years ago

Here are the MEX entities. MEX has been designed to cover ML executions (a single execution or a set of executions) as simply as possible (algorithm + hyperparameters + features + data + basic info => measures). Not all of them are necessary for a complete representation, though. DM concepts are not strongly defined.

[image: http://dne5.com/mex/diagram/mex-1.0.1.png]

joaquinvanschoren commented 8 years ago

In the Hangout call, we agreed that we will put these on a wiki page (on github) so that we can compare them. Ideally we create a matrix of how the same concepts are called in the different ontologies/vocabularies.
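One possible shape for such a matrix is sketched below in Python, rendered as a Markdown table for the wiki. This is only an illustration: the row labels are taken from the DMOP proposal earlier in this thread, only the DMOP column is filled in, and every other cell is deliberately left open (`?`) rather than guessed. The function name `to_markdown` and the dictionary layout are assumptions, not part of any existing tooling.

```python
# Columns: the vocabularies named in the repository description.
VOCABULARIES = ["DMOP", "OntoDM", "Exposé", "MEX"]

# Rows: concept -> {vocabulary: name used there}.
# Only DMOP names quoted earlier in this thread are filled in;
# the remaining cells are intentionally left unknown.
MATRIX = {
    "Task": {"DMOP": "DM-Task"},
    "Algorithm": {"DMOP": "DM-Algorithm"},
    "AlgorithmImplementation": {"DMOP": "DM-Operator"},
    "AlgorithmExecution": {"DMOP": "DM-Operation"},
}


def to_markdown(matrix, vocabularies, missing="?"):
    """Render the concept matrix as a GitHub-flavoured Markdown table."""
    header = "| Term | " + " | ".join(vocabularies) + " |"
    rule = "|" + " --- |" * (len(vocabularies) + 1)
    rows = [
        "| " + term + " | "
        + " | ".join(names.get(v, missing) for v in vocabularies) + " |"
        for term, names in matrix.items()
    ]
    return "\n".join([header, rule, *rows])


print(to_markdown(MATRIX, VOCABULARIES))
```

Keeping the matrix as data and generating the table from it would avoid hand-editing the Markdown table syntax.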

panovp commented 8 years ago

Joaquin, can you put a drawing of the database schema you are using for storing the experiments in OpenML in this thread?

joaquinvanschoren commented 8 years ago

Here you go. Basically, it has:

- Run = Setup + Task
- Setup = Implementation + list of parameter settings
- Task = Input dataset(s) + train/test folds + task-specific info (e.g. target attributes)

Runs can have multiple outputs, e.g. Evaluations, Predictions, Models, ...

[image: expdbschema2]
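The Run / Setup / Task structure described above can be sketched as plain Python data classes. This is a minimal illustration of the relationships only, not the actual OpenML database schema: all field names and the example values (`weka.J48`, `iris`, the parameter `C`) are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Setup:
    """Setup = Implementation + list of parameter settings."""
    implementation: str
    parameter_settings: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Task:
    """Task = input dataset(s) + train/test folds + task-specific info."""
    input_datasets: List[str]
    folds: int
    task_info: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Run:
    """Run = Setup + Task; a run may have several kinds of output."""
    setup: Setup
    task: Task
    outputs: Dict[str, Any] = field(default_factory=dict)


# Hypothetical example values, for illustration only.
run = Run(
    setup=Setup("weka.J48", {"C": 0.25}),
    task=Task(["iris"], folds=10, task_info={"target_attribute": "class"}),
)
run.outputs["predictions"] = ["setosa", "versicolor"]
run.outputs["models"] = ["<serialized model placeholder>"]
```

Modelling outputs as an open-ended mapping mirrors the remark that a run can have multiple outputs of different kinds.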

joaquinvanschoren commented 8 years ago

Hmm, I saw that the image I posted above redirects to my whole presentation now. The main figure I meant to highlight is this one:

[image: screen shot 2015-10-29 at 09 29 01]

mommi84 commented 8 years ago

In the Vocabulary wiki page, a table is now ready to be filled out. I just figured out that the Markdown language does support tables, although the syntax is a bit messy.

With respect to the discussion, the most top-level framework of concepts and properties (which are equally important) that I am aware of is the set of PROV-O starting terms. My suggestion would be to map the core concepts to them, so that we rely on a solid, standard upper-level schema. E.g., the following mapping would highlight three top concepts. Feel free to object!

mls:Algorithm rdfs:subClassOf prov:Entity .
mls:Task rdfs:subClassOf prov:Activity .
mls:Operator rdfs:subClassOf prov:Agent .

PROV-O Starting Terms
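To make the proposed mapping easier to inspect, here is a small Python sketch that holds the three triples as data and expands the prefixed names to full IRIs. The `prov:` and `rdfs:` namespaces are the standard W3C ones; the `mls:` namespace IRI is an assumption derived from the project URL (http://purl.org/mls), and the helper names `expand` and `superclasses` are hypothetical.

```python
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    # Assumption: the thread only gives http://purl.org/mls as the project URL.
    "mls": "http://purl.org/mls#",
}

# The three subclass axioms exactly as proposed in the comment above.
PROPOSED_MAPPING = [
    ("mls:Algorithm", "rdfs:subClassOf", "prov:Entity"),
    ("mls:Task", "rdfs:subClassOf", "prov:Activity"),
    ("mls:Operator", "rdfs:subClassOf", "prov:Agent"),
]


def expand(curie):
    """Expand a prefixed name such as 'prov:Entity' to a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def superclasses(term, triples):
    """Return the direct superclasses asserted for a term."""
    return [o for s, p, o in triples if s == term and p == "rdfs:subClassOf"]


for subject, predicate, obj in PROPOSED_MAPPING:
    print(f"<{expand(subject)}> <{expand(predicate)}> <{expand(obj)}> .")
```

The loop prints the proposal in N-Triples form, which can be loaded into any RDF tool for comparison against alternative mappings.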

agnieszkalawrynowicz commented 8 years ago

Dear Tommaso,

Thank you! With regard to the mapping to PROV: I agree that it is good to map to PROV, but I object to doing this for now. My idea is to create a core ML vocabulary. Each of us uses various mappings to top-level vocabularies/ontologies: PROV, BFO, DOLCE. I hope we are not going to start by trying to align those. Let's concentrate on defining core ML domain concepts as our top level.

Cheers, Agnieszka

(In reply to Tommaso Soru's comment of 2 Nov 2015, 00:26.)

diegoesteves commented 8 years ago

It's just a matter of time, I would say. Both tasks are strongly correlated in order to achieve the desired model. Let's discuss it later in a separate issue (#10), from a more technical perspective and with more argumentation, and keep this one for the alignment of philosophies. Thanks for the https://github.com/ML-Schema/core/wiki/Vocabulary page, Tommaso; let's fill it in as soon as possible.

diegoesteves commented 8 years ago

One question: who created the "Term" values? I have added a column for (briefly) describing these terms in order to minimise the interpretation gap. (1) What should a "task" be? A description of an execution/process? (2) Is "Operator?" related (DMOP column) to "AlgorithmImplementation"? Would it be "weka", "spss", "sas", "libsvm", etc.? Although that is most likely the case, I cannot presume it, since the term "operator" has no obvious interpretation. (3) "Operation?" has "execution" as its comment, but "execution" is also the label for "Process?".

diegoesteves commented 8 years ago

Moreover, Joaquin, do you have a more technical version of this diagram? https://cloud.githubusercontent.com/assets/1724503/10813575/709a8c1a-7e20-11e5-8a01-789dc7bbde94.png

mommi84 commented 8 years ago

I am currently rearranging the big table differently to read it more clearly. Each term (i.e., vocabulary entry, or ontology entity) will have its own table. Please do not edit it.

joaquinvanschoren commented 8 years ago

@dnes85 Not in a single diagram, but you can find more technical diagrams of the underlying ontologies and Exposé in this presentation: DMOJamboree.pdf

agnieszkalawrynowicz commented 8 years ago

Yes, I agree it is a matter of time. I feel that agreeing first on the core domain concepts is easier. And it is better to plan the alignment of philosophies for the next step :-)

Agnieszka

(In reply to Diego Esteves's comment of 2 Nov 2015, 15:04.)

diegoesteves commented 8 years ago

absolutely @agnieszkalawrynowicz! thanks @joaquinvanschoren

mommi84 commented 8 years ago

You may now edit the tables in the Vocabulary page.

agnieszkalawrynowicz commented 8 years ago

Great, thanks!

(In reply to Tommaso Soru's comment of 2 Nov 2015, 17:38.)

agnieszkalawrynowicz commented 8 years ago

Hi All, I would like to add textual definitions to the classes (types) as they are understood from the perspective of DMOP. I guess that OntoDM, for example, may use slightly different definitions. It would be good to first see how we define those concepts, so that we can make informed choices in the next steps of specifying the core vocabulary. Currently, we have tables with properties and values (ontology/vocabulary name and the name of the class, respectively). Maybe those tables could have a third column? But I am not sure that would fit the meaning of those columns. Are they going to be processed automatically into some form of triples?

Examples of textual definitions from DMOP: DM-Task: "DM-Task: A task in general is any piece of work that is undertaken or attempted [SUMO]. A DM-Task is any task that needs to be addressed in the data mining process. DMOP's DM-Task hierarchy models all the major task classes." or DM-Algorithm: "DM-Algorithm: An algorithm in general is a well defined sequence of steps that specifies how to solve a problem or perform a task. It typically accepts an input and produces an output. A DM algorithm is an algorithm that has been designed to perform any of the DM tasks, such as feature selection, missing value imputation, or modeling (or induction). The higher-level classes of the DM-Algorithm hierarchy correspond to DM-Task types. Immediately below are broad algorithm families or what data miners more commonly call paradigms or approaches. The Algorithm hierarchy bottoms out in individual algorithms such as CART, Lasso or ReliefF. A particular case of a DM algorithm is a Modeling (or Learning) algorithm, which is a well-defined procedure that takes data as input and produces output in the form of models or patterns. "

Cheers, Agnieszka

agnieszkalawrynowicz commented 8 years ago

@dnes85

"Task" is a data mining task, e.g. classification or pattern discovery. Here is the top hierarchy of data mining tasks from DMOP:

DM-Task CoreDM-Task DataProcessingTask HypothesisApplicationTask HypothesisEvaluationTask HypothesisProcessingTask InductionTask ModelingTask DescriptiveModelingTask ClusteringModelingTask DependencyModelingTask ProbabilityEstimationTask PredictiveModelingTask ClassificationModelingTask RegressionModelingTask StructuredPredicionModelingTask PatternDiscoveryTask AssociationDiscoveryTask DeviationDetectionTask DissociationDiscoveryTask SubgroupDiscoveryTask

"Operator" is an alternative name for "AlgorithmImplementation". E.g. RapidMiner uses this term, and we also use the name "DM-Operator" in DMOP to denote algorithm implementations. It would be, for instance, "ID3" in RapidMiner (http://docs.rapidminer.com/studio/operators/modeling/classification_and_regression/tree_induction/id3.html) or "id3" in Weka (http://weka.sourceforge.net/doc.stable/weka/classifiers/trees/Id3.html), or "Rule Induction" in RapidMiner (http://docs.rapidminer.com/studio/operators/modeling/classification_and_regression/rule_induction/rule_induction.html), which actually implements the RIPPER algorithm.

"Operation" is an execution of the operator (or algorithm implementation).

"Process" is an execution of a "workflow".
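The distinction between the specification level (Algorithm, Operator) and the execution level (Operation) defined above can be sketched in Python as follows. The class and field names are hypothetical and chosen only for illustration; they are not DMOP identifiers. The example instance uses the "Rule Induction" operator from RapidMiner, which implements RIPPER, as mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algorithm:
    """Specification level: the abstract procedure, e.g. RIPPER."""
    name: str


@dataclass(frozen=True)
class Operator:
    """Specification level: an implementation of an algorithm in a tool."""
    name: str
    tool: str
    implements: Algorithm


@dataclass(frozen=True)
class Operation:
    """Execution level: one concrete execution of an operator."""
    executes: Operator
    label: str  # hypothetical identifier for this execution


ripper = Algorithm("RIPPER")
rule_induction = Operator("Rule Induction", "RapidMiner", implements=ripper)

# One operator can be executed many times; each execution is its own Operation.
first_execution = Operation(executes=rule_induction, label="execution-1")
second_execution = Operation(executes=rule_induction, label="execution-2")
```

A Workflow / Process pair could be modelled the same way, with a Process referring to the Workflow it executes.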

I would like to put these definitions into the Wiki but I am not sure if I am supposed to create a third column or we have a better idea on editing it.

(In reply to Diego Esteves's comment of 2 Nov 2015, 15:31.)

agnieszkalawrynowicz commented 8 years ago

Dear All,

What about discussing during today’s call the topic we left out last time: Propose top-level concepts #2 ? If this is a good idea then I would prepare and send out an agenda with this topic.

I propose to extend the tables with a third column containing a textual definition of each proposed top-level concept (it could help to align our understanding of the meaning of those concepts, and also to define properties/attributes in the next steps). What do you think?

Regards and cheers, Agnieszka

joaquinvanschoren commented 8 years ago

Dear all,

Could everybody (especially OntoDM and MEX) please fill in the table in the Wiki before the meeting at 13:30? https://github.com/ML-Schema/core/wiki/Vocabulary

Feel free to add your ontology/vocabulary if it is not mentioned.

I added some notes because it was not always easy to understand the definition of the top-level concepts.

Cheers, Joaquin

(In reply to agnieszkalawrynowicz's comment of Mon, Nov 9, 2015, 9:38 AM: https://github.com/ML-Schema/core/issues/2#issuecomment-154997350.)

mommi84 commented 8 years ago

[Part of message moved to #10]

@agnieszkalawrynowicz In your schema diagram, I guess DM-Operator executes DM-Operation, and not the other way round. Am I right?

agnieszkalawrynowicz commented 8 years ago

@mommi84 I can understand what you mean (when thinking of the DM-Operator as an agent).

No, DM-Operator does not execute DM-Operation. DMOP uses these meanings of "execute":

  1. to carry out: to execute a plan.
  2. to perform: to execute a gymnastic feat.

DM-Operator (or DM-Workflow) is a plan that is executed by a process (DM-Operation).

This made me think about PROV-O. PROV-O is in principle at the level of execution, so it is supposed to store information on what was executed, by whom, etc.? In that case, are Task and Algorithm at the specification level (the level of the plan)?

agnieszkalawrynowicz commented 8 years ago

I see that thefreedictionary also lists this meaning:

  1. Computers: To run (a program or instruction).

agnieszkalawrynowicz commented 8 years ago

And in DMOP's view it is the program (DM-Operator, Implementation) that is being executed. Whether it is executed by a DM-Operation (a process) is less certain :-) For the process (DM-Operation, DM-Process), the mapping to prov:Activity seems to make more sense.

agnieszkalawrynowicz commented 8 years ago

I have had a look into MEX and it seems DM-Operator in DMOP corresponds to MEX's Implementation (thus PROV:Entity), so that should be right.

joaquinvanschoren commented 8 years ago

Dear all,

I am sorry, but I won't be able to join today's call. I have another meeting that I cannot move. Agnieszka, could you please chair the call?

Cheers, Joaquin

(In reply to agnieszkalawrynowicz's comment of Mon, Dec 7, 2015, 11:08 AM: https://github.com/ML-Schema/core/issues/2#issuecomment-162470306.)

diegoesteves commented 8 years ago

As discussed, MEX's Implementation refers to the software implementation (e.g. Weka, Octave, DL-Learner, scikit-learn, RapidMiner, ...). We arrive at the algorithm implementation by combining the generic "mexalgo:Algorithm" class with "mexalgo:Implementation".

diegoesteves commented 8 years ago

http://mex.aksw.org/mex-algo#SupportVectorMachines + http://mex.aksw.org/mex-algo#Weka means that the run is about SVM on Weka. We do NOT overspecialize. (More classes are available for describing the algorithm in more detail, in this case the kernel: Linear, RBF, ...)

diegoesteves commented 8 years ago

Using the same logic, http://mex.aksw.org/mex-algo#SupportVectorMachines + http://mex.aksw.org/mex-algo#scikit-learn means that the run has been performed using SVM on the scikit-learn framework.
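The pairing described in the last three comments can be sketched in Python as follows. The three IRIs are the ones quoted above; the function `describe_run` and the dictionary keys `algorithm` and `implementation` are hypothetical and are not MEX property names.

```python
MEX_ALGO = "http://mex.aksw.org/mex-algo#"


def describe_run(algorithm_class, software):
    """Pair a generic algorithm class with the software that ran it."""
    return {
        "algorithm": MEX_ALGO + algorithm_class,   # hypothetical key
        "implementation": MEX_ALGO + software,     # hypothetical key
    }


# The same algorithm class combined with two different implementations.
svm_on_weka = describe_run("SupportVectorMachines", "Weka")
svm_on_sklearn = describe_run("SupportVectorMachines", "scikit-learn")

print(svm_on_weka)
print(svm_on_sklearn)
```

With this combination, no dedicated class per algorithm-and-tool pair is needed; the algorithm IRI stays the same and only the implementation IRI changes.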