edamontology / edamontology

EDAM is an ontology of bioscientific data analysis and data management. EDAM concentrates on topics, operations, types of data, and data formats related to analysis, modelling, optimisation, and data life-cycle in biosciences, and in other research and science-based applications.
Creative Commons Attribution Share Alike 4.0 International

Statistical / learning methods & DSEO #360

Open joncison opened 6 years ago

joncison commented 6 years ago

These are used by DSEO (https://docs.google.com/spreadsheets/d/1ix1eKplAX9KHbBsA6uU7G95_7lD5Q9jGfEL_LV7dYAQ/edit#gid=1477753083) and we could model them in EDAM:

Should they become new EDAM topic or operation concepts? Or synonyms of existing (or perhaps some new) concepts? Or should we model them at all?

And what about the existing Statistical calculation concept (http://edamontology.org/operation_2238)? We already have similar sorts of concepts there, and (if that pattern holds) it implies we need a Learning method concept or some such.

What troubles me quite a bit is that the long-established pattern in EDAM is to focus on what is done, not how. The above list, and also the Statistical calculation branch, appear (at first glance) to break that a bit. But if tools provide these as abstract methods, not tied to any particular type of data, then I guess it's fine.

Please advise what to do.

veitveit commented 6 years ago

This is indeed a tricky one. They are all relevant methods, particularly with the recent rise of machine learning. But, as you say, they do not tell us what, e.g., an Operation does.

The Statistical calculation branch is, in my opinion, necessary, as it covers very general data analysis concepts that can stand on their own (such as Standardization/normalization). Maybe that is a way to draw the line for including such terms? If the term on its own can describe an operation, then we take it; otherwise not.

On the Topics side, we have Machine learning, which covers about half of these terms, and Statistics and probability, which I consider a very relevant part of Computational Biology. We could move Statistics up to a general branch and then think about whether to include any relevant children. But this could be difficult, as statistics is practically everywhere, but not exclusively.

joncison commented 5 years ago

I agree with your rule of thumb, which, to my mind, none of these concepts quite obey.

So for now I added all of these as narrowSynonym of Machine learning or Statistics and probability, with a couple going under Mathematics. I think this will do for now. We can always convert them to sub-concepts if we want to in the future.

cc @matuskalas FYI

bayjan commented 2 years ago

Can this issue be reopened, considering also the issue https://github.com/edamontology/edam-bioimaging/issues/17? Thank you.

matuskalas commented 2 years ago

Thanks for the heads-up @bayjan! Indeed, this needs to be fixed, namely in https://github.com/edamontology/edam-bioimaging/issues/17. About half of the listed terms are already there as concepts; the rest have to be figured out. Once they are all fixed, they will need to be merged into mainline EDAM, which we can do with the highest priority for the ML/AI concepts.