Representational entities (data, information, knowledge, etc.) #149

Picking up from the comment here

I proposed tightening the match between the labels and definitions of terms like data, information, and knowledge.

Harmonised usage here is becoming important as global initiatives such as the UN Ocean Decade Implementation Plan (see glossary, beginning p. 52) become more focused on digital assets. There are also wider perspectives on these entities that need to be represented in global data systems (e.g. Indigenous understandings of data; some brief notes here).

Within these initiatives, there is a growing realisation that calling for and funding "Big Data" is by no means a guarantee of "Big Information" or knowledge.

Thus, one proposal for COB would be to group these under something like "representational entity". Roughly:

representational entity
- datum (or data)
- information
- knowledge
With definitions (for refinement) along the lines of:
representational entity: An entity which either has been deliberately created to represent another entity, or can be interpreted as a representation by some agent.
- This would rely on "represents" being defined elsewhere, perhaps in RO or COB, likely as a sub-property of "is about".
datum: a representational entity encoded in a sign, symbol, marking, or pattern on any medium.
- The plural, data, would formally go under some sort of aggregate class, while information operates as a mass noun. Not sure what to do there.
information: A datum or data which, when accurately interpreted, reduces uncertainty about the properties or behaviours of an entity.
- The "accurately interpreted" bit is how we link this to knowledge. For example, the characters "海洋" would only be data unless you have knowledge of Japanese, in which case they inform you that an ocean is somehow involved in the situation.
- As information is a mass noun (like water or air), it could comprise a datum or data. Not sure how to handle that formally.
knowledge: A representational entity which 1) is an abstraction of an entity constructed from information about that entity, 2) grants its bearer reliable familiarity with that entity, and 3) can be used to reason about that entity.
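To make the "accurately interpreted" link between datum, information and knowledge concrete, here is a minimal Python sketch of the draft hierarchy. It is a toy under loudly stated assumptions: the class names only mirror the draft labels above (they are not from any released ontology), knowledge is reduced to a lookup table of signs an agent can read, and `interpret` is a hypothetical helper. It does not model accuracy, misinformation, or the aggregate/mass-noun question.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical toy model of the draft COB hierarchy; names mirror the draft labels only.

@dataclass
class RepresentationalEntity:
    """Something created, or interpretable, as a representation of another entity."""

@dataclass
class Datum(RepresentationalEntity):
    """A representational entity encoded in signs, symbols, markings or patterns."""
    encoding: str  # the raw signs, e.g. a character string

@dataclass
class Information(RepresentationalEntity):
    """What a datum yields once interpreted: here, simply what it is about."""
    source: Datum
    about: str

@dataclass
class Knowledge(RepresentationalEntity):
    """Assumption: an agent's knowledge reduced to a lexicon from signs to referents."""
    lexicon: Dict[str, str] = field(default_factory=dict)

def interpret(datum: Datum, knowledge: Knowledge) -> Optional[Information]:
    """Return Information if the agent's knowledge covers the datum, else None."""
    referent = knowledge.lexicon.get(datum.encoding)
    if referent is None:
        return None  # the datum remains "only data" for this agent
    return Information(source=datum, about=referent)

signs = Datum("海洋")
print(interpret(signs, Knowledge()))                         # None: only data for this agent
print(interpret(signs, Knowledge({"海洋": "ocean"})).about)  # ocean
```

The same datum stays data for an agent without the relevant knowledge and becomes information for one that has it, which is the dependency the draft definition of information is trying to capture.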

Each of the above would have digital and non-digital forms. Looking forward to refining these.

I'm curious about how to handle misinformation and faulty knowledge. Misinformation would increase the degree of uncertainty.

This may be a good chance to define "metadata": data which is about other data, or similar.

I also see an argument that representational entities should be defined in a role-based (or capable-of-based) way. A character string on a screen is only data or information because we assign it that role. Metaphysically, it's just a bunch of illuminated pixels with colour values.

Just to get my bearings about using these classes:

A general reference to a representation would be for example "This organism produced a representation of a situation with the intention that others could interpret that representation".
With datum, one can now say "This sign is (or these signs are) somehow part of a representation". We might not know what the signs represent, as was the case with the three scripts before the Rosetta Stone tied them together.

Usually, though, a datum is interpretable: there is a stated or implicit representation parser that can generate or decode it. This sounds like an intended capability (or role) to be the input to a particular parser; e.g. a string value with the "xs:date" datatype provides the parser reference. The phrase "Bonjour, mon ami" might just be a matrix of pixels, but it is likely impossible to construe as anything other than French, so the parser may be implicit for us humans if there is only one possibility.
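The "xs:date" example can be sketched as datatype-driven decoding: the datatype IRI attached to a literal selects the parser, and signs with no declared or known datatype are left undecoded. This is a minimal Python sketch under assumptions: the `PARSERS` registry and `decode` function are hypothetical, only three XML Schema datatypes are covered, and `xs:date` values are limited to plain `YYYY-MM-DD` forms without a time-zone suffix.

```python
from datetime import date
from typing import Callable, Dict, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical registry: the datatype IRI on a literal names the parser to apply.
PARSERS: Dict[str, Callable[[str], object]] = {
    XSD + "date": date.fromisoformat,  # plain YYYY-MM-DD only; no time-zone suffix
    XSD + "integer": int,
    XSD + "string": str,
}

def decode(lexical_form: str, datatype: Optional[str]) -> Optional[object]:
    """Decode a typed literal via its declared datatype; None if no parser applies."""
    if datatype is None:
        return None  # untyped signs: any parser stays implicit (or absent)
    parser = PARSERS.get(datatype)
    if parser is None:
        return None  # unknown datatype: still data, but not decodable here
    return parser(lexical_form)

print(decode("2021-02-12", XSD + "date"))  # 2021-02-12
print(decode("Bonjour, mon ami", None))    # None
```

Returning `None` rather than raising keeps undecodable signs in play as data, which fits the view that something can be data without any applicable parser.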

Pier: You are introducing a distinct way of looking at representation / information etc. As you know, there are many different ways of thinking about that. We decided how to deal with this in IAO about a decade ago. I would strongly object to any fundamental rearrangement unless you can point to something that is completely broken in real-life use cases.

One of the issues I see is the use of labels (as always), and that your take on what 'information' is clashes with what we have as 'information content entity'. Let's not argue about labels, that never goes anywhere. So if you do think something is fundamentally broken about our current modeling (and much of it is poorly documented and poorly implemented, so I would not be surprised), then please point that out as such, and contrast how our current entities are not sufficient.

Bjoern

In reply to Damion Dooley's comment above (Fri, Feb 12, 2021 at 11:53 PM): https://github.com/OBOFoundry/COB/issues/149#issuecomment-778579402

-- Bjoern Peters Professor La Jolla Institute for Immunology 9420 Athena Circle La Jolla, CA 92037, USA Tel: 858/752-6914 Fax: 858/752-6987 http://www.liai.org/pages/faculty-peters

In general, my preference is not to commit to any one theory. An orthogonal approach here is for COB to contain a single very general class for information/ICE/RE and leave it to domain ontologies to subclass it. The general entity would be uncommitted to BFO.

Re the "if it ain't broke, don't fix it" argument: many would feel the anatomy model of CARO was sufficient, and the same goes for the chemistry model of ChEBI, yet we are redefining them. I think we need a more use-case-driven and less philosophy-driven approach to deciding these matters.

In reply to bpeters42's comment above (Sat, Feb 13, 2021, 12:54): https://github.com/OBOFoundry/COB/issues/149#issuecomment-778676992

@cmungall I don't really disagree, but this will leave us with another big integration problem down the road.

@ddooley

A general reference to a representation would be for example "This organism produced a representation of a situation with the intention that others could interpret that representation".

Could be, but the intention for others to understand it is not really needed.

With datum, one can now say "This sign is (or these signs are) somehow part of a representation". We might not know what the signs represent, as in case of the 3 languages before the rosetta stone tied them together.

Yes - that's the idea: we know it's representational, but we don't know exactly what it's about, and so it's not informative.

Usually though a datum is interpretable - there is a stated or implicit representation parser that can generate or decode it.

I wouldn't say so - noise is still data, but parsers and decoders wouldn't really be applicable.

This sounds like that intended capability(/role) to be the input to a particular parser, e.g. a string value with 'xs:date" datatype provides the parser reference.

Could be - the role would be realised in a process involving those entities. But as @cmungall says below, we don't want to overcommit here.

The phrase "Bonjour, mon ami", might just be a matrix of pixels, but likely impossible to construe as anything other than french, so the parser may be implicit for us humans if there is only one possibility.

I wouldn't say it's likely impossible - it would be hard to not construe it as French, if you have knowledge of French, Roman lettering, the right sensory and processing systems, etc.

@bpeters42

You are introducing a distinct way of looking at representation / information etc. As you know, there are many different ways of thinking about that.

Yes - every view is distinct, including the IAO-derived views. Let's not pretend it's ex cathedra.

We had decided about a decade how to deal with this in IAO.

I'm aware of this, but all things are subject to review from a broader group. IIRC, COB is aiming for broader engagement, right?

I would strongly object to any fundamental rearrangement, unless you can point to anything that is completely broken with real life use cases.

I'm not suggesting a rearrangement of IAO, but rather a more intuitive and generic representation for COB that aligns with emerging global frameworks. Also, why would something have to be "completely" broken before we attempt to improve it?

Further, don't worry, I'm not motivated to change this on a whim: This is consequential for several initiatives I'm working with, most pressingly the IOC-UNESCO Ocean InfoHub and its associated Ocean Data and Information System, as well as the Polar Semantics working group. This also pertains to some of our early work at the Helmholtz Metadata Collaborative.

One of the issues I see is the use of labels (as always), and that your take on what 'information' is clashes with what we have as 'information content entity'. Let's not argue about labels, that never goes anywhere.

I'm sorry, but the labels really matter in the digital policy application cases I'm working in (within Helmholtz, UNESCO-IOC-IODE, UN Environment, and similar). If the label suggests that something contains information (i.e. something informative) when it doesn't, it's not going to get much traction no matter how much I try to explain that the labels are "decoration" on the class. That's just not how these communities work.

So if you do think something is fundamentally broken about our current modeling (and much of it is poorly documented and poorly implemented, so I would not be surprised), then please point that out as such, and contrast how our current entities are not sufficient.

I would appreciate it if you allow me to voice the concerns I actually have, rather than attempt to suggest they're invalid and deflect from them.

Yes, I do think something is fundamentally broken with the modelling: the labels and definitions conflict or work at cross purposes, and they are so heavily reliant on BFO that no one who doesn't understand the BFO world can use them effectively.

@cmungall

In general my preference is not to commit to any one theory. An orthogonal approach here is for COB to contain a single very general class for information/ICE/RE and leave it to domain ontologies to subclass. The general entity would be uncommitted to BFO

That sounds like the most diplomatic option, but I also see @jamesaoverton's point that it could lead to another Babel.

Re the if it ain't broke dont fix it argument: many would feel the anatomy model of CARO was sufficient same for the chemistry model of chebi yet we are redefining. I think we need a more use case and less philosophy driven approach to deciding these matters

+1 - I don't have an issue navigating OBO-Land (and bridging its idiosyncrasies with other systems), but it's really hard to deploy this in systems run by others and convince others to use it if they haven't spent a few months (years?) understanding what's up, and that's an impossible ask for most.

And, indeed, let's not pretend OBO has more internal consistency and generalisability than it does. COB is an opportunity to open things up while preserving a reasonable foundation of interoperability.
