EnvironmentOntology / envo

A community-driven ontology for the representation of environments
http://www.environmentontology.org
Creative Commons Zero v1.0 Universal

Create annotation guidelines #69

Closed GoogleCodeExporter closed 2 years ago

GoogleCodeExporter commented 9 years ago
(Based on comments from anonymous reviewer)

We should have some annotation guidelines on the wiki or Google Code site. These 
may be two documents:

 * Curator guidelines (e.g. minimal information)
 * Data exchange format guidelines (my preference would be JSON-LD/RDF)

Original issue reported on code.google.com by cmung...@gmail.com on 10 Aug 2013 at 6:59

GoogleCodeExporter commented 9 years ago
Draft online at http://www.environmentontology.org/annotation-guidelines

Original comment by p.buttig...@gmail.com on 2 Sep 2013 at 12:31

GoogleCodeExporter commented 9 years ago
Fantastic!

A few quick comments

- I think we can go ahead and define some relations and/or OWL models for each 
of these.

- Biome: Should we be thinking about different kinds of annotation/association? 
For example, in recording where a sample was taken from we might want to avoid 
any strong conclusions such as "this organism is adapted to this biome" (it may 
be an organism that got lost or was displaced by climate change). We just want 
to say the collection is located_in (RO:0001025) the biome, I think. In 
contrast, those wishing to record encyclopaedic knowledge, e.g. EOL, will want a 
stronger relation that does not shy away from talk of evolved adaptations. It 
may be the case that part_of is the stronger relation we want to use here (need 
to read the draft paper again).

- Feature: is some kind of vicinity_of relation sufficient here? It seems there 
may be a spectrum - on the one hand, the camel may just happen to have been 
near the oasis when the observation was made. Or the oasis may be integral to 
the existence of the camel. Or it may be an integral part of the biome of which 
the camel is part, but not so integral to the camel's existence.

- Material: I think surrounded_by might work here. We may have sub-relations 
for partial surroundedness vs complete. E.g. a swimming duck is surrounded 
ventrally by water and dorsally by air. Do we care specifically whether the 
material is displaced? E.g. a microbe that lives in the interstices between 
grains of sand. It doesn't have the power to displace these, but it is 
surrounded by them.

For all these we have to think about the temporal context of the annotation. 
Again, for many observations we are capturing a snapshot, so it is what we are 
seeing at that time. For knowledge capture we need to encode the fact that this 
is a *typical* arrangement - e.g. the duck has a disposition to be surrounded 
by air and freshwater (by virtue of its disposition to swim in rivers and 
lakes), but not every duck is surrounded by both air and water at all times.
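
As a rough illustration of the snapshot case, here is a minimal sketch of how a single "located_in at this time" observation might be serialised as JSON-LD. Only the relation RO:0001025 comes from the discussion above; the example.org IRIs, the `observed_on` key and the function name are assumptions made up for this sketch, not an agreed format.

```python
import json

# RO:0001025 ("located in") is the relation named in the comment above.
RO_LOCATED_IN = "http://purl.obolibrary.org/obo/RO_0001025"


def snapshot_annotation(sample_iri, biome_iri, observed_on):
    """Build one instance-level annotation as a JSON-LD-style dict.

    The record only claims where the sample was at the time of collection;
    it says nothing about adaptation or typical habitat.
    """
    return {
        "@context": {
            # Map the short key to the RO relation; values are IRIs.
            "located_in": {"@id": RO_LOCATED_IN, "@type": "@id"},
            # Hypothetical property for the temporal context of the snapshot.
            "observed_on": "http://example.org/vocab/observed_on",
        },
        "@id": sample_iri,
        "located_in": biome_iri,
        "observed_on": observed_on,
    }


# Hypothetical sample and biome IRIs, for illustration only.
record = snapshot_annotation(
    "http://example.org/sample/1",
    "http://example.org/envo/some-biome",
    "2013-09-03",
)
print(json.dumps(record, indent=2))
```

A type-level ("typical arrangement") statement would need a different shape, since it is about the class of ducks rather than one observed duck.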

Original comment by cmung...@gmail.com on 3 Sep 2013 at 3:27

GoogleCodeExporter commented 9 years ago
I like the idea of a range of relations and a distinction between knowledge 
modelling and instance-level reporting.

- Biome: If the "system" view prevails, we could consider a relation like 
component_of. On the instance level, even if the entity in question is there 
accidentally, it is still a component_of the system (i.e. has some causal 
influence on it). For now, located_in / proper_part_of sounds like a good start, 
especially as parthood implies a located_in relation. However, in 
similar thinking to the component_of relation, can't an entity be part of a 
biome even though it didn't evolve adaptations to it? Perhaps we can suggest 
integral_part_of for biome-specific organisms (e.g. conifer integral_part_of 
coniferous forest biome)? If we include an adapted_to relation (which would be 
automatic for any living entity that's an integral_part_of a biome), we could 
then allow annotation of entities that may not have adapted to a particular 
biome, but which (via convergence etc) still have adaptations to its ecological 
demands. If they don't have integral_part_of relations to the biome, it's 
understood that they didn't evolve there.

- Feature: I'd say vicinity_of is sufficient for now. Currently, feature 
subclasses are intended to highlight a material entity that has some sort of 
disproportionately large causal influence on the surrounding environment (which 
includes that feature). Thus the feature can be said to "determine" the 
environmental system around it. Relations between feature and an entity of 
interest could be like vicinity_of. This, then, doesn't necessitate that the 
entity of interest is disproportionately influenced by the feature, but records 
that the influence of that feature is a determining influence in the entity's 
environment. It would have to be clear that proximity implies that the feature 
somehow has a determining influence on the environment of the entity of 
interest, whether or not that entity responds to it strongly. Does that sound 
reasonable?

- Material: Good point regarding the microbe and sand grains. One could argue 
(and we're handling this in the alignment of ENVO's top-levels) that the 
environmental material "sand" is more than just an aggregate of grains of sand 
and includes the interstices. This is similar to soil including air spaces that 
permeate it or air including suspended particles. In this sense, the microbe is 
still displacing some of what is the environmental material, sand. I think 
using surrounded_by as an interim relation will be useful though. I also feel 
the partial relations will be important. They would work regardless of whether 
we're using displaces or surrounded_by.

Making the typical scenario (or what the entity in question is disposed to be 
surrounded_by, a component_of, or vicinity_of) explicit would bridge ENVO and, e.g., 
organismal taxonomy and also have the potential to set a baseline to detect 
'unusual' instances. That would be a very interesting direction to go in.
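
To make the baseline idea concrete, here is a minimal sketch in which type-level knowledge (what an entity is disposed to be surrounded_by) is compared against an instance-level observation. The lookup table, its contents beyond the duck example from this thread, and the function name are all hypothetical.

```python
# Type-level knowledge: materials each kind of entity is disposed to be
# surrounded_by. The duck entry follows the example earlier in this thread;
# the structure itself is an assumption made for this sketch.
TYPICAL_SURROUNDINGS = {
    "duck": {"air", "freshwater"},
}


def is_unusual(entity_type, observed_material):
    """Compare one observation against the typical arrangement.

    Returns True if the observed material is outside the baseline,
    False if it matches, and None if no baseline is recorded.
    """
    typical = TYPICAL_SURROUNDINGS.get(entity_type)
    if typical is None:
        return None
    return observed_material not in typical


# A duck observed surrounded by sand would be flagged for a closer look.
print(is_unusual("duck", "sand"))
```

A flag here only means "outside the recorded baseline"; it makes no claim about why the instance is atypical.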

Original comment by p.buttig...@gmail.com on 4 Sep 2013 at 11:07

GoogleCodeExporter commented 9 years ago
integral_part_of is a bit problematic - in RO2005 it's a type-level relation 
only. The equivalent in OWL is a reciprocal part_of-some and has_part-some 
pair. I don't think this is quite right.
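
For reference, a small sketch of what that reciprocal pair looks like when written out. The output is an informal Manchester-like notation, and the helper names are invented for this illustration; it shows the expansion under discussion, not a recommended model.

```python
def quote(name):
    """Quote labels containing spaces, as is common in Manchester-style notation."""
    return f"'{name}'" if " " in name else name


def expand_integral_part_of(part, whole):
    """Return the two existential axioms that approximate
    'part integral_part_of whole' at the class level."""
    return [
        f"{quote(part)} SubClassOf part_of some {quote(whole)}",
        f"{quote(whole)} SubClassOf has_part some {quote(part)}",
    ]


# The conifer example from the previous comment.
for axiom in expand_integral_part_of("conifer", "coniferous forest biome"):
    print(axiom)
```

The second axiom is the demanding one: it asserts that every instance of the whole has some instance of the part.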

I would use a more specific relation than component_of, which is already used 
in RO. Or it may be fine to use part_of here.

I hadn't thought of sand = sand grains plus interstices, but this seems like a 
good solution and also the correct way to model it. Proof: the volume of a 
portion of sand would intuitively be the sum of the volumes of all interstices 
plus all grains - not just the sum of grains. So it's still fine to use some 
kind of surroundedness relation (bearing in mind complexities of partial 
surroundment), but we could have the additional axiom that this entails 
displacement.
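
A minimal sketch of how that could behave, assuming partial and complete surroundedness are sub-relations of surrounded_by and that surrounded_by entails displaces. All relation names below are placeholders for this illustration, not existing RO terms.

```python
# Hypothetical sub-relation hierarchy: partial and complete surroundedness
# both imply the general surrounded_by relation.
SUB_RELATION_OF = {
    "partially_surrounded_by": "surrounded_by",
    "completely_surrounded_by": "surrounded_by",
}

# The additional axiom discussed above: surroundedness entails displacement.
ENTAILS = {"surrounded_by": "displaces"}


def infer(triples):
    """Return the input triples plus everything implied by the two tables."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple is added
        changed = False
        for subject, relation, obj in list(inferred):
            for table in (SUB_RELATION_OF, ENTAILS):
                implied = table.get(relation)
                if implied is not None and (subject, implied, obj) not in inferred:
                    inferred.add((subject, implied, obj))
                    changed = True
    return inferred


# The microbe in the interstices of sand, from the discussion above.
for triple in sorted(infer({("microbe", "completely_surrounded_by", "sand")})):
    print(triple)
```

Because "sand" here includes the interstices, the derived displaces triple stays consistent with the microbe not moving any grains.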

Original comment by cmung...@gmail.com on 6 Sep 2013 at 11:56

GoogleCodeExporter commented 9 years ago

Original comment by cmung...@gmail.com on 18 Feb 2015 at 1:46

GoogleCodeExporter commented 9 years ago
I think it would also be useful to have guidelines specific to the metagenomics 
community. I love the camel in the oasis example on 
http://www.environmentontology.org/annotation-guidelines but this is confusing 
(I would think?) if coming from a sample perspective. Where is the actual 
sample in this example? Is it a chunk of fur from the camel, or a handful of 
sand in an oasis at which camels hang out?

We should also coordinate with groups like MiXS to have the guidelines linked, e.g. from here:
http://gensc.org/projects/mixs-gsc-project/

Original comment by cmung...@gmail.com on 18 Feb 2015 at 2:04

pbuttigieg commented 2 years ago

Done in wiki and site.