AAFC-BICoE / dina-planning

AAFC-DINA planning repository

Define Location #66

Open cgendreau opened 3 years ago

cgendreau commented 3 years ago

Location in the sense of a geographical place that can be linked to a Collecting Event.

A Location could be a country, province, national park, water body or a custom site.

It has the following properties:

A Collecting Event can have more precise coordinates within the Location.

heathercole commented 3 years ago

I would suggest that "location" is a combination of locality and "place names"/geography

For biological specimens, "location" or "locality" is often a text string describing the specific place where a specimen was collected (it may be relevant to name this "verbatim location" or "locality/verbatim locality" (Darwin Core)). This text can be broad ("Algonquin Park") or specific ("20 meters north of Highway 11 at milemarker 29"). There are different ways that this would then be related to "geography". "Country" is vital, and "Province/Territory/State" divisions are also necessary. There are some grey areas regarding municipal or other within-province divisions, but a final (3rd) 'lowest resolution' option is often "nearest named place". This "named place" should have information that matches the properties described above, but it should be noted that in many cases it may not be appropriate to assign those properties. For example, the properties above (e.g. centre/bounding box) for a specimen collected in "Ottawa" in 1923 may be quite different from the properties that exist in 2020. Even province lines can change (e.g. creation of Nunavut in 1999).

Of note, "nearest name" could exist at all 3 scales, depending on the information on a label:
e.g. "15 km west of Ottawa": nearest name = Ottawa, Ontario, Canada
e.g. "Algonquin Park": nearest name = Algonquin Provincial Park, Ontario, Canada
e.g. "Mer Bleue, Canada": this is a problem, there are many! We would need to investigate whether it is possible to determine if it is in Ontario or Quebec.

Ideally, each 'place name' would exist in a hierarchy/database/tree, so users can identify "Ottawa", and the system then knows that "Ottawa" is in Ontario. However, there may also be the need to allow missing information. E.g. "Mer Bleue" is in Ontario, but also in Quebec, so if the user isn't sure which, there should just be the option for "Mer Bleue" in Canada. I've attached a view of a Specify Geography tree that shows nearest names associated to regions, but also just to Province (if region unknown/not provided). **Not sure it makes sense to include the 'region' as a structured part of 'geography'; this is perhaps better left to the 'verbatim locality'.

**Also of note: one place name could have different properties. DAO has many collections from the "Mackenzie Mountains", which is the text provided, but different specimens then have different lat/long/elevation associated to them.

A place-name should be found in a gazetteer, but a locality would not.

Specify treats Localities as objects, with the coordinates associated to the locality (not the geography). This is relevant because, for a city like "Toronto", you wouldn't want all Toronto localities to have the same GPS coordinates if it was possible to be more specific (e.g. coordinates provided by the collector). However, this can create a problem with locality name "duplicates" (as noted above with the "Mackenzie Mountains" example).

Here are 2 Specify screenshots: the Specify locality table and the Specify geography tree.

I will consult further with the collections managers as to what their requirements are for these fields.

cgendreau commented 3 years ago

The fact that borders move is the main argument for not storing them as a tree, but rather as a list of snapshots in time of places of interest, each with a link to a source (as described above, plus temporal information). You still have the hierarchy information, but you would not have the geography tree as above.

heathercole commented 3 years ago

The difference between hierarchy and tree is not clear. I don't think that the occasional creation of a new province/territory is a good enough reason, on its own, to not have a tree. I understand that there are many ways of providing the needed functionality. I don't think these requirements have been gathered yet.

Other "use-cases" relating to location/geography are the ability to merge duplicates, while still allowing the same place name to exist in more than one instance (like the Mer Bleue example above), or allowing multiple georeferences for the same place name.

cgendreau commented 3 years ago

There are plenty of external sources maintaining this type of data, and I find it hard to justify that a collection management software should also do it. It should leverage existing infrastructure instead of trying to duplicate it.

For the record, a tree would store links to parent/child:

vs. hierarchy information, which would simply keep the information from an external source at a point in time (level 1: Canada, level 2: Ontario, level 3: Ottawa, source: OpenStreetMap, date: 2020-12-22, BoundingBox: ...).
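To make the distinction concrete, here is a minimal sketch of the two options. The class and field names (`TreeNode`, `PlaceSnapshot`, `levels`, `retrieved`) are hypothetical and only illustrate the idea; they are not part of any DINA or Specify data model.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TreeNode:
    """Tree approach: each place keeps a live link to its parent."""
    name: str
    parent: Optional["TreeNode"] = None

    def path(self) -> list[str]:
        # Walk up the parent links to rebuild the full hierarchy.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


@dataclass(frozen=True)
class PlaceSnapshot:
    """Snapshot approach: levels copied from an external source on a given date."""
    levels: tuple[str, ...]
    source: str
    retrieved: date
    # (min_lon, min_lat, max_lon, max_lat); left empty here on purpose.
    bounding_box: Optional[tuple[float, float, float, float]] = None


canada = TreeNode("Canada")
ontario = TreeNode("Ontario", parent=canada)
ottawa_node = TreeNode("Ottawa", parent=ontario)

ottawa_snapshot = PlaceSnapshot(
    levels=("Canada", "Ontario", "Ottawa"),
    source="OpenStreetMap",
    retrieved=date(2020, 12, 22),
)

print(ottawa_node.path())
print(list(ottawa_snapshot.levels))
```

In this sketch, editing the tree (for example re-parenting a node when a border changes) silently changes the path of every record that points at it, whereas a snapshot stays as it was on the date it was retrieved.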

cgendreau commented 3 years ago

But like you said let's wait for requirements confirmation (since we already have some of them but they should be validated).

The important point is to dig into why something is required/needed. The tree is one solution to a problem; what is the problem it is solving? We can continue the discussion once we know that :) .

rintoult commented 3 years ago

My only comment here is about what was discussed during the last meeting on collection events, where I mentioned that for many things in our collection we need more granularity than was discussed, i.e. Ottawa isn't enough, but Franklin Farm creekside might be.

heathercole commented 3 years ago

Homework! Thanks for your patience. For this requirement, I started by providing managers with a list of related Darwin Core fields and asked whether they needed all the fields or whether any are missing. Moving forward, I am not sure that is really what you need, so we may need to better structure what information WP3 needs, and how best to provide that feedback.

Here is what we did compile; I am more than happy to discuss/re-consult as necessary. Data fields I provided: location.fields.for.review.xlsx

General notes: we need to support multiple formats for lat/long, and also support representing lines and boxes (not just points). All collections assume that there is specific location information that is (more or less) unique to a collection record; that level of info/data is then linked to a structured geography/gazetteer/geographical hierarchy/tree.
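As a rough illustration of what "multiple formats for lat/long" could mean in practice, the sketch below accepts either signed decimal degrees or a degrees/minutes/seconds string with a hemisphere letter. The function name `to_decimal_degrees` and the two accepted formats are assumptions for illustration; the collections may well use other notations (and lines or boxes are not covered here).

```python
import re


def to_decimal_degrees(text: str) -> float:
    """Convert '45.4215' or a string like 45° 25' 17" N to signed decimal degrees."""
    text = text.strip()
    try:
        # Plain signed decimal degrees, e.g. "-75.6972".
        return float(text)
    except ValueError:
        pass

    match = re.fullmatch(
        r"""(\d+(?:\.\d+)?)\s*°\s*          # degrees
            (?:(\d+(?:\.\d+)?)\s*'\s*)?      # optional minutes
            (?:(\d+(?:\.\d+)?)\s*"\s*)?      # optional seconds
            ([NSEW])                         # hemisphere letter
        """,
        text,
        flags=re.VERBOSE | re.IGNORECASE,
    )
    if match is None:
        raise ValueError(f"Unrecognised coordinate format: {text!r}")

    degrees, minutes, seconds, hemisphere = match.groups()
    value = float(degrees) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    # South and West are negative in signed decimal degrees.
    return -value if hemisphere.upper() in "SW" else value


print(to_decimal_degrees("45.5"))
print(to_decimal_degrees("45° 30' N"))
print(to_decimal_degrees("75° 30' W"))
```

Keeping the verbatim string alongside the parsed value would let a curator check the conversion later.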

Perhaps additional notes on georeferencing are needed. In most cases, georeferences are related to the specimen-level/specific location info, not the higher geography (one lat/long to represent all collection events linked to "Ottawa" would not be appropriate). BUT being able to use a centroid lat/long for those Ottawa specimens which don't have any lat/long would be very valuable for mapping; we would just need to be able to clearly indicate whether the lat/long associated to the record is from the specimen vs. higher geography. This is a well documented/discussed issue in the community.
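One way to picture the centroid fallback is the sketch below: the coordinates recorded for the specimen are used when they exist, otherwise a centroid for the place name is returned, and in both cases the result carries a label saying where it came from. All names here (`CollectingEvent`, `PLACE_CENTROIDS`, `mapping_coordinates`, the `"specimen"`/`"higher_geography"` labels) are hypothetical, and the Ottawa centroid is an approximate value for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LatLong = Tuple[float, float]


@dataclass
class CollectingEvent:
    locality: str                            # verbatim locality text
    place_name: str                          # nearest named place
    coordinates: Optional[LatLong] = None    # as recorded for the specimen, if any


# Hypothetical centroid lookup; in practice this would come from a gazetteer.
PLACE_CENTROIDS = {"Ottawa": (45.42, -75.70)}  # approximate, illustration only


def mapping_coordinates(event: CollectingEvent) -> Optional[Tuple[LatLong, str]]:
    """Return (lat/long, origin) for mapping, or None if nothing is available."""
    if event.coordinates is not None:
        return event.coordinates, "specimen"
    centroid = PLACE_CENTROIDS.get(event.place_name)
    if centroid is not None:
        return centroid, "higher_geography"
    return None


with_gps = CollectingEvent("Franklin Farm creekside", "Ottawa", (45.39, -75.72))
without_gps = CollectingEvent("15 km west of Ottawa", "Ottawa")

print(mapping_coordinates(with_gps))
print(mapping_coordinates(without_gps))
```

Because the origin label travels with the coordinates, a map can style or filter centroid-derived points separately from specimen-level georeferences.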

Feedback from Ginco/CCAMF/Claudia: location.fields.CCAMFreview.xlsx

Feedback from CCFC/Tara "I did not include most of the "water" fields and for the current collection we seldom have that specific information on fungi that may have been collected from water. This of course would change if bacteria get added to this collection since often they are collected from streams etc.

We also don't currently have any georeferencing efforts planned, I am still not sure how often that type of work is applied to living collections, but I know others will want them so no worries if I don't ask for them and want them later." location.fields.for.review_CCFCresponse.xlsx

Feedback from DAO/DAOM/Shannon: location.fields.for.review_DAO_DAOM.xlsx

Feedback from CPVC/Ron "Most of this is well above the level that we currently keep, but I do see the value in it." location.fields.for.review.RR.xlsx

Feedback from CNC/Michelle: pending. She will provide it, but as noted above, we may need to take this in another direction/discussion.

Let me know; I am still learning how to structure these communications most effectively (from our side, we more or less think that this detailed review of fields is NOT the best approach and will move away from it, unless you tell us it is wonderful and the most useful thing you have ever seen).