Can we give a `coordinateUncertaintyInMeters`?

peterdesmet commented 2 years ago

See conclusion below

peterdesmet commented 2 years ago

@nicolasvanermen is it possible to give an estimate of the precision of the coordinates in the ESAS data?

nicolasvanermen commented 2 years ago

We always aim for at least 4 decimals for longitude and latitude values.

(Reply sent by email to Peter Desmet's comment of Sat, Mar 19, 2022, 9:43 AM.)

peterdesmet commented 2 years ago

Coordinate values in the data range from 1 to 10 decimals.

Am I to assume that e.g. 2.9 has a precision of 2.9000?
Can a precision of 4 decimals be assumed for all studies?
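Counting decimals only works if coordinates are read as text, since parsing them as floats drops trailing zeros ("2.9000" becomes 2.9). A minimal sketch of how the decimal counts above could be obtained, assuming the coordinates are available as strings; the function names are hypothetical:

```python
from decimal import Decimal
from statistics import mean
from typing import Iterable


def count_decimals(value: str) -> int:
    """Number of decimals as written in the source text.

    Read coordinates as text: a float would lose trailing zeros.
    """
    exponent = Decimal(value.strip()).as_tuple().exponent
    return max(0, -exponent)  # integers (exponent >= 0) have no decimals


def mean_decimals(coordinates: Iterable[str]) -> float:
    """Average number of decimals over a set of latitude/longitude strings."""
    return mean(count_decimals(c) for c in coordinates)
```

Note that this cannot tell whether "2.9" was recorded as 2.9 or truncated from 2.9000 somewhere upstream; it only reports what is written.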

peterdesmet commented 1 year ago

It is rather difficult to calculate the coordinateUncertaintyInMeters for this dataset. It depends on:

GPS accuracy: default 30m for GPS.
Coordinate precision: if we assume all coordinates have a precision of 4 decimals (doubtful), then the highest uncertainty (at the equator) is an additional 16m (see https://docs.gbif.org/georeferencing-best-practices/1.0/en/#table-uncertainty).
Distance travelled by ship: coordinates are only recorded at certain moments, while the ship/airplane keeps moving forward.
Distance at which the observation was made, perpendicular to the distance travelled: recorded in observationDistance. Unfortunately this is recorded per observation, while the coordinates are associated with the (parent) event.
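The 16m figure for 4 decimals (and the 157m for 3 decimals used later in this thread) can be reproduced as the diagonal of the cell spanned by one unit in the last decimal of latitude and longitude at the equator. A minimal sketch, assuming WGS84 lengths of one degree at the equator (about 110,574m of latitude and 111,319m of longitude); the function name is hypothetical:

```python
import math

# Assumed WGS84 lengths of one degree at the equator, in meters.
METERS_PER_DEGREE_LAT_AT_EQUATOR = 110574.2727
METERS_PER_DEGREE_LON_AT_EQUATOR = 111319.4908


def precision_uncertainty_m(decimals: int) -> float:
    """Uncertainty (m) at the equator due to coordinate precision.

    Diagonal of the cell spanned by one unit in the last decimal
    of both latitude and longitude.
    """
    step_deg = 10 ** -decimals
    return math.hypot(step_deg * METERS_PER_DEGREE_LAT_AT_EQUATOR,
                      step_deg * METERS_PER_DEGREE_LON_AT_EQUATOR)
```

For example, `precision_uncertainty_m(4)` is about 15.7m and `precision_uncertainty_m(3)` about 156.9m, which round to the 16m and 157m quoted here.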

So the furthest an observation can be from the recorded coordinates is:

$$\sqrt{\left(30\,\text{m} + 16\,\text{m} + \text{distanceTravelled}\right)^2 + \left(\max_{\text{event}} \text{observationDistance}\right)^2}$$
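A minimal sketch of this per-event formula, assuming the 30m GPS and 16m precision defaults above and taking the maximum observationDistance of the event's observations (0 if there are none); the function and constant names are hypothetical:

```python
import math
from typing import Iterable

GPS_ACCURACY_M = 30.0          # assumed default GPS accuracy
PRECISION_4_DECIMALS_M = 16.0  # assumed 4-decimal coordinate precision


def furthest_point_m(distance_travelled_m: float,
                     observation_distances_m: Iterable[float]) -> float:
    """Furthest point from the recorded coordinates for one position event.

    GPS, precision and distance travelled are added along the track;
    the largest observationDistance is treated as perpendicular to it.
    """
    along_track = GPS_ACCURACY_M + PRECISION_4_DECIMALS_M + distance_travelled_m
    max_observation = max(observation_distances_m, default=0.0)
    return math.hypot(along_track, max_observation)
```

With the medians mentioned later in this thread (933m travelled, 200m observation distance), `furthest_point_m(933.0, [200.0])` gives about 999m.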

@rubenpp7 @pieterprovoost @tucotuco thoughts? Should I attempt to calculate this in the query?

pieterprovoost commented 1 year ago

distanceTravelled and observationDistance act perpendicularly to each other, while the other sources of error act in all directions, so maybe like this? Not sure.

$$\sqrt{\text{distanceTravelled}^2 + \text{observationDistance}^2} + 30\,\text{m} + 16\,\text{m}$$
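A minimal sketch of this variant, where only the two perpendicular components are combined with Pythagoras and the all-direction terms (assumed 30m GPS, 16m precision) are added on top; the names are hypothetical:

```python
import math

GPS_ACCURACY_M = 30.0          # assumed default GPS accuracy
PRECISION_4_DECIMALS_M = 16.0  # assumed 4-decimal coordinate precision


def coordinate_uncertainty_m(distance_travelled_m: float,
                             observation_distance_m: float) -> float:
    """Perpendicular components combined, all-direction terms added."""
    perpendicular = math.hypot(distance_travelled_m, observation_distance_m)
    return perpendicular + GPS_ACCURACY_M + PRECISION_4_DECIMALS_M
```

For 933m travelled and a 200m observation distance this gives about 1000m, slightly more than the per-event formula above, because the 46m is no longer reduced by being combined at right angles.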

If it's too difficult to calculate for individual observations then I suggest coming up with a conservative estimate that works for the entire dataset.

tucotuco commented 1 year ago

@pieterprovoost is correct in his calculation.

Since observationDistance is in the child observation events, it seems to me that you have no choice but to look at them, even to get a maximum to apply at the dataset level, so you might as well do the calculations for the observations and thereby let them be as good as they can be.
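Computing the value per observation and then taking maxima gives the per-occurrence, per-event and dataset-level figures in one pass. A minimal sketch, assuming the perpendicular formula with 30m GPS and 16m precision, and a hypothetical `PositionEvent` container mapping occurrence IDs to observationDistance:

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

GPS_ACCURACY_M = 30.0          # assumed default GPS accuracy
PRECISION_4_DECIMALS_M = 16.0  # assumed 4-decimal coordinate precision


@dataclass
class PositionEvent:
    """Hypothetical container: one position event and its observations."""
    event_id: str
    distance_travelled_m: float
    # occurrenceID -> observationDistance (m)
    observation_distances_m: Dict[str, float] = field(default_factory=dict)


def uncertainty_m(distance_travelled_m: float, observation_distance_m: float) -> float:
    return (math.hypot(distance_travelled_m, observation_distance_m)
            + GPS_ACCURACY_M + PRECISION_4_DECIMALS_M)


def per_occurrence(event: PositionEvent) -> Dict[str, float]:
    """coordinateUncertaintyInMeters for each occurrence of the event."""
    return {occ_id: uncertainty_m(event.distance_travelled_m, d)
            for occ_id, d in event.observation_distances_m.items()}


def event_max(event: PositionEvent) -> float:
    """Maximum for the parent event; falls back to an observation distance of 0."""
    return max(per_occurrence(event).values(),
               default=uncertainty_m(event.distance_travelled_m, 0.0))


def dataset_max(events: List[PositionEvent]) -> float:
    """Conservative single value for the whole dataset."""
    return max(event_max(e) for e in events)
```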

peterdesmet commented 1 year ago

@tucotuco, I understand. I see different ways to implement this if we want to add this info for each occurrence:

Add coordinateUncertaintyInMeters to the occurrence core, without coordinates (these are already in the event). Not sure how this would be processed.
Add coordinateUncertaintyInMeters to the occurrence core and repeat the coordinates.
Create child events in the Event core for each occurrence, add coordinateUncertaintyInMeters and repeat the coordinates from their direct parents.

However, I would probably opt to only add the maximum coordinateUncertaintyInMeters at the parent events:

Clarity: keeping the geographic information with the events (rather than also the occurrences) makes the format clearer, because it follows the original data format more closely:

| original | DwC | comments |
| --- | --- | --- |
| campaign | Event of type `cruise` | date range |
| sample | Event of type `sample` | single date |
| position | Event of type `subsample` | single timestamp + coordinates |
| observation | Occurrence (extension) | no date, time, coordinates |

Variability: most observations within their parent event have the same observationDistance and would thus have the same coordinateUncertaintyInMeters => ok to keep coordinateUncertaintyInMeters with the parent.
Influence: the observationDistance (median 200m) is typically smaller than the distanceTravelled (median 933m) and thus has less influence on the variability => ok to keep coordinateUncertaintyInMeters with the parent.

tucotuco commented 1 year ago

@peterdesmet Though there are different ways one could implement it, I would follow the DwC principle to fill in everything you can. I think this would mean having the parent Event of a set of Occurrences with the maximum coordinateUncertaintyInMeters as you suggested, plus full specific georeferences for each Occurrence. That way Occurrences on their own are complete, autonomous, and as specific as they should be. What would doing so really cost anyone?

peterdesmet commented 1 year ago

I discussed the uncertainty with @nicolasvanermen who is very familiar with the data. His comment (paraphrased):

The data are so diverse (many partners, long history) that trying to express the uncertainty in a single formula makes little sense. But if we have to make an attempt:

Coordinate precision is unknown (and very low, especially in the early years).

ObservationDistance is only known for birds within a transect; outside it, it can be anything from 300m to 5000m, so I would not try to assess uncertainty at the observation level.

Distance the ship travelled (at position level) is the only value that makes sense for calculating uncertainty. It may be high at times, but it is known and unambiguous.

Given that, I see the following approaches (note that the average Distance is 894m):

1. coordinateUncertaintyInMeters = Distance. Since we don't know the direction of the ship, it is the radius in all directions. Advantage: simple approach and easy to communicate.
2. coordinateUncertaintyInMeters = Distance + 30m. We assume coordinates are taken by GPS after 2000-05-01. This is a huge stretch (most data predate that date and it is unknown whether GPS was used), but it sounds reasonable for future records.
3. coordinateUncertaintyInMeters = Distance + 157m. Even though the coordinate precision is unknown, we assume it to be 3 decimals for all coordinates (the average number of decimals in the current data is 3.5 for lat/long combined, see plot below), which at the equator (highest uncertainty) translates to 157m.
4. A combination of 2 and 3.
5. A combination based on the date (e.g. before x do this), but this is also all based on imprecise assumptions.

@tucotuco @nicolasvanermen what would you recommend?

[Figure: Rplot, number of decimals in the latitude/longitude values]
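To compare the listed approaches side by side, the first four can be computed from Distance alone. A minimal sketch, assuming that "a combination of 2 and 3" means adding both the 30m GPS and the 157m precision terms; the date-based approach is left out because its cut-off ("before x") is not specified. The function name is hypothetical:

```python
from typing import Dict

GPS_ACCURACY_M = 30.0           # approach 2: assumed GPS accuracy
PRECISION_3_DECIMALS_M = 157.0  # approach 3: assumed 3-decimal precision


def candidate_uncertainties_m(distance_m: float) -> Dict[int, float]:
    """coordinateUncertaintyInMeters under approaches 1 to 4."""
    return {
        1: distance_m,
        2: distance_m + GPS_ACCURACY_M,
        3: distance_m + PRECISION_3_DECIMALS_M,
        4: distance_m + GPS_ACCURACY_M + PRECISION_3_DECIMALS_M,  # assumed: 2 and 3 summed
    }
```

With the average Distance of 894m this gives 894m, 924m, 1051m and 1081m respectively.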

nicolasvanermen commented 1 year ago

My personal opinion: the ESAS database is raw data, while this is part of data analysis and interpretation. So either the precision field remains blank, or it is filled with Distance values.

peterdesmet commented 1 year ago

I would be fine with using only the Distance values (the only reliable source of uncertainty here). @tucotuco?

peterdesmet commented 1 year ago

Discussed via chat with @tucotuco.

Using uncertainty = distance is incorrect, because we know the uncertainty is larger than the distance. As discussed above, the following elements contribute to the uncertainty:

Coordinate precision: unknown and variable. Maybe 3 decimals, which would be 157m.
Source of coordinates: unknown but likely GPS, so 100m before the year 2000 and 30m after.
Distance ship travelled: recorded in km or unknown; on average 0.895km.
Uncertainty associated with distance ship travelled: unknown and likely calculated in different ways. Sometimes no decimals are provided, giving an uncertainty of at least 1414m (based on 1000m in one direction).
Distance of observation: recorded as ranges in m or unknown, with the largest range extending beyond 300m, which in good visibility can be beyond 5000m.

Given the many unknowns, we decided not to provide a coordinateUncertaintyInMeters and indicate this as such in georeferenceRemarks:

coordinate uncertainty unknown, see https://github.com/inbo/esas2obis/issues/5
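The decision can be expressed as a rule: if any component is unknown, leave coordinateUncertaintyInMeters empty and fill georeferenceRemarks instead. A minimal sketch under several assumptions: GPS is assumed as the coordinate source (100m before the 2000-05-01 cut-off mentioned in this thread, 30m after), the distance uncertainty is assumed to add along the track, and the 1414m figure is read as 1000m in each of two perpendicular directions. All names are hypothetical:

```python
import math
from datetime import date
from typing import Dict, Optional, Union

GPS_CUTOFF = date(2000, 5, 1)  # cut-off date mentioned in this thread
UNKNOWN_REMARK = ("coordinate uncertainty unknown, "
                  "see https://github.com/inbo/esas2obis/issues/5")


def gps_accuracy_m(observed_on: date) -> float:
    """Assumes GPS: 100 m before the cut-off, 30 m from it onwards."""
    return 100.0 if observed_on < GPS_CUTOFF else 30.0


def no_decimals_km_uncertainty_m() -> float:
    """One reading of the 1414 m figure: 1000 m in two perpendicular directions."""
    return math.hypot(1000.0, 1000.0)


def georeference(observed_on: date,
                 precision_m: Optional[float],
                 distance_travelled_m: Optional[float],
                 distance_uncertainty_m: Optional[float],
                 observation_distance_m: Optional[float]
                 ) -> Dict[str, Union[float, str, None]]:
    """Return Darwin Core georeference terms; None means 'leave empty'."""
    components = (precision_m, distance_travelled_m,
                  distance_uncertainty_m, observation_distance_m)
    if any(c is None for c in components):
        # At least one component is unknown: do not guess a value.
        return {"coordinateUncertaintyInMeters": None,
                "georeferenceRemarks": UNKNOWN_REMARK}
    along_track = distance_travelled_m + distance_uncertainty_m
    value = (math.hypot(along_track, observation_distance_m)
             + gps_accuracy_m(observed_on) + precision_m)
    return {"coordinateUncertaintyInMeters": value, "georeferenceRemarks": None}
```

In the ESAS data at least one of these components is unknown for most records, which is why the rule almost always falls through to the remark.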
