Closed: @peterdesmet closed this issue 1 year ago.
@nicolasvanermen is it possible to give an estimate of the precision of the coordinates in the ESAS data?
We always aim for at least 4 decimals for longitude and latitude values.
Values range from 1 decimal to 10 decimals. Does `2.9` have a precision of `2.9000`?
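Whether trailing zeros were trimmed can only be checked against the textual representation of the values; once parsed as numbers, `2.9` and `2.9000` are indistinguishable. A minimal sketch (hypothetical helper, not part of esas2obis) for counting decimals:

```python
def decimal_count(value: str) -> int:
    """Count the number of decimals in a coordinate stored as text.

    This must operate on the raw text: as floats, 2.9 == 2.9000,
    so any trimmed trailing zeros are unrecoverable after parsing.
    """
    if "." not in value:
        return 0
    return len(value.split(".")[1])

print(decimal_count("2.9"))     # 1
print(decimal_count("2.9000"))  # 4
print(decimal_count("51"))      # 0
```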
It is rather difficult to calculate the coordinateUncertainty for this dataset. It depends upon:

- a GPS inaccuracy of 30 m
- a fixed offset of 16 m
- the distance travelled during the event
- the maximum `observationDistance` in that event
So, the furthest point for an observation from the recorded coordinates is:
```
sqrt((30 m + 16 m + distanceTravelled)^2 + (max observationDistance in that event)^2)
```
@rubenpp7 @pieterprovoost @tucotuco thoughts? Should I attempt to calculate this in the query?
`distanceTravelled` and `observationDistance` act perpendicular to each other, while the other components act in all directions, so maybe like this? Not sure.

```
sqrt(distanceTravelled^2 + observationDistance^2) + 30 + 16
```
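For illustration, the two proposed formulas can be compared numerically. This is a sketch with assumed function and parameter names; the 30 m and 16 m offsets and the median distances (933 m travelled, 200 m observation distance) are taken from this thread:

```python
import math

def uncertainty_a(distance_travelled: float, max_observation_distance: float,
                  gps_error: float = 30.0, offset: float = 16.0) -> float:
    """First proposal: the fixed offsets and the distance travelled are
    summed, then combined in quadrature with the observation distance."""
    along_track = gps_error + offset + distance_travelled
    return math.sqrt(along_track**2 + max_observation_distance**2)

def uncertainty_b(distance_travelled: float, max_observation_distance: float,
                  gps_error: float = 30.0, offset: float = 16.0) -> float:
    """Alternative: only the two perpendicular components are combined in
    quadrature; the fixed offsets are added linearly on top."""
    return (math.sqrt(distance_travelled**2 + max_observation_distance**2)
            + gps_error + offset)

# Median values from this thread: distanceTravelled ~933 m, observationDistance ~200 m
print(round(uncertainty_a(933, 200)))  # 999
print(round(uncertainty_b(933, 200)))  # 1000
```

For typical values the two formulas differ by only a few meters, so the choice matters less than whether an uncertainty is provided at all.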
If it's too difficult to calculate for individual observations then I suggest coming up with a conservative estimate that works for the entire dataset.
@pieterprovoost is correct in his calculation.
Since `observationDistance` is in the child observation events, it seems to me that you have no choice but to look at them, even just to get a maximum to apply at the dataset level. So you might as well do the calculations for the individual observations and thereby let them be as good as they can be.
@tucotuco, I understand. I see different ways to implement this if we want to add this info for each occurrence:

1. `coordinateUncertaintyInMeters` in the occurrence core, without coordinates (these are already in the event). Not sure how this would be processed.
2. `coordinateUncertaintyInMeters` in the occurrence core and repeat coordinates.
3. `coordinateUncertaintyInMeters` and repeat coordinates from their direct parents.

However, I would probably opt to only add the maximum `coordinateUncertaintyInMeters` at the parent events:
| original | DwC | comments |
|---|---|---|
| campaign | Event of type cruise | date range |
| sample | Event of type sample | single date |
| position | Event of type subsample | single timestamp + coordinates |
| observation | Occurrence (extension) | no date, time, coordinates |
- Observations differ from their parent position only in `observationDistance` and would thus have the same `coordinateUncertaintyInMeters` => ok to keep `coordinateUncertaintyInMeters` with the parent.
- The `observationDistance` (median 200 m) is typically smaller than the `distanceTravelled` (median 933 m) and thus has less influence on the variability => ok to keep `coordinateUncertaintyInMeters` with the parent.

@peterdesmet Though there are different ways one could implement it, I would follow the DwC principle to fill in everything you can. I think this would mean having the parent Event of a set of Occurrences with the maximum `coordinateUncertaintyInMeters`
as you suggested, plus full specific georeferences for each Occurrence. That way Occurrences on their own are complete, autonomous, and as specific as they should be. What would doing so really cost anyone?
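That suggestion (specific values at each occurrence, the maximum at the parent event) can be sketched as follows; the data structure and values are hypothetical, not the ESAS schema:

```python
# Hypothetical structure: each position (parent event) has child occurrences,
# each with its own coordinateUncertaintyInMeters (in meters).
positions = {
    "pos-1": [999.2, 954.3, 1203.7],
    "pos-2": [480.0, 512.5],
}

# Keep each occurrence's own uncertainty; record only the maximum
# at the parent event level.
parent_uncertainty = {pos: max(values) for pos, values in positions.items()}
print(parent_uncertainty)  # {'pos-1': 1203.7, 'pos-2': 512.5}
```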
I discussed the uncertainty with @nicolasvanermen who is very familiar with the data. His comment (paraphrased):
The data are so diverse (many partners, long history) that trying to express the uncertainty in a single formula makes little sense. But if we have to make an attempt:
- Coordinate precision (especially in the early years, where it is very low) is unknown.
- `ObservationDistance` is only known for birds within a transect; outside it, it can be anything from 300 m to 5000 m, so I would not try to assess uncertainty at the observation level.
- The `Distance` the ship travelled (at position level) is the only value that makes sense for calculating uncertainty. It may be high at times, but it is known and unambiguous.
Given that, I see the following approaches (note that the average `Distance` is 894 m):

1. `coordinateUncertaintyInMeters = Distance`. Since we don't know the direction of the ship, it is the radius in all directions. Advantage: simple approach and easy to communicate.
2. `coordinateUncertaintyInMeters = Distance + 30m`. We assume coordinates are taken by GPS after ~~2020-05-01~~ 2000-05-01. This is a huge stretch (most data are before that date and it is unknown if GPS was used), but it sounds reasonable for future records.
3. `coordinateUncertaintyInMeters = Distance + 157m`. Even though the coordinate precision is unknown, we assume it to be 3 decimals for all coordinates (the average number of decimals in the current data is 3.5 for lat/long combined, see plot below), which at the equator (highest uncertainty) translates to 157 m.

@tucotuco @nicolasvanermen what would you recommend?
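For reference, the 157 m figure in option 3 follows from combining the latitude and longitude rounding errors at the equator as the diagonal of the precision grid cell. A sketch, assuming roughly 111,320 m per degree at the equator:

```python
import math

METERS_PER_DEGREE = 111_320  # approximate length of one degree at the equator

def precision_uncertainty(decimals: int) -> float:
    """Uncertainty (m) at the equator from rounding both latitude and
    longitude to `decimals` decimals, combined as the cell diagonal."""
    cell = 10 ** -decimals * METERS_PER_DEGREE  # grid cell size in meters
    return math.sqrt(2) * cell

print(round(precision_uncertainty(3)))  # 157
```

The same calculation with 4 decimals gives about 16 m, which is why at least 4 decimals are aimed for.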
My personal opinion: the ESAS database is raw data, while this is part of data analysis and interpretation. So either the precision field remains blank, or it is filled with Distance values.
I would be fine with using Distance values (the only reliable source for uncertainty here) only. @tucotuco?
Discussed via chat with @tucotuco.
Using `uncertainty = distance` is incorrect, because we know the uncertainty is larger than the distance. As discussed above, the GPS inaccuracy (30 m), the 16 m offset, the observation distance and the unknown coordinate precision all contribute to the uncertainty on top of the distance travelled.
Given the many unknowns, we decided not to provide a `coordinateUncertaintyInMeters` and to indicate this as such in `georeferenceRemarks`:

```
coordinate uncertainty unknown, see https://github.com/inbo/esas2obis/issues/5
```
See conclusion below