EMODnet / esas2obis

Darwin Core mapping of ESAS data for publication to OBIS
MIT License

Can we give a `coordinateUncertaintyInMeters`? #5

Closed: peterdesmet closed this issue 1 year ago

peterdesmet commented 2 years ago

See conclusion below

peterdesmet commented 2 years ago

@nicolasvanermen is it possible to give an estimate of the precision of the coordinates in the ESAS data?

nicolasvanermen commented 2 years ago

We always aim for at least 4 decimals for longitude and latitude values.

peterdesmet commented 2 years ago

Values range from 1 decimal to 10 decimals.
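A range like this can be obtained by counting the decimals of each coordinate as it is written. A minimal Python sketch, assuming the coordinates are available as text (counting from floats would drop trailing zeros) and are finite numbers; `count_decimals` and the sample values are hypothetical:

```python
from decimal import Decimal


def count_decimals(coordinate: str) -> int:
    """Return the number of decimal places in a coordinate written as text.

    Trailing zeros are counted as written ("51.20" -> 2), since they may
    reflect the precision that was actually recorded.
    """
    exponent = Decimal(coordinate).as_tuple().exponent
    # A negative exponent equals the number of digits after the decimal point.
    return -exponent if exponent < 0 else 0


# Hypothetical coordinate values, not taken from the ESAS data.
coordinates = ["51.2", "2.94512", "54.1234567891"]
decimals = [count_decimals(c) for c in coordinates]
print(min(decimals), max(decimals))  # 1 10
```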

peterdesmet commented 1 year ago

It is rather difficult to calculate the coordinateUncertaintyInMeters for this dataset. It depends upon:

So, the maximum distance between an observation and the recorded coordinates is:

sqrt((30m + 16m + distanceTravelled)^2 + (max observationDistance in that event)^2)
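The formula above as a small Python function, a sketch only: `furthest_distance` and its parameter names are hypothetical, and the 30 m and 16 m terms are kept as fixed offsets exactly as they appear in the formula. The inputs in the example (933 m travelled, 200 m observation distance) are illustrative.

```python
import math


def furthest_distance(distance_travelled: float,
                      max_observation_distance: float,
                      fixed_offsets: tuple = (30.0, 16.0)) -> float:
    """sqrt((30m + 16m + distanceTravelled)^2 + (max observationDistance)^2), in metres."""
    along_track = sum(fixed_offsets) + distance_travelled
    # math.hypot(x, y) computes sqrt(x**2 + y**2).
    return math.hypot(along_track, max_observation_distance)


print(round(furthest_distance(933, 200), 1))  # 999.2
```

Here the fixed offsets are added to the distance travelled before squaring, so they only act along the ship's track.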

@rubenpp7 @pieterprovoost @tucotuco thoughts? Should I attempt to calculate this in the query?

pieterprovoost commented 1 year ago

distanceTravelled and observationDistance act perpendicular to each other, while the other terms apply in all directions, so maybe like this? Not sure.

sqrt(distanceTravelled^2 + observationDistance^2) + 30 + 16

If it's too difficult to calculate for individual observations then I suggest coming up with a conservative estimate that works for the entire dataset.
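The alternative formula as a minimal Python sketch (function name hypothetical, inputs illustrative). Only distanceTravelled and observationDistance are combined with Pythagoras; the 30 m and 16 m offsets are added afterwards:

```python
import math


def furthest_distance_perpendicular(distance_travelled: float,
                                    observation_distance: float,
                                    fixed_offsets: tuple = (30.0, 16.0)) -> float:
    """sqrt(distanceTravelled^2 + observationDistance^2) + 30 + 16, in metres."""
    # The two perpendicular components form the sides of a right triangle.
    perpendicular = math.hypot(distance_travelled, observation_distance)
    # Offsets that apply in all directions are simply added.
    return perpendicular + sum(fixed_offsets)


print(round(furthest_distance_perpendicular(933, 200), 1))  # 1000.2
```

Since the result grows with both inputs, feeding it the largest distanceTravelled and the largest observationDistance in the dataset would give the kind of conservative dataset-wide estimate suggested here.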

tucotuco commented 1 year ago

@pieterprovoost is correct in his calculation.

Since observationDistance is in the child observation events it seems to me that you have no choice except to look at them, even to get a maximum to apply at the dataset level, so you might as well do the calculations for the observations and thereby let them be as good as they can be.
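Computing the value per observation and then taking the maximum per parent event (or over the whole dataset) is a single pass over the observations. A Python sketch using the formula confirmed above; the row layout, event IDs and numbers are hypothetical, and values are rounded up to whole metres to stay conservative:

```python
import math
from collections import defaultdict

# Hypothetical observation rows: (parent event ID, distanceTravelled in m,
# observationDistance in m). Not real ESAS values.
observations = [
    ("position-1", 900.0, 150.0),
    ("position-1", 900.0, 300.0),
    ("position-2", 1200.0, 200.0),
]


def observation_uncertainty(distance_travelled: float, observation_distance: float) -> int:
    """sqrt(distanceTravelled^2 + observationDistance^2) + 30 + 16, rounded up to whole metres."""
    return math.ceil(math.hypot(distance_travelled, observation_distance) + 30 + 16)


# Uncertainty for each individual observation.
per_observation = [observation_uncertainty(dt, od) for _, dt, od in observations]

# Maximum per parent event, and over the whole dataset.
per_event = defaultdict(int)
for (event_id, _, _), value in zip(observations, per_observation):
    per_event[event_id] = max(per_event[event_id], value)
dataset_max = max(per_observation)

print(per_observation)  # [959, 995, 1263]
print(dict(per_event))  # {'position-1': 995, 'position-2': 1263}
print(dataset_max)      # 1263
```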

peterdesmet commented 1 year ago

@tucotuco, I understand. I see different ways to implement this if we want to add this info for each occurrence:

  1. Add coordinateUncertaintyInMeters to the occurrence core, without coordinates (these are already in the event). Not sure how this would be processed.
  2. Add coordinateUncertaintyInMeters to the occurrence core and repeat the coordinates.
  3. Create child events in the Event core for each occurrence, add coordinateUncertaintyInMeters and repeat the coordinates from their direct parents.
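Option 3 could look roughly like this: one child Event per occurrence that repeats the parent's coordinates and carries the occurrence-specific uncertainty. A Python sketch only; the input records, the uncertainty values and the `parentID:occurrenceID` eventID scheme are hypothetical, while the field names are Darwin Core terms:

```python
# Hypothetical input: one position event and the occurrences linked to it.
position_event = {
    "eventID": "position-1",
    "decimalLatitude": 51.2345,
    "decimalLongitude": 2.9451,
}
occurrences = [
    {"occurrenceID": "obs-1", "coordinateUncertaintyInMeters": 959},
    {"occurrenceID": "obs-2", "coordinateUncertaintyInMeters": 995},
]


def child_events(parent: dict, occurrences: list) -> list:
    """Build one child Event per occurrence, repeating the parent's
    coordinates and carrying the occurrence-specific uncertainty."""
    children = []
    for occ in occurrences:
        children.append({
            # Hypothetical ID scheme: parent eventID + occurrenceID.
            "eventID": f"{parent['eventID']}:{occ['occurrenceID']}",
            "parentEventID": parent["eventID"],
            "decimalLatitude": parent["decimalLatitude"],
            "decimalLongitude": parent["decimalLongitude"],
            "coordinateUncertaintyInMeters": occ["coordinateUncertaintyInMeters"],
        })
    return children


for child in child_events(position_event, occurrences):
    print(child["eventID"], child["coordinateUncertaintyInMeters"])
# position-1:obs-1 959
# position-1:obs-2 995
```

Each occurrence would then point its eventID to its own child event instead of the position event, which is what makes this option heavier than keeping the value on the parent.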

However, I would probably opt to add only the maximum coordinateUncertaintyInMeters to the parent events:

  1. Clarity: keeping the geographic information with the events (rather than also with the occurrences) makes the format clearer, because it follows the original data format more closely:

     | original    | DwC                     | comments                       |
     |-------------|-------------------------|--------------------------------|
     | campaign    | Event of type cruise    | date range                     |
     | sample      | Event of type sample    | single date                    |
     | position    | Event of type subsample | single timestamp + coordinates |
     | observation | Occurrence (extension)  | no date, time, coordinates     |

  2. Variability: most observations within their parent event have the same observationDistance and would thus have the same coordinateUncertaintyInMeters => ok to keep coordinateUncertaintyInMeters with the parent.
  3. Influence: the observationDistance (median 200m) is typically smaller than the distanceTravelled (median 933m) and thus has less influence on the variability => ok to keep coordinateUncertaintyInMeters with the parent.

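A quick check of the Influence argument, combining the two medians with the perpendicular (Pythagorean) form proposed earlier in the thread. Applying the formula to medians is only an illustration, not the median of the per-observation values:

```python
import math

median_distance_travelled = 933.0    # m
median_observation_distance = 200.0  # m

# distanceTravelled and observationDistance treated as perpendicular components.
combined = math.hypot(median_distance_travelled, median_observation_distance)
added_by_observation = combined - median_distance_travelled

print(round(combined, 1), round(added_by_observation, 1))  # 954.2 21.2
```

With these numbers, the observationDistance adds roughly 21 m on top of the 933 m travelled.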
tucotuco commented 1 year ago

@peterdesmet Though there are different ways one could implement it, I would follow the DwC principle to fill in everything you can. I think this would mean having the parent Event of a set of Occurrences with the maximum coordinateUncertaintyInMeters as you suggested, plus full specific georeferences for each Occurrence. That way Occurrences on their own are complete, autonomous, and as specific as they should be. What would doing so really cost anyone?

peterdesmet commented 1 year ago

I discussed the uncertainty with @nicolasvanermen who is very familiar with the data. His comment (paraphrased):

The data are so diverse (many partners, long history) that trying to express the uncertainty in a single formula makes little sense. But if we have to make an attempt:

  • Coordinate precision (especially in the early years where it is very low) is unknown
  • ObservationDistance is only known for birds within a transect; outside it, it can be anything from 300m to 5000m, so I would not try to assess uncertainty at the observation level
  • Distance the ship travelled (at position level) is the only value that makes sense to calculate uncertainty. It may be high at times, but it is known and unambiguous.

Given that, I see the following approaches (note that the average Distance is 894m):

  1. coordinateUncertaintyInMeters = Distance. Since we don't know the direction of the ship, it is the radius in all directions. Advantage: simple approach and easy to communicate.
  2. coordinateUncertaintyInMeters = Distance + 30m. We assume coordinates were taken by GPS after 2000-05-01. This is a huge stretch (most data are from before that date and it is unknown whether GPS was used), but sounds reasonable for future records.
  3. coordinateUncertaintyInMeters = Distance + 157m. Even though the coordinate precision is unknown, we assume it to be 3 decimals for all coordinates (the average number of decimals in the current data is 3.5 for lat/long combined, see plot below), which at the equator (highest uncertainty) translates to 157m.
  4. A combination of 2 and 3.
  5. A combination based on the date (e.g. before x do this), but this is also all based on imprecise assumptions.
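The arithmetic behind options 1 to 3, as a Python sketch. The precision term takes the diagonal of one step in the last decimal for both latitude and longitude at the equator, using approximate WGS84 lengths of one degree (110,574 m of latitude, 111,320 m of longitude); this is an assumption, chosen because it reproduces the 157 m quoted for 3 decimals. `precision_uncertainty` is a hypothetical helper:

```python
import math

# Approximate length of one degree at the equator (WGS84), in metres.
METRES_PER_DEGREE_LAT = 110_574.0
METRES_PER_DEGREE_LON = 111_320.0


def precision_uncertainty(decimals: int) -> float:
    """Diagonal (m) of one step in the last decimal of both latitude and
    longitude, evaluated at the equator."""
    step = 10 ** -decimals
    return math.hypot(step * METRES_PER_DEGREE_LAT, step * METRES_PER_DEGREE_LON)


average_distance = 894.0  # m, the average Distance mentioned above

options = {
    "1. Distance": average_distance,
    "2. Distance + 30 m": average_distance + 30,
    "3. Distance + 157 m": average_distance + round(precision_uncertainty(3)),
}
for name, value in options.items():
    print(f"{name}: {value:.0f} m")
# 1. Distance: 894 m
# 2. Distance + 30 m: 924 m
# 3. Distance + 157 m: 1051 m
```

The same helper gives about 16 m for coordinates with 4 decimals.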

@tucotuco @nicolasvanermen what would you recommend?

[Rplot: plot of the number of decimals in the coordinates, image not included]

nicolasvanermen commented 1 year ago

My personal opinion: the ESAS database is raw data, while this is part of data analysis and interpretation. So either the precision field remains blank, or it is filled with Distance values.

peterdesmet commented 1 year ago

I would be fine with using Distance values (the only reliable source for uncertainty here) only. @tucotuco?

peterdesmet commented 1 year ago

Discussed via chat with @tucotuco.

Using uncertainty = distance is incorrect, because we know the uncertainty is larger than the distance. As discussed above, the following elements contribute to the uncertainty:

Given the many unknowns, we decided not to provide a coordinateUncertaintyInMeters and indicate this as such in georeferenceRemarks:

coordinate uncertainty unknown, see https://github.com/inbo/esas2obis/issues/5
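In mapping terms, the decision amounts to leaving coordinateUncertaintyInMeters empty and explaining why in georeferenceRemarks. A Python sketch of the resulting georeference fields for one position event; the function is hypothetical and not the repository's actual mapping:

```python
def georeference_fields(decimal_latitude: float, decimal_longitude: float) -> dict:
    """Georeference terms for a position event: no coordinateUncertaintyInMeters,
    with the reason given in georeferenceRemarks."""
    return {
        "decimalLatitude": decimal_latitude,
        "decimalLongitude": decimal_longitude,
        # Deliberately left empty: the uncertainty cannot be estimated reliably.
        "coordinateUncertaintyInMeters": None,
        "georeferenceRemarks": (
            "coordinate uncertainty unknown, see "
            "https://github.com/inbo/esas2obis/issues/5"
        ),
    }


print(georeference_fields(51.2345, 2.9451)["georeferenceRemarks"])
# coordinate uncertainty unknown, see https://github.com/inbo/esas2obis/issues/5
```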