gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

Setting default coordinateuncertaintyinmeters values for gridded datasets #2872

Open jhnwllr opened 4 years ago

jhnwllr commented 4 years ago

There are only 77 gridded datasets that do not fill in either coordinateuncertaintyinmeters or footprintwkt.

These datasets could be good candidates for default values for coordinateuncertaintyinmeters, if emailing the publishers does not yield results.

We might also consider in the future computing (filling in) a coordinateuncertaintyinmeters value from the footprintwkt. Combining these two approaches would largely eliminate gridded datasets as a data quality issue without any new vocabulary from the GBIF side.
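
As a rough illustration of filling in a coordinateuncertaintyinmeters value from a footprintwkt, here is a minimal sketch. It assumes a simple POLYGON footprint with coordinates in decimal degrees (longitude first, as WKT orders them), a spherical Earth, and a footprint that does not cross the antimeridian. The function names are hypothetical and this is not an existing GBIF routine.

```python
import math
import re

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def uncertainty_from_footprint(wkt):
    """Return (lat, lon, radius_m) for a simple POLYGON footprint.

    The centre is the middle of the bounding box, and the radius is the
    largest distance from that centre to any vertex.
    """
    # WKT stores coordinates as "x y", i.e. longitude then latitude.
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    if not wkt.strip().upper().startswith("POLYGON") or not pairs:
        raise ValueError("expected a simple POLYGON footprint")
    lons = [float(x) for x, _ in pairs]
    lats = [float(y) for _, y in pairs]
    lat_c = (min(lats) + max(lats)) / 2
    lon_c = (min(lons) + max(lons)) / 2
    radius = max(haversine_m(lat_c, lon_c, lat, lon) for lat, lon in zip(lats, lons))
    return lat_c, lon_c, radius


cell = "POLYGON ((10 50, 10.25 50, 10.25 50.25, 10 50.25, 10 50))"
print(uncertainty_from_footprint(cell))
```

The radius here covers the whole footprint from its bounding-box centre, so it only accounts for the extent of the footprint and not for datum or other sources of uncertainty.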

datasetkey countNN distanceNN percentNN unique_meters unique_wkt
fe35d6a0-0c6e-11dd-84d2-b8a03c50a862 92 1 1 0 0
3a8512a0-33cf-11de-afc6-b8a03c50a862 67 0.33 1 0 0
9070a460-0c6e-11dd-84d2-b8a03c50a862 91 1 1 0 0
815fd610-f762-11e1-a439-00145eb45e9a 70 0.5 1 0 0
8226d882-f762-11e1-a439-00145eb45e9a 167 0.5 1 0 0
85ab1bf8-f762-11e1-a439-00145eb45e9a 2550 0.05 0.9996 0 0
282d0ccb-4fa0-40f9-8593-105c77e88417 1565 0.25 0.9981 0 0
906e6978-e292-4a8b-9c39-adf6bb0f3323 14843 0.08 0.9979 0 0
8ea4250e-0ff0-44f8-812e-bffc3b9ba2a4 2193 0.05 0.9977 0 0
cb429b64-d789-47bb-a7d8-379657c5e407 1589 0.04 0.9882 0 0
635f7b02-f762-11e1-a439-00145eb45e9a 5346 0.09 0.98 0 0
636308c6-f762-11e1-a439-00145eb45e9a 5264 0.09 0.9793 0 0
7f513bfc-f762-11e1-a439-00145eb45e9a 617 0.09 0.9747 0 0
63656de6-f762-11e1-a439-00145eb45e9a 4935 0.09 0.9743 0 0
635e4476-f762-11e1-a439-00145eb45e9a 4703 0.09 0.9693 0 0
7be12d06-f762-11e1-a439-00145eb45e9a 31 0.08 0.9688 0 0
f97eeccc-8409-4075-ab7c-8980f73c0d0d 944 0.05 0.9555 0 0
40d2de00-0c6e-11dd-84d2-b8a03c50a862 61 1 0.9531 0 0
6364371e-f762-11e1-a439-00145eb45e9a 3579 0.09 0.9456 0 0
488416a3-50f1-43a0-a1ce-daf1cdbf84dd 702 0.08 0.9335 0 0
77a24e14-534b-4864-bb95-ae0966ce89ce 252 0.09 0.9197 0 0
1cd0cb6d-d8ab-4e4c-9672-70d22fec96b3 919 0.08 0.9172 0 0
e0507994-aa35-4ec0-be28-92d4a3c11f75 74 0.04 0.9136 0 0
81c024c0-f762-11e1-a439-00145eb45e9a 41 0.5 0.9111 0 0
1881d048-04f9-4bc2-b7c8-931d1659a354 5806 0.25 0.8928 0 0
ffab0ec5-c5f9-4c26-9b63-28696dc5cac5 359 0.25 0.8886 0 0
478e52db-0450-47f0-a763-ad3bcdaba6d9 211 1 0.8866 0 0
82d54d68-f762-11e1-a439-00145eb45e9a 3490 0.09 0.8809 0 0
8958b49a-f762-11e1-a439-00145eb45e9a 207 0.09 0.8809 0 0
7b0153e8-f762-11e1-a439-00145eb45e9a 52 1.41 0.8667 0 0
8642bfd0-f762-11e1-a439-00145eb45e9a 1818 0.05 0.8661 0 0
863c1269-5f38-4862-a161-ea3a50654d4f 933 0.04 0.8655 0 0
89579e3e-f762-11e1-a439-00145eb45e9a 136 0.1 0.8608 0 0
82d43810-f762-11e1-a439-00145eb45e9a 3064 0.09 0.8397 0 0
0a4f3e3b-1910-445a-afa7-9b93abe69ee1 762 0.25 0.8124 0 0
bf0dd71d-364c-4f16-8612-2661f07f40a2 68 1 0.8 0 0
dbc709b9-e36e-4dd7-ab5b-c3cb08c2779d 137 0.09 0.7611 0 0
88dadc8c-f762-11e1-a439-00145eb45e9a 149 0.1 0.7563 0 0
c3b0e0ff-def0-40dd-b72d-cbe5e79c1213 252 0.05 0.7478 0 0
2aa53e94-8c53-4afc-b246-d58d5bc6b0fd 31 1 0.7381 0 0
a7793424-dc57-4d4d-ad91-9890b76ebc15 62 0.07 0.7294 0 0
e3ce628e-9683-4af7-b7a9-47eef785d3bb 6417 0.1 0.724 0 0
cd53c947-f81a-485e-92fa-cbc4aa123160 39 0.09 0.7222 0 0
84cdd6d0-f762-11e1-a439-00145eb45e9a 84 1 0.6885 0 0
82a421d4-f762-11e1-a439-00145eb45e9a 2594 0.1 0.6868 0 0
91fa1a0d-a208-40aa-8a6e-f2c0beb9b253 487 0.22 0.6802 0 0
85771146-f762-11e1-a439-00145eb45e9a 104 1 0.6753 0 0
7d8ed137-1d30-42f1-8b78-12a4957e4690 1199 1 0.6617 0 0
83459d16-f762-11e1-a439-00145eb45e9a 44 1 0.6567 0 0
867c8490-f762-11e1-a439-00145eb45e9a 151 0.07 0.6371 0 0
7b5bc5ba-4972-4876-b35a-3f9aac590fe5 81 0.09 0.6231 0 0
61a9ca38-b62f-11e2-afcb-00145eb45e9a 2118 0.1 0.6136 0 0
921ae6e9-d890-4826-b6ff-aef43cdcb409 87 0.1 0.6127 0 0
6360a554-f762-11e1-a439-00145eb45e9a 453 0.09 0.6064 0 0
81c2b870-f762-11e1-a439-00145eb45e9a 46 0.25 0.6053 0 0
81c16344-f762-11e1-a439-00145eb45e9a 46 0.25 0.6053 0 0
7b8f1304-f762-11e1-a439-00145eb45e9a 800 0.09 0.5848 0 0
faf313a1-9ae4-43f4-bfc9-974281feac0e 142 0.04 0.5613 0 0
7cc37a0e-c51a-49dd-906e-a91cc5b64f31 56 1 0.5385 0 0
083a4342-39f7-4fe0-a03c-d14ab3b9c5e2 182 0.1 0.5 0 0
53287502-980d-41a8-b9f0-0f3a5bcb6555 112 0.18 0.4978 0 0
6361dbf4-f762-11e1-a439-00145eb45e9a 196 0.09 0.4876 0 0
c90a0778-e222-406f-bcd7-ed96a2bc5f4c 716 0.25 0.4745 0 0
3e6ba6e2-88f4-48b5-b346-e8aed305f41d 149 0.09 0.4599 0 0
3331bcd4-f85e-4252-8e92-3aaa6fdc3eca 63 0.04 0.4315 0 0
8795ab31-79ae-4a9c-b285-0b76e5c09b9e 49 0.25 0.4188 0 0
86ba1954-f762-11e1-a439-00145eb45e9a 190 0.09 0.4121 0 0
857aa892-f762-11e1-a439-00145eb45e9a 2645 0.03 0.3904 0 0
a1d63dd1-5456-450c-afc8-25a62ffec50e 702 0.05 0.3744 0 0
83492a58-f762-11e1-a439-00145eb45e9a 53 1 0.3732 0 0
12ccff8a-cfa8-46f8-a8b0-c6da39cb8910 49 0.5 0.3551 0 0
8347f6ba-f762-11e1-a439-00145eb45e9a 36 1 0.3429 0 0
84c91672-f762-11e1-a439-00145eb45e9a 452 0.1 0.3199 0 0
e2032ac4-b998-47ea-a30a-2608e6a7f7f5 88 0.09 0.3188 0 0
0e8b0e10-1680-4d71-ae93-f61bd7933b1d 95 0.16 0.3156 0 0
cb456c69-d46f-47a2-a2c7-4da57c79f206 211 0.03 0.3135 0 0
891bd4ee-f762-11e1-a439-00145eb45e9a 74 0.25 0.3058 0 0
ahahn-gbif commented 4 years ago

Can you propose values based on this? Also, some more explanation on the column headers would help reading, thanks!

jhnwllr commented 4 years ago

Here is a table where I have computed half the distance to the nearest neighbor in meters.

distance_in_meters = ((distanceNN / 0.01) * (1.11 * 1000)) / 2

datasetkey countNN distanceNN percentNN distance_in_meters distance_in_meters_rounded
8226d882-f762-11e1-a439-00145eb45e9a 167 0.5 1 27750 28000
fe35d6a0-0c6e-11dd-84d2-b8a03c50a862 92 1 1 55500 56000
815fd610-f762-11e1-a439-00145eb45e9a 70 0.5 1 27750 28000
9070a460-0c6e-11dd-84d2-b8a03c50a862 91 1 1 55500 56000
3a8512a0-33cf-11de-afc6-b8a03c50a862 67 0.33 1 18315 18000
85ab1bf8-f762-11e1-a439-00145eb45e9a 2550 0.05 0.9996 2775 3000
282d0ccb-4fa0-40f9-8593-105c77e88417 1565 0.25 0.9981 13875 14000
906e6978-e292-4a8b-9c39-adf6bb0f3323 14843 0.08 0.9979 4440 4000
8ea4250e-0ff0-44f8-812e-bffc3b9ba2a4 2193 0.05 0.9977 2775 3000
cb429b64-d789-47bb-a7d8-379657c5e407 1589 0.04 0.9882 2220 2000
635f7b02-f762-11e1-a439-00145eb45e9a 5346 0.09 0.98 4995 5000
636308c6-f762-11e1-a439-00145eb45e9a 5264 0.09 0.9793 4995 5000
7f513bfc-f762-11e1-a439-00145eb45e9a 617 0.09 0.9747 4995 5000
63656de6-f762-11e1-a439-00145eb45e9a 4935 0.09 0.9743 4995 5000
635e4476-f762-11e1-a439-00145eb45e9a 4703 0.09 0.9693 4995 5000
7be12d06-f762-11e1-a439-00145eb45e9a 31 0.08 0.9688 4440 4000
f97eeccc-8409-4075-ab7c-8980f73c0d0d 944 0.05 0.9555 2775 3000
40d2de00-0c6e-11dd-84d2-b8a03c50a862 61 1 0.9531 55500 56000
6364371e-f762-11e1-a439-00145eb45e9a 3579 0.09 0.9456 4995 5000
488416a3-50f1-43a0-a1ce-daf1cdbf84dd 702 0.08 0.9335 4440 4000
77a24e14-534b-4864-bb95-ae0966ce89ce 252 0.09 0.9197 4995 5000
1cd0cb6d-d8ab-4e4c-9672-70d22fec96b3 919 0.08 0.9172 4440 4000
e0507994-aa35-4ec0-be28-92d4a3c11f75 74 0.04 0.9136 2220 2000
81c024c0-f762-11e1-a439-00145eb45e9a 41 0.5 0.9111 27750 28000
1881d048-04f9-4bc2-b7c8-931d1659a354 5806 0.25 0.8928 13875 14000
ffab0ec5-c5f9-4c26-9b63-28696dc5cac5 359 0.25 0.8886 13875 14000
478e52db-0450-47f0-a763-ad3bcdaba6d9 211 1 0.8866 55500 56000
8958b49a-f762-11e1-a439-00145eb45e9a 207 0.09 0.8809 4995 5000
82d54d68-f762-11e1-a439-00145eb45e9a 3490 0.09 0.8809 4995 5000
7b0153e8-f762-11e1-a439-00145eb45e9a 52 1.41 0.8667 78255 78000
8642bfd0-f762-11e1-a439-00145eb45e9a 1818 0.05 0.8661 2775 3000
863c1269-5f38-4862-a161-ea3a50654d4f 933 0.04 0.8655 2220 2000
89579e3e-f762-11e1-a439-00145eb45e9a 136 0.1 0.8608 5550 6000
82d43810-f762-11e1-a439-00145eb45e9a 3064 0.09 0.8397 4995 5000
0a4f3e3b-1910-445a-afa7-9b93abe69ee1 762 0.25 0.8124 13875 14000
bf0dd71d-364c-4f16-8612-2661f07f40a2 68 1 0.8 55500 56000
dbc709b9-e36e-4dd7-ab5b-c3cb08c2779d 137 0.09 0.7611 4995 5000
88dadc8c-f762-11e1-a439-00145eb45e9a 149 0.1 0.7563 5550 6000
c3b0e0ff-def0-40dd-b72d-cbe5e79c1213 252 0.05 0.7478 2775 3000
2aa53e94-8c53-4afc-b246-d58d5bc6b0fd 31 1 0.7381 55500 56000
a7793424-dc57-4d4d-ad91-9890b76ebc15 62 0.07 0.7294 3885 4000
e3ce628e-9683-4af7-b7a9-47eef785d3bb 6417 0.1 0.724 5550 6000
cd53c947-f81a-485e-92fa-cbc4aa123160 39 0.09 0.7222 4995 5000
84cdd6d0-f762-11e1-a439-00145eb45e9a 84 1 0.6885 55500 56000
82a421d4-f762-11e1-a439-00145eb45e9a 2594 0.1 0.6868 5550 6000
91fa1a0d-a208-40aa-8a6e-f2c0beb9b253 487 0.22 0.6802 12210 12000
85771146-f762-11e1-a439-00145eb45e9a 104 1 0.6753 55500 56000
7d8ed137-1d30-42f1-8b78-12a4957e4690 1199 1 0.6617 55500 56000
83459d16-f762-11e1-a439-00145eb45e9a 44 1 0.6567 55500 56000
867c8490-f762-11e1-a439-00145eb45e9a 151 0.07 0.6371 3885 4000
7b5bc5ba-4972-4876-b35a-3f9aac590fe5 81 0.09 0.6231 4995 5000
61a9ca38-b62f-11e2-afcb-00145eb45e9a 2118 0.1 0.6136 5550 6000
921ae6e9-d890-4826-b6ff-aef43cdcb409 87 0.1 0.6127 5550 6000
6360a554-f762-11e1-a439-00145eb45e9a 453 0.09 0.6064 4995 5000
81c2b870-f762-11e1-a439-00145eb45e9a 46 0.25 0.6053 13875 14000
81c16344-f762-11e1-a439-00145eb45e9a 46 0.25 0.6053 13875 14000
7b8f1304-f762-11e1-a439-00145eb45e9a 800 0.09 0.5848 4995 5000
faf313a1-9ae4-43f4-bfc9-974281feac0e 142 0.04 0.5613 2220 2000
7cc37a0e-c51a-49dd-906e-a91cc5b64f31 56 1 0.5385 55500 56000
083a4342-39f7-4fe0-a03c-d14ab3b9c5e2 182 0.1 0.5 5550 6000
53287502-980d-41a8-b9f0-0f3a5bcb6555 112 0.18 0.4978 9990 10000
6361dbf4-f762-11e1-a439-00145eb45e9a 196 0.09 0.4876 4995 5000
c90a0778-e222-406f-bcd7-ed96a2bc5f4c 716 0.25 0.4745 13875 14000
3e6ba6e2-88f4-48b5-b346-e8aed305f41d 149 0.09 0.4599 4995 5000
3331bcd4-f85e-4252-8e92-3aaa6fdc3eca 63 0.04 0.4315 2220 2000
8795ab31-79ae-4a9c-b285-0b76e5c09b9e 49 0.25 0.4188 13875 14000
86ba1954-f762-11e1-a439-00145eb45e9a 190 0.09 0.4121 4995 5000
857aa892-f762-11e1-a439-00145eb45e9a 2645 0.03 0.3904 1665 2000
a1d63dd1-5456-450c-afc8-25a62ffec50e 702 0.05 0.3744 2775 3000
83492a58-f762-11e1-a439-00145eb45e9a 53 1 0.3732 55500 56000
12ccff8a-cfa8-46f8-a8b0-c6da39cb8910 49 0.5 0.3551 27750 28000
8347f6ba-f762-11e1-a439-00145eb45e9a 36 1 0.3429 55500 56000
84c91672-f762-11e1-a439-00145eb45e9a 452 0.1 0.3199 5550 6000
e2032ac4-b998-47ea-a30a-2608e6a7f7f5 88 0.09 0.3188 4995 5000
0e8b0e10-1680-4d71-ae93-f61bd7933b1d 95 0.16 0.3156 8880 9000
cb456c69-d46f-47a2-a2c7-4da57c79f206 211 0.03 0.3135 1665 2000
891bd4ee-f762-11e1-a439-00145eb45e9a 74 0.25 0.3058 13875 14000
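
For reference, the conversion used for the table above can be sketched as follows. This is a minimal sketch assuming the thread's approximation that 0.01 degrees corresponds to 1.11 km everywhere; the function names are made up for illustration.

```python
import math

METERS_PER_DEGREE = 111_000  # from the thread: 0.01 degrees is taken as 1.11 km


def distance_in_meters(distance_nn_deg):
    """Half the nearest-neighbor spacing, converted from decimal degrees to meters."""
    return distance_nn_deg * METERS_PER_DEGREE / 2


def round_to_nearest_1000(meters):
    """Round half up to the nearest 1000 m."""
    return math.floor(meters / 1000 + 0.5) * 1000


for distance_nn in (0.05, 0.25, 1.0):
    meters = distance_in_meters(distance_nn)
    print(distance_nn, round(meters), round_to_nearest_1000(meters))
```

Because the factor is constant, this ignores how the length of a degree of longitude changes with latitude.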
tucotuco commented 4 years ago

I have some concerns about this methodology. Is there no indication of the grid resolution for any of these in dwc:coordinatePrecision? If there is, one could calculate the actual contribution to coordinateUncertaintyInMeters from that source for each grid cell individually (it isn't the same everywhere, as it varies by latitude). I can see using the same value for a whole dataset if the grid is not latitudinally extensive, but then it should be the maximum in the range of the grid for the dataset.
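
To illustrate the latitude dependence, here is a minimal sketch of a per-cell calculation and of taking the maximum over a dataset. It assumes square cells in decimal degrees, a spherical Earth and a flat approximation inside each cell; the function names are hypothetical and it covers only the grid-cell contribution, not the datum.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean radius; spherical Earth is an assumption
METERS_PER_DEGREE_LAT = math.pi * EARTH_RADIUS_M / 180


def cell_half_diagonal_m(center_lat, cell_deg):
    """Distance from the centre of a square lat/lon cell to its farthest corner.

    Uses a flat approximation inside the cell; the east-west extent is taken
    at the cell edge nearest the equator, where the cell is widest.
    """
    half = cell_deg / 2
    lat_low = max(abs(center_lat) - half, 0.0)  # closest point to the equator
    north_south = half * METERS_PER_DEGREE_LAT
    east_west = half * METERS_PER_DEGREE_LAT * math.cos(math.radians(lat_low))
    return math.hypot(north_south, east_west)


def dataset_uncertainty_m(cell_center_lats, cell_deg):
    """One value for a whole dataset: the maximum over all of its grid cells."""
    return max(cell_half_diagonal_m(lat, cell_deg) for lat in cell_center_lats)


print(dataset_uncertainty_m([10.5, 40.5, 60.5], 1.0))
```

The maximum comes from the cell closest to the equator. Note also that a half-diagonal is larger than half the spacing between neighboring points, so for a one-degree cell near the equator this gives more than the 55500 m in the table.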

But that is not the only source of uncertainty. It presupposes that the coordinate reference system is known, and that that CRS has no inherent inaccuracies. The former is the real issue. If you have to make the assumption that the CRS is epsg:4326, there is real trouble, as the uncertainty from that assumption varies over the globe from 1554m to 5358m.

The last issue above has other implications in the GBIF pipeline. I believe it is a mistake to interpret as WGS84 any datum that is not identifiable, especially if the record has an uncertainty value, and most especially if the record has an uncertainty value less than 1554m. The reason is that you can also adjust the uncertainty, which is a necessity when making the assertion about the CRS. True that the record will carry a flag about the datum, but the combination of interpreted decimal latitude and longitude with the uninterpreted coordinate uncertainty will be extremely misleading, if not a blatant fabrication.

On Thu, Jul 9, 2020 at 8:36 AM John Waller notifications@github.com wrote:

Here is a table where I have computed half the distance to the nearest neighbor in meters.

  • countNN - number of unique points whose nearest neighbor is at the same distanceNN
  • percentNN - fraction (0 to 1) of unique points that have the same distanceNN
  • distanceNN - distance in decimal degrees to the nearest neighbor point
  • distance_in_meters - half of distanceNN converted to meters, assuming 0.01 degrees corresponds to 1.11 km
  • distance_in_meters_rounded - distance_in_meters rounded to the nearest 1000 m

distance_in_meters_rounded would probably be a decent value to input as a default value for coordinateuncertaintyinmeters
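
The column definitions above could be computed along these lines. This is only one possible reading of the nearest neighbor method, which is not shown in the thread: distances are plain Euclidean distances in decimal degrees, rounded to two decimals, and the function name and `decimals` parameter are made up for illustration.

```python
import math
from collections import Counter


def nearest_neighbor_summary(points, decimals=2):
    """Summarise nearest-neighbor spacing for unique (lat, lon) points.

    Returns (distanceNN, countNN, percentNN) for the most common spacing.
    """
    unique = sorted(set(points))
    if len(unique) < 2:
        raise ValueError("need at least two unique points")
    spacings = []
    for i, (lat, lon) in enumerate(unique):
        # Brute-force search; distance is measured in decimal degrees.
        nearest = min(
            math.hypot(lat - lat2, lon - lon2)
            for j, (lat2, lon2) in enumerate(unique)
            if j != i
        )
        spacings.append(round(nearest, decimals))
    distance_nn, count_nn = Counter(spacings).most_common(1)[0]
    return distance_nn, count_nn, count_nn / len(unique)


grid = [(i * 0.25, j * 0.25) for i in range(3) for j in range(3)]
print(nearest_neighbor_summary(grid + [(10.0, 10.0)]))
```

The brute-force search is quadratic in the number of unique points, so a spatial index would be needed for large datasets.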

ArthurChapman commented 4 years ago

I largely agree with @tucotuco, though I have not looked at all of these datasets in detail. I am not sure that what I have written below adds much to the argument, but it is a quick analysis of the types of datasets included.

From a quick look, there appear to be 4 main types of datasets here.

  1. Many of the datasets cover small areas within a narrow latitudinal range (i.e. most of those with distanceNN=0.09 or less). These appear to be mostly based on metric grids, often 100 m, 1 km, 5 km or 10 km squares. As noted, they nearly always cover a very narrow longitudinal range.

  2. A number of datasets (where distanceNN=0.1) cover, from a quick look, a very broad latitudinal range, often the full latitudinal range (-90 to +90).

  3. A number of datasets (where distanceNN=0.25) are generally Southern African and based on the quarter degree grid square (a system in common use in sub-Saharan Africa). Many do cover a broad longitudinal range.

  4. The fourth group are datasets where distanceNN=1, i.e. one-degree grids. These often, but not always, cover broad longitudinal ranges. They include many marine datasets, and some cover the full latitudinal range (-90 to +90).

tucotuco commented 4 years ago

Interesting. The QDS grids offer yet another challenge: the location of the coordinate with respect to the center of the grid cell, and thus the relationship between the coordinates and the uncertainty.

ArthurChapman commented 4 years ago

You've brought up a good point @tucotuco

How are the grids referenced: by the centre, the bottom left or the top left (or some other point)? Determining the location of the coordinates is dependent on the method.
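
To make the dependence on the referencing method concrete, here is a small sketch that moves a published coordinate to the centre of its cell. It assumes square cells in decimal degrees; the anchor labels and the function name are hypothetical.

```python
def cell_centre(lat, lon, cell_deg, anchor="centre"):
    """Return the (lat, lon) of the cell centre given how the cell is referenced.

    anchor is one of "centre", "bottom_left", "top_left" (hypothetical labels).
    """
    half = cell_deg / 2
    offsets = {
        "centre": (0.0, 0.0),
        "bottom_left": (half, half),   # move north and east
        "top_left": (-half, half),     # move south and east
    }
    if anchor not in offsets:
        raise ValueError(f"unknown anchor: {anchor}")
    dlat, dlon = offsets[anchor]
    return lat + dlat, lon + dlon


print(cell_centre(-25.0, 30.0, 0.25, anchor="top_left"))
```

If the anchor is not known, the published coordinate may be a corner of the cell, in which case a circle centred on it would need a radius of the full cell diagonal rather than half of it to cover the cell.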

MattBlissett commented 4 years ago

In reading the draft of Georeferencing Best Practices, I realized we should increase coordinateUncertaintyInMeters at least when we've assumed the datum to be WGS84.

It's useful to identify these gridded datasets, and to flag them, but to avoid making inaccurate data appear more accurate than it really is, we might need to keep this as a machine tag or similar, and add more complicated processing during interpretation -- e.g. taking account of latitude, adding issues.
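
A rough sketch of what such processing might look like is below. It is not the GBIF pipeline: the function name is made up, the gridded flag name is hypothetical, the datum flag is used only as an illustrative label, and it assumes the datum contribution is simply added using the worst-case figure of 5358 m quoted earlier in this thread.

```python
DATUM_UNCERTAINTY_MAX_M = 5358  # upper bound quoted earlier in this thread


def interpret_uncertainty(reported_m, datum, looks_gridded):
    """Return (uncertainty_m, issues) without inventing a default uncertainty."""
    issues = []
    uncertainty = reported_m
    if datum is None:
        issues.append("GEODETIC_DATUM_ASSUMED_WGS84")
        if uncertainty is not None:
            # Assumption: contributions are summed, using the worst case.
            uncertainty += DATUM_UNCERTAINTY_MAX_M
    if looks_gridded and reported_m is None:
        issues.append("GRIDDED_UNCERTAINTY_MISSING")  # hypothetical flag name
    return uncertainty, issues


print(interpret_uncertainty(100, None, False))
print(interpret_uncertainty(None, None, True))
```

A record with no reported uncertainty keeps a missing value and only gains flags, so inaccurate data is not made to look more accurate than it is.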

jhnwllr commented 4 years ago

Thanks for the detailed replies @tucotuco @ArthurChapman

With regards to dwc:coordinatePrecision, 6 gridded datasets (according to my nearest neighbor method) do fill in this value.

That leaves 71 gridded datasets with NULL for coordinateUncertaintyInMeters, coordinatePrecision, footPrintWKT.

57 of these gridded datasets also have NULL for v_geodeticdatum.

As far as I understand, the issues with setting a default value for coordinateUncertaintyInMeters are:

  1. distance in decimal degrees underestimates uncertainty (worse with a broad latitudinal range)
  2. we don't know the geodeticdatum for most of these datasets, so it is not possible to convert to meters
  3. details of the grid structure might make it hard to know where the center of the "uncertainty circle" is

With this information I agree with @MattBlissett that some type of issue flagging has to be done instead of a default value.

The 57 gridded datasets that do not fill in coordinateUncertaintyInMeters, coordinatePrecision, footPrintWKT, or v_geodeticdatum

datasetlink unique_meters unique_wkt unique_cp unique_datum
https://www.gbif.org/dataset/fe35d6a0-0c6e-11dd-84d2-b8a03c50a862 0 0 0 0
https://www.gbif.org/dataset/9070a460-0c6e-11dd-84d2-b8a03c50a862 0 0 0 0
https://www.gbif.org/dataset/3a8512a0-33cf-11de-afc6-b8a03c50a862 0 0 0 0
https://www.gbif.org/dataset/85ab1bf8-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/282d0ccb-4fa0-40f9-8593-105c77e88417 0 0 0 0
https://www.gbif.org/dataset/906e6978-e292-4a8b-9c39-adf6bb0f3323 0 0 0 0
https://www.gbif.org/dataset/cb429b64-d789-47bb-a7d8-379657c5e407 0 0 0 0
https://www.gbif.org/dataset/635f7b02-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/636308c6-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/7f513bfc-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/63656de6-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/635e4476-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/f97eeccc-8409-4075-ab7c-8980f73c0d0d 0 0 0 0
https://www.gbif.org/dataset/40d2de00-0c6e-11dd-84d2-b8a03c50a862 0 0 0 0
https://www.gbif.org/dataset/6364371e-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/488416a3-50f1-43a0-a1ce-daf1cdbf84dd 0 0 0 0
https://www.gbif.org/dataset/77a24e14-534b-4864-bb95-ae0966ce89ce 0 0 0 0
https://www.gbif.org/dataset/1cd0cb6d-d8ab-4e4c-9672-70d22fec96b3 0 0 0 0
https://www.gbif.org/dataset/1881d048-04f9-4bc2-b7c8-931d1659a354 0 0 0 0
https://www.gbif.org/dataset/ffab0ec5-c5f9-4c26-9b63-28696dc5cac5 0 0 0 0
https://www.gbif.org/dataset/478e52db-0450-47f0-a763-ad3bcdaba6d9 0 0 0 0
https://www.gbif.org/dataset/82d54d68-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/8958b49a-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/7b0153e8-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/89579e3e-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/82d43810-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/0a4f3e3b-1910-445a-afa7-9b93abe69ee1 0 0 0 0
https://www.gbif.org/dataset/bf0dd71d-364c-4f16-8612-2661f07f40a2 0 0 0 0
https://www.gbif.org/dataset/dbc709b9-e36e-4dd7-ab5b-c3cb08c2779d 0 0 0 0
https://www.gbif.org/dataset/88dadc8c-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/a7793424-dc57-4d4d-ad91-9890b76ebc15 0 0 0 0
https://www.gbif.org/dataset/84cdd6d0-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/82a421d4-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/85771146-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/7d8ed137-1d30-42f1-8b78-12a4957e4690 0 0 0 0
https://www.gbif.org/dataset/867c8490-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/7b5bc5ba-4972-4876-b35a-3f9aac590fe5 0 0 0 0
https://www.gbif.org/dataset/921ae6e9-d890-4826-b6ff-aef43cdcb409 0 0 0 0
https://www.gbif.org/dataset/6360a554-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/7b8f1304-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/7cc37a0e-c51a-49dd-906e-a91cc5b64f31 0 0 0 0
https://www.gbif.org/dataset/53287502-980d-41a8-b9f0-0f3a5bcb6555 0 0 0 0
https://www.gbif.org/dataset/6361dbf4-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/c90a0778-e222-406f-bcd7-ed96a2bc5f4c 0 0 0 0
https://www.gbif.org/dataset/3e6ba6e2-88f4-48b5-b346-e8aed305f41d 0 0 0 0
https://www.gbif.org/dataset/8795ab31-79ae-4a9c-b285-0b76e5c09b9e 0 0 0 0
https://www.gbif.org/dataset/857aa892-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/12ccff8a-cfa8-46f8-a8b0-c6da39cb8910 0 0 0 0
https://www.gbif.org/dataset/84c91672-f762-11e1-a439-00145eb45e9a 0 0 0 0
https://www.gbif.org/dataset/e2032ac4-b998-47ea-a30a-2608e6a7f7f5 0 0 0 0
https://www.gbif.org/dataset/891bd4ee-f762-11e1-a439-00145eb45e9a 0 0 0 0
tucotuco commented 4 years ago

That plan makes me much more comfortable.

ArthurChapman commented 4 years ago

As there is such a small number of datasets, can they be contacted and asked to supply more metadata?

jhnwllr commented 4 years ago

@ArthurChapman yes we plan to do this.