SpeciesFileGroup / INHS-Insect-Collection-Data-Curation

An accessible issue tracker for reporting issues or requests with respect to INHS data quality.

invalid or overaccurate coordinateUncertaintyInMeters (via Mesibov) #74

Open mjy opened 4 months ago

mjy commented 4 months ago

  • invalid or overaccurate coordinateUncertaintyInMeters: data issue

tmcelrath commented 4 months ago

Not sure exactly what is meant here. Are we talking extra sig. figs?

mjy commented 4 months ago

@Mesibov can you clarify? Likely sig figs are his issue.

select distinct "coordinateUncertaintyInMeters" from dwc_occurrences where project_id = 1;

Hmm, why is 74m there? One of these things is not like the others.

 748.8405857730946
 74m
 75.4281763295557
 750.0
 7500.0
 751.6423404966713
 754.5399279502357
 757.9651323351706
 76.5570962975804
 764.5902389129819
 7686.0
 775.3439805731413
 776.41865961447
 776.520386311501
 7796.09457165164
 7800.0
 7817.21069747346
 782.0
 784.089635815655
 7882.24344094419
 79.7971421413789
 790.2629968469731
 800.0
 8000.0
 8003.34773848392
 805.7295944391607
 8058.92561065185
 809.596219116548
 818.8545608754471
 820.3306502100987
 8227.6571101691
 823.546822217106
 828.724703149433
 831.692683548881
 8316.82043332159
 836.481716290153
 837.132748406671
 8388.43747170582
 846.539514701933
 8500.0
 857.1139252044209
 863.2552285086282
 8684.861133282946
 871.2369310259176
 8945.321017818254
 90.0
 900.0
 9000.0
 918.537006879216
 929.516467970883
 935.6765476688439
 9436.505785229396
 96.93616071712428
 9649.9406095305
 9684.25602192417
 969.0
 973.779434524857
 976.713724947332
 985.3808236227832
 99.7668002689468
 997.376966818365
 9972.589691929623
 999.0
 999.283690376198
 9999.0

tmcelrath commented 4 months ago

This is actually a TW issue. Uncertainty creator adds crazy amounts of sig figs.

mjy commented 4 months ago

There is an emerging line of thought that doesn't worry about these sig figs, as they represent the conversion of one unit to another within the limitations of the calculation. For example, these were likely feet values converted to meters. In other words, if we want to back-calculate the value of an individual record in feet, then we want to use as much accuracy as possible. Now, when doing calculations across records we'd have to make some sig-fig decisions; however, clearly with these collections data nobody is recording at this level, so this is a very different issue than in chemistry and physics. Furthermore, when aggregating across collections we're going to have even less certainty, so anyone actually looking at their data is going to conservatively round far beyond these exact values.

This is all to say, we might not change this. :)
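To make the back-calculation argument concrete, here is a minimal Python sketch. It assumes a hypothetical record whose uncertainty was originally entered as 250 feet; that value and the helper names are illustrative and do not come from the INHS data or from TaxonWorks. It compares converting back to feet from the full-precision meter value with converting back from a value rounded to whole meters.

```python
# Illustrative sketch only: the 250 ft starting value is hypothetical.
FEET_TO_METERS = 0.3048  # the international foot is defined as exactly 0.3048 m


def feet_to_meters(feet: float) -> float:
    """Convert a distance in feet to meters."""
    return feet * FEET_TO_METERS


def meters_to_feet(meters: float) -> float:
    """Convert a distance in meters back to feet."""
    return meters / FEET_TO_METERS


original_feet = 250.0                      # hypothetical verbatim uncertainty
meters_full = feet_to_meters(original_feet)
meters_rounded = round(meters_full)        # whole meters

back_full = meters_to_feet(meters_full)        # recovers the original, up to float error
back_rounded = meters_to_feet(meters_rounded)  # drifts away from the original

print(f"full precision: {meters_full} m -> {back_full:.4f} ft")
print(f"whole meters:   {meters_rounded} m -> {back_rounded:.4f} ft")
```

Under these assumptions the rounded value no longer maps back to the original figure in feet, which is the trade-off described above. Whether that round trip matters for a given dataset is a separate question.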

tmcelrath commented 4 months ago

https://github.com/SpeciesFileGroup/taxonworks/issues/3946 - Can move discussion there.

mjy commented 4 months ago

3946 is the same theme, but technically different.

Mesibov commented 4 months ago

Good morning, Matt and Tom. In cUIM, invalid entries are: 100000m 10000m 10000M 1000m 100m 10m 1km 35000m 43M 5000m ±50m 50m 6000m 74m -74.98584
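A check of this kind can be sketched in Python. The function name `is_valid_cuim` and the rule it applies (a bare positive number, with no sign, unit suffix, or other characters) are assumptions made for illustration, not part of TaxonWorks or of any Darwin Core tooling. The sample values are taken from the lists in this thread.

```python
import re

# Hypothetical rule: digits with an optional decimal part, and nothing else.
_BARE_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")


def is_valid_cuim(value: str) -> bool:
    """Return True if value is a bare number greater than zero."""
    text = value.strip()
    if not _BARE_NUMBER.fullmatch(text):
        return False  # rejects unit suffixes, signs, and '±'
    return float(text) > 0  # zero is treated as invalid in this sketch


samples = ["748.8405857730946", "74m", "1km", "±50m", "10000M", "-74.98584", "750.0"]
invalid = [s for s in samples if not is_valid_cuim(s)]
print(invalid)
```

Entries carrying a unit (`74m`, `1km`, `10000M`), a `±` prefix, or a minus sign fail the check, while the long decimal values pass because they are syntactically valid numbers. Their precision is a separate question.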

The sigfig issue is important because in cUIM values like "748.8405857730946" the numbers past the decimal point are pure data noise. cUIM is an estimate based on software or human judgment and neither the software nor the human estimator provides an error (plus or minus N). Rounding to whole meters or even higher (as in "35000") is justified.

I've seen cUIM used to build uncertainty circles in GIS and there is no significant difference so far as this building is concerned between "748.8405857730946" and "749". Back-calculation to feet would be an odd thing to do.
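As a rough illustration of how small that difference is, the following Python sketch rounds a few of the values listed earlier in this thread to whole meters and reports the relative change in radius. The helper names are hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) is just one possible rounding rule. Rounding up would be a more conservative choice for an uncertainty.

```python
def round_cuim(value: float) -> int:
    """Round an uncertainty in meters to the nearest whole meter."""
    return round(value)


def relative_change(original: float, rounded: float) -> float:
    """Fractional change in radius introduced by rounding."""
    return abs(rounded - original) / original


# Values copied from the query output earlier in this thread.
values = [748.8405857730946, 76.5570962975804, 99.7668002689468, 750.0]

for v in values:
    r = round_cuim(v)
    print(f"{v} -> {r} (relative change {relative_change(v, r):.4%})")
```

For the 748.8405857730946 example the radius changes by well under a tenth of a percent, which is consistent with the point that the resulting uncertainty circle is effectively the same.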

Note also that the method of estimating cUIM has not been documented for each record. I'm not saying it should be, just pointing out that in the soup of cUIMs estimated by various means and by various people and programs, the practical basis for comparison is whole meters, and that's what Darwin Core expects: "The horizontal distance (in meters) from the given dwc:decimalLatitude and dwc:decimalLongitude describing the smallest circle containing the whole of the dcterms:Location." (DwC)