DistanceDevelopment / readdst

Convert Distance for Windows projects into R code/data
GNU General Public License v3.0

Deal with unit conversion #16

Closed: dill closed this issue 8 years ago

dill commented 8 years ago

Quantities stored in Distance can be in different (non-SI) units. For example, for one project:

> db$DataFields
       LayerName     FieldName     TableName FieldType       Units OrdinalPosition ColumnWidth Tag Formula
1  Line transect            ID Line transect        10                           1           0  NA      NA
2  Line transect         Label Line transect        12                           3           0  NA      NA
3  Line transect   Line length Line transect         2        Mile               4           0  NA      NA
4  Line transect      ParentID Line transect        11                           2           0  NA      NA
5    Observation          Hour   Observation         2                           5           0  NA      NA
6    Observation            ID   Observation        10                           1           0  NA      NA
7    Observation         MSTDO   Observation         2                           4           0  NA      NA
8    Observation      ParentID   Observation        11                           2           0  NA      NA
9    Observation Perp distance   Observation         2   Kilometer               3           0  NA      NA
10        Region          Area        Region         2 Square mile               4           0  NA      NA
11        Region            ID        Region        10                           1           0  NA      NA
12        Region         Label        Region        12                           3           0  NA      NA
13        Region      ParentID        Region        11                           2           0  NA      NA
14    Study area            ID    Study area        10                           1           0  NA      NA
15    Study area         Label    Study area        12                           2           0  NA      NA

This indicates that line lengths were measured in miles, perpendicular distances in kilometres and region areas in square miles (:fire::fire::computer::fire::fire:).

readdst should handle this, so that abundances and densities are calculated in consistent units.
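To make the problem concrete, here is a minimal sketch (not readdst code) using the standard conventional line transect estimator D = n / (2 w L P_a), ignoring cluster size. The estimator only gives a meaningful density if the truncation distance w and total line length L are in the same unit, and abundance also needs the region area in the matching squared unit. All numbers below are hypothetical; the conversion factors are the Mile and Kilometer entries from the LinUnit table.

```r
# Conventional line transect density: n / (2 * w * L * P_a).
line_density <- function(n, w, L, P_a) {
  n / (2 * w * L * P_a)
}

mile_m <- 1609.344  # metres per mile
km_m   <- 1000      # metres per kilometre

n   <- 60    # hypothetical number of detections
w   <- 1.5   # hypothetical truncation distance, in kilometres
L   <- 100   # hypothetical total line length, in miles
P_a <- 0.5   # hypothetical average detection probability

# Mixing kilometres and miles gives a number with no consistent unit.
naive <- line_density(n, w, L, P_a)

# Converting both lengths to metres gives animals per square metre.
D_si <- line_density(n, w * km_m, L * mile_m, P_a)

# Abundance in a region whose area was recorded in square miles:
# the area must be converted to square metres as well.
A_sq_mile <- 500                       # hypothetical region area
N_hat <- D_si * A_sq_mile * mile_m^2
```

With these values the naive calculation returns 0.4, which cannot be multiplied by an area in any unit to give a sensible abundance, whereas N_hat is the same abundance you would get by doing the whole calculation in miles.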

The ProjectSettingsNumber table in the DistIni.mdb file holds conversion factors. For example, for linear units (where Setting is the number of metres per unit):

> subset(db$ProjectSettingsNumber, Section=="LinUnit")
    Section                      Key      Setting
372 LinUnit               Centimeter    0.0100000
373 LinUnit               Centimetre    0.0100000
374 LinUnit         Chain (Benoit A)   20.1167824
375 LinUnit           Chain (Benoit)   20.1167825
376 LinUnit           Chain (Clarke)   20.1166195
377 LinUnit            Chain (Sears)   20.1167651
378 LinUnit        Chain (US Survey)   20.1168402
379 LinUnit                   Fathom    1.8288000
380 LinUnit                     Foot    0.3048000
381 LinUnit              Foot (1865)    0.3048008
382 LinUnit          Foot (Benoit A)    0.3047997
383 LinUnit          Foot (Benoit B)    0.3047997
384 LinUnit            Foot (Clarke)    0.3047973
385 LinUnit        Foot (Gold Coast)    0.3047997
386 LinUnit       Foot (Indian 1937)    0.3047984
387 LinUnit       Foot (Indian 1962)    0.3047996
388 LinUnit       Foot (Indian 1975)    0.3047995
389 LinUnit            Foot (Indian)    0.3047995
390 LinUnit Foot (Modified American)    0.3048123
391 LinUnit             Foot (Sears)    0.3047995
392 LinUnit         Foot (US Survey)    0.3048006
393 LinUnit                  Furlong  201.1680000
394 LinUnit                     Inch    0.0254000
395 LinUnit                Kilometer 1000.0000000
396 LinUnit                Kilometre 1000.0000000
397 LinUnit                     Link    0.2011662
398 LinUnit          Link (Benoit A)    0.2011678
399 LinUnit            Link (Benoit)    0.2011678
400 LinUnit             Link (Sears)    0.2011677
401 LinUnit         Link (US Survey)    0.2011684
402 LinUnit                    Meter    1.0000000
403 LinUnit                    Metre    1.0000000
404 LinUnit           Metre (German)    1.0000014
405 LinUnit                     Mile 1609.3440000
406 LinUnit         Mile (US Survey) 1609.3472187
407 LinUnit               Millimeter    0.0010000
408 LinUnit               Millimetre    0.0010000
409 LinUnit            Nautical Mile 1852.0000000
410 LinUnit                      Rod    5.0292000
411 LinUnit                     Yard    0.9144000
412 LinUnit          Yard (Benoit A)    0.9143992
413 LinUnit          Yard (Benoit B)    0.9143992
414 LinUnit            Yard (Clarke)    0.9143918
415 LinUnit       Yard (Indian 1937)    0.9143952
416 LinUnit       Yard (Indian 1962)    0.9143988
417 LinUnit       Yard (Indian 1975)    0.9143985
418 LinUnit            Yard (Indian)    0.9143986
419 LinUnit             Yard (Sears)    0.9143984
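A lookup against this table could be as simple as the sketch below. The function name to_metres and the hand-copied subset of rows are hypothetical; in a real project the table would come from subset(db$ProjectSettingsNumber, Section == "LinUnit") as above. Failing loudly on an unrecognised unit seems safer than silently treating it as metres.

```r
# A few rows copied from the LinUnit section above (Setting = metres per unit).
lin_units <- data.frame(
  Key     = c("Centimeter", "Foot", "Kilometer", "Meter",
              "Mile", "Nautical Mile", "Yard"),
  Setting = c(0.01, 0.3048, 1000, 1, 1609.344, 1852, 0.9144),
  stringsAsFactors = FALSE
)

# Hypothetical helper: express x (in `unit`) in metres.
# `unit` may be a single string or a vector the same length as x.
to_metres <- function(x, unit, table = lin_units) {
  factor <- table$Setting[match(unit, table$Key)]
  if (anyNA(factor)) {
    stop("Unknown linear unit(s): ",
         paste(unique(unit[is.na(factor)]), collapse = ", "))
  }
  x * factor
}

to_metres(c(1, 2), "Mile")  # 1609.344 3218.688
```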

The following excerpt from the developer manual also seems useful:

[Screenshot: excerpt from the Distance developer manual, 2015-12-03]

So, during the convert_project stage, all units should be converted to SI.
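One possible shape for that step is sketched below; it is not how readdst actually does it. It assumes the Units column of DataFields drives the conversion, that an empty Units string means the field is unitless (IDs, labels), that "Square <unit>" areas use the squared linear factor (exact for square miles), and that hectares are 10,000 m². The helper names si_factor and convert_table_to_si are hypothetical, and only a handful of linear units are included.

```r
# Metres per unit for a few linear units (values from the LinUnit table).
lin_factor <- c(meter = 1, metre = 1, kilometer = 1000, kilometre = 1000,
                mile = 1609.344, foot = 0.3048)

# Hypothetical: multiplier taking a value in `units` to SI (m or m^2).
si_factor <- function(units) {
  out <- vapply(units, function(u) {
    if (is.na(u) || u == "") return(1)        # unitless field
    u <- tolower(u)
    if (u == "hectare") return(1e4)           # assumption: 1 ha = 1e4 m^2
    if (startsWith(u, "square ")) {           # area: square the linear factor
      return(unname(lin_factor[sub("^square ", "", u)])^2)
    }
    unname(lin_factor[u])                     # NA if the unit is unknown
  }, numeric(1))
  if (anyNA(out)) stop("Unknown unit(s): ", paste(units[is.na(out)], collapse = ", "))
  unname(out)
}

# Hypothetical: rescale every field of `df` that DataFields lists for `table_name`.
convert_table_to_si <- function(df, table_name, fields) {
  f <- fields[fields$TableName == table_name & fields$FieldName %in% names(df), ]
  for (i in seq_len(nrow(f))) {
    df[[f$FieldName[i]]] <- df[[f$FieldName[i]]] * si_factor(f$Units[i])
  }
  df
}

# Units as in the DataFields output at the top of this issue.
fields <- data.frame(
  TableName = c("Line transect", "Observation", "Region"),
  FieldName = c("Line length", "Perp distance", "Area"),
  Units     = c("Mile", "Kilometer", "Square mile"),
  stringsAsFactors = FALSE
)

obs <- data.frame(ID = 1:3, `Perp distance` = c(0.1, 0.25, 0.4),
                  check.names = FALSE)       # hypothetical distances in km
obs_si <- convert_table_to_si(obs, "Observation", fields)
# obs_si$`Perp distance` is now in metres: 100, 250, 400
```

Doing this once at conversion time means everything downstream (model fitting, density and abundance) can assume metres and square metres.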

erex commented 8 years ago

I'll take the blame/flame for the first example in this issue: it's a simulated dataset, and I cared very little about unit consistency.

The Montrave line transect data (Saturday adventure) were collected by Prof Buckland and follow a much saner units convention: study area size in ha, transect lengths in km and perpendicular distances in m. There is one wee curve ball: each transect was traversed twice (so there's a multiplier).
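For repeat visits, the usual approach is to count effort as the distance actually walked, i.e. transect length times the number of visits, with detections pooled over visits. A minimal sketch of that, together with the ha and km conversions, is below; the transect labels, lengths and area are invented for illustration and are not the Montrave values.

```r
# Hypothetical transect table: lengths in km, each transect walked twice.
transects <- data.frame(
  Label     = c("A", "B", "C"),
  length_km = c(0.5, 0.7, 0.6),
  visits    = c(2, 2, 2)
)

# Effort = length x number of visits, converted from kilometres to metres.
effort_m <- sum(transects$length_km * 1000 * transects$visits)

# Study area in hectares converted to square metres (1 ha = 10,000 m^2).
area_ha <- 33.2            # hypothetical value
area_m2 <- area_ha * 1e4
```

Perpendicular distances already in metres need no conversion, so after this step all three quantities are in SI units.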

dill commented 8 years ago

858cbac includes support for unit conversion and repeat visits (though see also #18).

The results are still not perfect, but they are significantly closer than they were. There may be issues getting the uncertainty estimates to agree, as I think the methods are different, but the discrepancy may also be caused by the degrees of freedom (which I think are currently incorrect when there are repeats in the data).

For example, for the whale simulations:

cc <- convert_project("inst/CovarWhaleSim-solutions/CovarWhaleSim-solutions")
test_stats(cc[[2]])
         Statistic Distance_value  mrds_value    Rel_diff Pass
1                n     60.0000000  60.0000000 0.000000000    ✓
2       parameters      1.0000000   1.0000000 0.000000000    ✓
3              AIC    123.2824020 123.2821247 0.000000000    ✓
4          Chi^2 p      0.7460822   0.8151965 0.092636280
5              P_a      0.4956547   0.4956543 0.000000000    ✓
6          CV(P_a)      0.0938000   0.0937935 0.000000000    ✓
7   log-likelihood    -60.6412010 -60.6410623 0.000000000    ✓
8            K-S p      0.7095534   0.7095513 0.000000000    ✓
9           C-vM p      0.8000000   0.7631826 0.046021780
10         density      0.0346334   0.0346334 0.000000000    ✓
11     CV(density)      0.1400000   0.1399674 0.000232937
12     density lcl      0.0259316   0.0259316 0.000000000    ✓
13     density ucl      0.0462554   0.0462554 0.000000000    ✓
14      density df     21.3759995  21.3758021 0.000000000    ✓
15     individuals    346.0000000 346.3343596 0.000000000    ✓
16 CV(individuals)      0.1400000   0.1399674 0.000232937
17 individuals lcl    259.0000000 259.3158988 0.000000000    ✓
18 individuals ucl    463.0000000 462.5535465 0.000000000    ✓
19  individuals df     21.3759995  21.3758021 0.000000000    ✓

(though there are still issues when covariates are included).

This is by no means done, but the estimates are much, much closer now.
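The Rel_diff column in the output above is consistent with |Distance - mrds| / |Distance|: that formula reproduces the Chi^2 p row (0.0926) and the C-vM p row (0.0460). A minimal sketch of such a comparison is below. compare_stats is a hypothetical name, the per-statistic tolerances are made up, and readdst's test_stats() may differ in detail (for instance, the printed Rel_diff is 0 for some rows whose values are not exactly equal).

```r
# Hypothetical Distance-vs-mrds comparison with a tolerance per statistic.
compare_stats <- function(distance, mrds, tolerance) {
  rel_diff <- abs(distance - mrds) / abs(distance)
  data.frame(Distance_value = distance, mrds_value = mrds,
             Rel_diff = rel_diff, Pass = rel_diff < tolerance)
}

# Three rows taken from the whale simulation output above.
res <- compare_stats(
  distance  = c(`Chi^2 p` = 0.7460822, `C-vM p` = 0.8, density = 0.0346334),
  mrds      = c(0.8151965, 0.7631826, 0.0346334),
  tolerance = c(1e-3, 1e-3, 1e-3)   # hypothetical tolerances
)
res
```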

erex commented 8 years ago

Looking good. The CvM P-value will always disagree, because DisWin simply does a table lookup, so P-values are only recorded to the nearest 0.1. I can't sort out how the tolerance is calculated:

 tolerance = as.numeric(x[5]))

but I think it could be relaxed to the point where a P-value of 0.13996 is considered equivalent to 0.14000.
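Two ways of handling this are sketched below; both helpers are hypothetical and not part of readdst. Because DisWin only reports the CvM p-value to the nearest 0.1, one option is to round the mrds value the same way before comparing, so 0.7631826 becomes 0.8 and matches. For other reported values, a relative tolerance does the job: 0.13996 vs 0.14 differ by about 2.9e-4 relatively. The default tolerance of 1e-3 is an arbitrary choice for illustration.

```r
# (1) Mimic DisWin's table lookup: round the mrds p-value to the same precision.
cvm_agree <- function(distance_p, mrds_p, digits = 1) {
  isTRUE(all.equal(round(mrds_p, digits), distance_p))
}

# (2) Accept values whose relative difference is below a tolerance.
rel_agree <- function(distance_value, mrds_value, tolerance = 1e-3) {
  abs(distance_value - mrds_value) / abs(distance_value) < tolerance
}

cvm_agree(0.8, 0.7631826)  # TRUE: 0.7631826 rounds to 0.8
rel_agree(0.14, 0.13996)   # TRUE: relative difference is about 2.9e-4
```

Rounding can still flip a comparison for p-values very close to a rounding boundary (e.g. around 0.75), so a small tolerance on top of the rounding may be worth keeping.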

dill commented 8 years ago

Tolerances are stored in stats_table. I have changed the tolerance to 1e-1 and added an "Additional notes" section to this function's documentation to record such facts (see b4e8dba).

Do you know of any more such "facts" that might be useful?