bioXgeo / geodiv

Other
11 stars 2 forks source link

`sfd()` irregularly returns NA or NaN with the same `RasterLayer` #11

Open edwardlavender opened 1 year ago

edwardlavender commented 1 year ago

This is an interesting package! I was surprised to get different results from sfd() for the same RasterLayer -- sometimes I get NA, sometimes NaN and other times I get numerical estimates that are similar but not necessarily identical. Is this behaviour expected? (E.g., Is there some stochasticity in the algorithm used?) This happens even if I set the seed.

I am not sure why this is happening, so it is hard to generate an example using synthetic data, but I've attached an example dataset to this issue (tmp.txt). The example dataset is a longitude/latitude RasterLayer - I am not sure if that could be causing the issue or if your routine handles this?

This code illustrates the issue with the example dataset:

# set seed
set.seed(1)
# read file as RasterLayer
r <- dget(file = "tmp.txt")
# Run `sfd` a bunch of times and look at the outputs 
replicate(100, geodiv::sfd(r, silent = TRUE))

These are the outputs I get, which you can see vary:

[1] NaN 2.000341 2.000341 2.000345 2.000341 2.000341 NA NaN NaN NA NaN NaN NaN 2.000341 NaN NaN NaN NaN 2.000341 2.000341 [21] NaN NaN 2.000341 2.000341 2.000341 2.000341 NA NA NA 2.000341 2.000341 2.000341 NA NA NaN 2.000341 2.000341 NaN NaN NaN [41] NaN NaN NaN NaN NaN NaN NaN NaN NaN 2.000341 2.000341 NaN NaN NaN NaN NaN NaN 2.000341 2.000341 2.000341 [61] 2.000341 NA NA NA 2.000341 NaN NaN NaN 2.000341 NaN NaN NaN 2.025805 2.026341 NaN NaN NaN 2.000341 2.000341 NaN [81] NaN NaN 2.000341 NaN NaN NaN 2.000341 2.000341 NaN NaN 2.000341 2.000341 2.000341 NA NA NA 2.000341 NaN NaN 2.000341

AnnieCooper commented 1 year ago

Thanks for bringing this to my attention! I think that I solved the problem by correcting the raster to matrix conversion step. The output of your example is now consistently 2.000341. I committed the updated code to the "dev" branch and am working on updating the CRAN version of the package. That update should be posted within 1-2 weeks. Please let me know if the correction didn't fix the issue for you, or if you find any other bugs.

edwardlavender commented 1 year ago

Thank you for your prompt response! I re-installed the package from the development branch and re-tested the example as shown below. It seems that I am sometimes getting fewer NAs/NaNs (the number differs every time I run the code), but they are still present and the numerical estimates differ (slightly) too.

# Install package
renv::install("bioXgeo/geodiv@dev")
rstudioapi::restartSession()
packageVersion("geodiv")
# [1] ‘1.1.0’

# Run `sfd` a bunch of times and look at the outputs 
set.seed(1)
r <- dget(file = "tmp.txt")
fd <- replicate(1e3, geodiv::sfd(r, silent = TRUE))
unique(fd)
#  NaN 2.000341 2.000344 2.000349 2.000348 2.000343
table(is.na(fd))
# FALSE  TRUE 
# 945    55 

This is the output of sessionInfo():

R version 4.2.3 (2023-03-15) Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Ventura 13.3.1

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached): [1] Rcpp_1.0.10 rstudioapi_0.14 raster_3.6-20 magrittr_2.0.3 units_0.8-2 tidyselect_1.2.0 lattice_0.21-8 R6_2.5.1 rlang_1.1.1
[10] fansi_1.0.4 dplyr_1.1.2 tools_4.2.3 grid_4.2.3 parallel_4.2.3 snow_0.4-4 KernSmooth_2.23-21 utf8_1.2.3 DBI_1.1.3
[19] terra_1.7-29 cli_3.6.1 e1071_1.7-13 class_7.3-22 tibble_3.2.1 lifecycle_1.0.3 sf_1.0-12 codetools_0.2-19 vctrs_0.6.2
[28] glue_1.6.2 sp_1.6-0 proxy_0.4-27 pracma_2.4.2 compiler_4.2.3 pillar_1.9.0 generics_0.1.3 spatial_7.3-16 classInt_0.4-9
[37] renv_0.17.3 geodiv_1.1.0 zoo_1.8-12 pkgconfig_2.0.3

AnnieCooper commented 1 year ago

I pushed another update to the sfd function. Can you try reload the dev branch and see if it fixes the issue for you?

edwardlavender commented 1 year ago

Thanks for your response. Following the same procedure as described above, I still generate NaNs for the example dataset on some runs & there are small differences between values.

AnnieCooper commented 1 year ago

Ok, thank you for checking. I think that the issue must be in the versions of packages used. I'll work on replicating the error and aim to get it fixed in the next 2-3 weeks.