fitzLab-AL / gdm

R package for Generalized Dissimilarity Modeling
GNU General Public License v3.0

Using terra() data objects with gdm ver v1.6.0 [e.g., for visualizing Multi-dimensional Biological patterns] #36

Closed Basquill closed 1 month ago

Basquill commented 7 months ago

Hello @rvalavi and @fitzLab-AL,

I'm testing gdm ver 1.6.0 using terra data objects. To lower computational time for this test, I limited my model to 3 predictors (instead of the 17 I'm normally running) with geo=TRUE. Tests were run on a desktop PC.

The code in Section 2.1 of Appendix S1 of the user guide (Mokany et al. 2022) ran okay with a couple of minor code amendments. Next, I want to create maps so I can visualize compositional variation across geographic space. The code in Section 2.2 of Appendix S1 worked fine until I got to "scaling the PCA rasters to make full use of the colour spectrum".

I get an error when running any one of the following code chunks. Note that I added a space after the "at" symbol in these chunks because a GitHub user has "@ data" (without the space) as a handle.

pcaRast[[1]] <- (pcaRast[[1]]-pcaRast[[1]]@ data@min) / (pcaRast[[1]]@ data@max-pcaRast[[1]]@ data@min)*255
pcaRast[[2]] <- (pcaRast[[2]]-pcaRast[[2]]@ data@min) / (pcaRast[[2]]@ data@max-pcaRast[[2]]@ data@min)*255
pcaRast[[3]] <- (pcaRast[[3]]-pcaRast[[3]]@ data@min) / (pcaRast[[3]]@ data@max-pcaRast[[3]]@ data@min)*255

-- Error: no slot of name "data" for this object of class "SpatRaster"

I'm assuming this is because SpatRaster objects were not part of the original pipeline. I tried changing the references (e.g., [[1]]) in these chunks, but that didn't work. Note that the layers in pcaRast are named lyr1, lyr2, and lyr3. Any suggestions?

Overall, the addition of terra() functionality to this version of gdm makes things much faster, which is great to see.

Thanks.

rvalavi commented 7 months ago

Hi Sean @Basquill

Thank you for your inquiry. You are right! The material by Mokany et al. (2022) was based on the previous version. However, all the gdm related code should remain the same. The issue here is that you attempted to perform a raster package operation on a terra object, as you suspected. You can refer to the help function of the gdm.transform function to learn how to perform this using SpatRaster objects, or access the online material here.

https://github.com/fitzLab-AL/gdm?tab=readme-ov-file#visualizing-multi-dimensional-biological-patterns
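
For readers who land here with the same error, the following is a minimal sketch of the same 0-255 rescaling written for a SpatRaster. It is not copied from the gdm help file or README. The small random raster is a hypothetical stand-in for pcaRast, and the approach (reading each layer's minimum and maximum with terra::global() instead of the raster package's @data slots) is one assumed way to do it.

```r
library(terra)

# Hypothetical stand-in for pcaRast: a small 3-layer SpatRaster of random values
set.seed(1)
pcaRast <- rast(nrows = 10, ncols = 10, nlyrs = 3)
values(pcaRast) <- matrix(rnorm(300), ncol = 3)
names(pcaRast) <- c("lyr1", "lyr2", "lyr3")

# SpatRaster objects have no @data slot, so ask terra for each layer's range
for (i in 1:3) {
  lyrMin <- global(pcaRast[[i]], "min", na.rm = TRUE)[1, 1]
  lyrMax <- global(pcaRast[[i]], "max", na.rm = TRUE)[1, 1]
  # Same linear rescaling as the original chunk: map [min, max] onto [0, 255]
  pcaRast[[i]] <- (pcaRast[[i]] - lyrMin) / (lyrMax - lyrMin) * 255
}

# Each layer should now span 0 to 255
print(global(pcaRast, "min", na.rm = TRUE))
print(global(pcaRast, "max", na.rm = TRUE))
```

terra::stretch(pcaRast, minv = 0, maxv = 255) should give a comparable per-layer linear stretch in a single call, which may be preferable for very large rasters.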

Basquill commented 7 months ago

Hi Roozbeh @rvalavi,

Thanks for your prompt reply. That's excellent -- I didn't realize the help functions had been updated. And I assumed the online package info mirrored material in Mokany et al (2022). I can't get to this until Monday. Will try then. Thanks again!

Basquill commented 7 months ago

Hi Roozbeh @rvalavi,

I ran this twice on a desktop. My study area is 55 000 sq kms and my predictors are very fine (10 x 10 m). The model explains 31% of deviance.

For the first trial, I randomly selected 3 predictors to test the code. The second trial had the full suite of 16 predictors. In both cases, geo=TRUE.

A couple issues arose:

1) I noticed that the rescaled rasters for my x and y coordinates (UTM Eastings and Northings) don't look right.

2) When I ran the second trial (16 predictors plus Easting and Northing), I only got 14 (instead of 18) layers in the SpatRaster produced through transRasts <- gdm.transform(model=gdmRastMod, data=envRast). There was a warning about saving the transRasts data object, but no errors.

Warning message:
[writeStart] Estimated disk space needed without compression: 153GB. Available: 119 GB.

I've attached exports showing the rescaled rasters (incl. the seemingly erroneous plots for easting and northing).

Any thoughts? thanks


[Image: "Rescaled rasters_25march2024" (3 predictors plus geography)]

[Image: "transformed rasters_28march2024" (16 predictors plus geography)]

rvalavi commented 7 months ago

Hi @Basquill Thanks for reporting the issues.

1) The terra package handles very large rasters by writing them to disk chunk by chunk. It picks a temporary location on your computer, typically on drive C:\. It seems your C drive didn't have enough free space, which could have caused the issue. Could you try once more with the filename argument pointing to another drive that has enough space? Check the help file for this argument. E.g.:

transRasts <- gdm.transform(model=gdmRastMod, data=envRast, filename="D:/somefolder/dgm_transform_layers.tif")

2) It is reasonable to get fewer transformed variables as output. Essentially, gdm.transform discards any covariate whose spline coefficients are 0, because the result would just be a raster with 0 values for all pixels.
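
To see which predictors would be dropped for this reason, one option is to sum the spline coefficients per predictor. The sketch below uses made-up vectors (hypothetical names and values) in place of a fitted model. It assumes the fitted gdm object exposes predictors, splines, and coefficients fields laid out this way, so treat it as an illustration rather than the package's own procedure.

```r
# Hypothetical values standing in for gdmRastMod$predictors,
# gdmRastMod$splines and gdmRastMod$coefficients
predictors   <- c("Geographic", "elev", "slope", "precip")
splines      <- c(3, 3, 3, 3)            # number of I-splines per predictor
coefficients <- c(0.2, 0.0, 0.1,         # Geographic
                  0.0, 0.0, 0.0,         # elev: all zero
                  0.5, 0.3, 0.0,         # slope
                  0.0, 0.0, 0.0)         # precip: all zero

# Label each coefficient with its predictor, keeping the original order
grp <- factor(rep(predictors, times = splines), levels = predictors)

# Sum of coefficients per predictor; a sum of 0 means no fitted effect
coefSums <- tapply(coefficients, grp, sum)

# Predictors expected to be missing from the transformed raster stack
dropped <- names(coefSums)[coefSums == 0]
print(dropped)
```

With a real model, the number of layers returned by gdm.transform should equal the number of predictors left after removing those listed in dropped.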

Please let me know if the issue with the weird raster values still persists.

Basquill commented 6 months ago

Hi @rvalavi . I'm setting this up to run on a high performance computing cluster. Will let you know how it goes.

rvalavi commented 6 months ago

Any luck? @Basquill

Basquill commented 5 months ago

Apologies for the delay @rvalavi . I did run this successfully on my laptop (1 time). After freeing up disk space (for memory caching), it took 5.5 hours. There weren't any apparent errors. I still want to try running it on an HPC. My HPC account had to be re-initiated and there were some other delays sending job scripts. Hope to get back to that soon. Will let you know how it goes. Thanks for checking in

rvalavi commented 5 months ago

All good! I’m glad that it worked, @Basquill! If the output maps look fine, you can close this issue.