brshipley / megaSDM

Potential issue with TrainStudyEnv function? (and other errors) #2

Open yucheols opened 2 years ago

yucheols commented 2 years ago

Hello!

I've been trying out the package for the past couple of months, ever since it came out. I first followed the vignette and then applied it to my own dataset. I'm loving how smoothly everything runs and how nicely the workflow is organized!

Since trying out the package, I've come across a somewhat subtle error in the TrainStudyEnv function.

I've been trying to model species distributions across the Korean Peninsula. When I set the extent around the Peninsula, I get all the rasters in matching extents, but I'm getting very small differences in resolution, so small that R cannot detect them with the compareRaster function (all comparisons returned TRUE), but MaxEnt obviously can. It spat out an error message when run with the output rasters from TrainStudyEnv ("different xy resolution not supported").

When I checked the rasters in GIS software, I found this: Pixel Size : 0.008333333333334169354,-0.008333333333333270768

So there are small differences in resolution at around the 15th or 16th decimal place. Could you help me troubleshoot how to get around this issue?
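To make the mismatch concrete, here is a minimal Python sketch (not R, and not megaSDM or MaxEnt code) that compares the two pixel sizes quoted above, once exactly and once with a tolerance. The function name `same_resolution` and the tolerance value are hypothetical. The assumption is that MaxEnt applies a check close to the exact one, while a tolerant check (or one that skips resolution entirely) reports a match.

```python
import math

# Pixel sizes copied from the GIS report above. The y size is negative
# because rows run from north to south, so absolute values are compared.
x_res = 0.008333333333334169354
y_res = -0.008333333333333270768


def same_resolution(a, b, rel_tol=0.0):
    """Return True if two pixel sizes match within a relative tolerance.

    With rel_tol=0.0 this is an exact equality test on the absolute values.
    """
    return math.isclose(abs(a), abs(b), rel_tol=rel_tol, abs_tol=0.0)


strict = same_resolution(x_res, y_res)                 # exact comparison
lenient = same_resolution(x_res, y_res, rel_tol=1e-6)  # tolerant comparison (assumed tolerance)
difference = abs(abs(x_res) - abs(y_res))

print(strict, lenient, difference)
```

The exact comparison fails and the tolerant one passes, which matches the symptom described: one tool reports the rasters as identical while a stricter reader rejects them.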

Also, I encountered another issue that has been quite a pain to troubleshoot. So I eventually fixed the differences in resolution with a different package (foster) and managed to run the models successfully (to the MaxEntModel step).

But when I move onto the projection step, the program runs till the median ensemble step and then abruptly stops with the following warning: Error in crs != "" : comparison (2) is possible only for atomic and list types

I assume it has to do with the designated CRS, although I'm not sure why (all layers were already projected to WGS84 before modeling), so I was not able to troubleshoot this error message.

I'd very much appreciate your help troubleshooting these, and I'd be happy to share all the input data and code as well.

Thanks for reading this rather long question, and hope to hear from you soon! All the best, Yucheol Shin

brshipley commented 2 years ago

Hello!

Thank you so much for trying out the package and bringing these issues to my attention!

I've changed the TrainStudyEnv function a bit to address the first issue (the rectangular pixels). There is now a new logical (TRUE/FALSE) argument, maxentproj, describing whether MaxEnt will be used on the data generated. If it's TRUE (the default), the function forces the input rasters to be resampled to square pixels (using the larger of the two pixel dimensions in the original data, in this case 0.008333333333334169354). In addition, I've added warning messages stating that rectangular pixels will need to be resampled before being used in MaxEnt. This should fix the problem, especially because the TrainStudyEnv function first resamples and reprojects the data and only then clips it to the desired extent. It's also possible that there is a slight error within the raster::crop or raster::projectRaster functions, although I've never had that exact issue before.
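The arithmetic behind "force square pixels using the larger pixel dimension" can be sketched as follows. This is a Python illustration under stated assumptions, not the actual TrainStudyEnv code: the function name `square_grid`, the rounding rule, and the example extent (124 to 131 degrees east, 33 to 43 degrees north) are all hypothetical.

```python
def square_grid(xmin, xmax, ymin, ymax, x_res, y_res):
    """Derive a square-pixel target grid for an extent.

    The cell size is the larger of the two absolute pixel dimensions.
    The column and row counts are rounded to whole cells, and the far
    edges are recomputed from the cell size so that every pixel is
    exactly cell x cell.
    """
    cell = max(abs(x_res), abs(y_res))
    ncol = max(1, round((xmax - xmin) / cell))
    nrow = max(1, round((ymax - ymin) / cell))
    return {
        "cell": cell,
        "ncol": ncol,
        "nrow": nrow,
        "xmin": xmin,
        "xmax": xmin + ncol * cell,  # adjusted right edge
        "ymax": ymax,
        "ymin": ymax - nrow * cell,  # adjusted bottom edge
    }


# Hypothetical extent, with the pixel sizes quoted earlier in the thread.
grid = square_grid(124.0, 131.0, 33.0, 43.0,
                   0.008333333333334169354, -0.008333333333333270768)
print(grid["cell"], grid["ncol"], grid["nrow"])
```

Input rasters would then be resampled onto this single target grid, so that the x and y pixel sizes of every output layer come from the same number.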

As for the second error you are running into, I can't seem to replicate it. My guess is that it's related to running the data through the foster package and possibly losing the crs argument along the way, but I'm unsure. This error usually comes up when trying to compare formatted coordinate reference systems (for example, crs(Raster1) != crs(Raster2)), although that doesn't seem to be the case in the medianensemble function. If you're still having this problem once the first issue is fixed, I'd love to see the data and code so we can figure it out!

Thanks again, Ben Shipley

yucheols commented 2 years ago

Hi Ben,

Thank you for the quick response and also adding a new argument to the function! I've tried out the modified function at different raster resolutions and found something rather curious.

So here's what I've found:

1) There are still differences in raster resolutions after running the raster files through the changed function, whether the maxentproj argument is turned on (=T) or off (=F).

2) Curiously, this problem seems to be happening only at 1-km spatial resolution. I ran the code again with rasters at 2.5-km resolution and the output rasters didn't have any differences in resolution.

3) As you've mentioned, the second error I ran into initially seems to be related to the foster package messing up the crs argument. I reassigned the proper crs argument to the files output from foster in QGIS and the MaxEntProj doesn't crash anymore at the median ensemble stage.

So with the second error seemingly solved, my main question now is why the differences in resolution are still created after running through the modified code, and why this is happening only at 1-km resolution. I never had such a problem when using the raster::crop() and raster::projectRaster() functions on their own, so I'm very curious to know what is going on here!

And there's another unrelated but minor issue that I ran into when running the MaxEntProj function (but not when running the MaxEntModel function). When I set ncores = 2, I sometimes get the following error:

Error in checkForRemoteErrors(val) : one node produced an error: comparison (2) is possible only for atomic and list types

From checkForRemoteErrors(val), I thought it might have something to do with parallel processing, so I set ncores = 1, and the error was gone. But this error did not happen when running through the example data provided with the vignette. It only happened with my own datasets (a dataset for 2 species and another for 24 species). So I'd love to hear your thoughts on this as well!

Sorry to flood you with questions, but I've got to say that this package is the most straightforward and brilliant SDM package I've ever tried in R. It is really, really impressive! I was very much impressed by the speed at which it could process multiple species. And being able to internally account for dispersal ability and to create richness maps, time maps, etc. is a massive plus! This is an amazing package and I'd love to see how it will develop in the future!

All the best, Yucheol

brshipley commented 2 years ago

Hi Yucheol,

Wow, that's really interesting! It's so strange that the resolution issues are only present for 1-km resolution. Would you mind linking your raster data and code here so that I can work with it? I've been trying out different rasters with rectangular pixels but they've all worked for me (possibly because I generated the rasters in R using the raster function). Yeah, I have never had that problem running those two functions either, so something else must be going on!

For the new error, you already have the .lambdas files MaxEnt gives and you just want to project them, right? I wonder if one of the .lambdas files is corrupted or didn't provide a useable output. The fact that only one node (as opposed to both of the nodes) failed means it's something specific about that particular species or run, not an issue that appears in all of the species. If you'd like, I can take a look at your files for that as well to try to figure it out! I'm glad it works out when you aren't using parallelization though.

Thank you so much!! Although the parallelization process used by megaSDM can be somewhat rough at times, it really does make it so much faster to do species distribution modelling!

Best, Ben

yucheols commented 2 years ago

Hi Ben,

Thank you so much for your time looking into these issues! I'd be very grateful if you could look at the data and code I've set up! Thank you for offering to help! But would you mind me sharing the data by email? I think sharing will be easier that way :)

And yeah, I just want to project the models and there are already .lambdas files generated from running MaxEntModel() function. I opened up several of those files but couldn't find any obvious issues. So I'm very curious to know what is going on!

Regarding parallelization, setting ncores = 2 for the "2 species project" worked initially (and that was the setting I used to get my outputs), but it threw an error when I ran it again. Setting ncores = 1 produced no errors. For the "24 species project", setting ncores = 2 does not work at all. It only works with ncores set to 1. After that I get the crs error during the median ensemble stage.

Also, I ran the code again yesterday. I mentioned earlier that I had solved the "Error in crs != " issue, but apparently I spoke too soon! After "fixing" the CRS with QGIS, the code ran without error at the median ensemble stage for the first four species (previously it had crashed at the first species), then it stopped with the same crs error at the fifth species. At this point I was pretty confused. So I ran the code again, thinking it might be a random glitch. But then the code started putting out the same crs error at the first species! This happened again and again even after I "re-fixed" the CRS in QGIS. So the error is not behaving consistently, which makes it even more confusing.

Oh, and I want to point out that none of these issues happened when running each species individually in the MaxEnt GUI, although for some species there are values missing from some of the environmental variables. This warning could be turned off manually in the MaxEnt GUI, but could this be the cause of the issues I'm having? I do get the following warnings that seem to be related to this problem, although I'm not sure if they are what is causing the code to stop:

1: In threshold(spp.name, modern.rasters, nrep, "", currentYear) : NAs introduced by coercion
2: In if (!is.na(aucval)) { ... : the condition has length > 1 and only the first element will be used
3: In if (auc >= aucval) { ... : the condition has length > 1 and only the first element will be used
(50+ warnings that are basically repetitive)

Right now I'm setting up code to run the same process, but with 2.5-km rasters. Trying to isolate the issue as much as possible! And I'll send the data and code your way as soon as I receive your email address :)

Thanks again! Yucheol

brshipley commented 2 years ago

Hi Yucheol,

You're absolutely right, sending by email would be much easier! My email address is bshipley6@gatech.edu.

Yeah, that behavior is very confusing! I wonder if the re-running of the code leads to some of the unexpected errors, especially with the first parallelization error (e.g., maybe megaSDM is reading some of the files generated in a previous run that weren't deleted). I'll take a look at it once I have your data, and see if I can replicate each of those errors.

I know that the 2nd and 3rd warning messages ("condition has length > 1") have to do with the AUC threshold you can optionally put in place to weed out bad replicates of the model. I recently changed the code so that a different AUC threshold could be selected for each species, but I forgot to include that in the threshold function. I believe I've fixed it now, but I'm going to wait until I've run through your data completely to push it to GitHub :)
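As a rough illustration of that kind of fix (a Python sketch with hypothetical names and made-up AUC values, not the actual megaSDM threshold function): when thresholds are stored per species, the single value for the species being processed has to be picked out before it is compared with each replicate's AUC. Comparing against the whole collection is what produces a condition with more than one element.

```python
def keep_replicates(species, replicate_aucs, auc_thresholds):
    """Return the replicate AUC values that pass this species' threshold.

    auc_thresholds maps species name -> minimum AUC. A species without an
    entry has no threshold, so all of its replicates are kept.
    """
    threshold = auc_thresholds.get(species)  # one value, not the whole mapping
    if threshold is None:
        return list(replicate_aucs)
    return [auc for auc in replicate_aucs if auc >= threshold]


# Hypothetical species names and AUC values, for illustration only.
thresholds = {"species_a": 0.70, "species_b": 0.85}
kept_a = keep_replicates("species_a", [0.65, 0.80, 0.90], thresholds)
kept_b = keep_replicates("species_b", [0.65, 0.80, 0.90], thresholds)
kept_c = keep_replicates("species_c", [0.65, 0.80, 0.90], thresholds)
print(kept_a, kept_b, kept_c)
```

The point of the design is that the comparison always sees exactly one threshold, however many species are in the run.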

Thanks for your patience on this, and I hope to hear from you soon, Ben

yucheols commented 2 years ago

Hi Ben,

For the past couple of days I've been trying out several more ways to really narrow down the issue as much as possible.

I tried bypassing TrainStudyEnv altogether by using a RasterStack of layers masked to the boundaries of the Korean Peninsula (so the rasters already had matching extent and resolution). But it still ends up with the same CRS error. So it seems possible that there is an issue with applying a mask layer with rgdal before passing the rasters to TrainStudyEnv and the downstream functions.

But give me a couple of days or so before I clarify that issue myself and then I'll send the code and data your way!

Thank you again for your help and input! Yucheol

yucheols commented 2 years ago

Hi Ben, I emailed you the code and dataset! I ran the code from top to bottom, and you should be able to reproduce the error I've been encountering. There is also a Word file detailing the things I've tested with the code & data.

Thank you so much for your help, and I will look forward to hearing from you! Best, Yucheol