Help with environmental predictor raster input

Friepei commented 3 weeks ago

Dear biomod2 team,

Context and question

I have collected occurrence data for species between the years 2000 and 2020. I would like to use this data to model present-day habitat suitability. For the environmental predictors (myExpl), I have so far used single-band raster layers containing the annual or monthly means of the environmental predictors averaged over the time frame (e.g. BioOracle). However, I would now like to account for the interannual variability of the environmental predictors within 2000-2020, in particular the increase in sea surface temperature. My question is therefore: can I build a model for the time frame 2000-2020 that links the occurrence points of each year with the value of the environmental predictor (annual mean) of that same year? The final result should be one model for the whole time frame (myBiomodModelOut), not one model per year. I tried a multilayer raster (one raster containing one band per year), but with this approach each year became a separate predictor.

Thank you for your help.

MayaGueguen commented 3 weeks ago

Hello there,

If I'm getting it right, you would like to use occurrence points associated with different years between 2000 and 2020, and use for each of these points the climate (explanatory variables) of the corresponding year, right?

You can do that by giving a data.frame to the expl.var parameter of the BIOMOD_FormatingData function (https://biomodhub.github.io/biomod2/reference/BIOMOD_FormatingData.html)! But it means that you first have to build a table yourself containing your point coordinates (to give to the resp.xy parameter), the related year, and the environmental values extracted from your variable rasters for each occurrence 👀
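A minimal base R sketch of the year-matching step described above. It assumes the yearly layers have already been extracted at every point into one column per year (for example with the extraction function of a raster package). The object names (`occ`, `sst_by_year`) and all values are made up for illustration.

```r
# Hypothetical occurrence table: coordinates plus sampling year
occ <- data.frame(
  x    = c(5, 5, 6),
  y    = c(6, 7, 8),
  year = c(2000, 2010, 2020)
)

# Hypothetical extracted values: one row per point, one column per year.
# In practice these would come from extracting every yearly layer at the points.
years <- 2000:2020
set.seed(1)
sst_by_year <- matrix(
  round(runif(nrow(occ) * length(years), min = 10, max = 20), 2),
  nrow = nrow(occ),
  dimnames = list(NULL, paste0("sst_", years))
)

# For each point, keep only the value from its own sampling year
# (matrix indexing with one (row, column) pair per point)
year_col <- match(occ$year, years)
occ$sst  <- sst_by_year[cbind(seq_len(nrow(occ)), year_col)]

# Only the variable columns go to expl.var; coordinates go to resp.xy
expl_var <- occ[, "sst", drop = FALSE]
resp_xy  <- occ[, c("x", "y")]
```

With this layout there is a single predictor column (`sst`) rather than one predictor per year, which is what keeps the result to one model for the whole time frame.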

Note also that, if you are using pseudo-absences, you have 2 options:

either you select the pseudo-absences yourself, outside of biomod2: you include those points in resp.var (1 for occurrences, NA for pseudo-absences), expl.var and resp.xy, you use PA.strategy = 'user.defined', and you give your pseudo-absences table to the PA.user.table parameter
or you would still like to select PA through biomod2, and then you have to give the coordinates and the environmental values for all points of your map except occurrences

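(_you can check for examples with PA.user.table in the tutorial: https://biomodhub.github.io/biomod2/articles/examples_2_secundaryFunctions.html#generate-pseudo-absence-datasets_)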

⚠️ Note that in that case, you have to choose from which year you extract environmental values for pseudo-absences...

Maya

Friepei commented 3 weeks ago

Thank you for the quick reply!

I tried giving a data.frame to the expl.var parameter of the BIOMOD_FormatingData function. I built a table containing my point coordinates (given to the resp.xy parameter), the related year, and the environmental values extracted from my variable rasters for each occurrence.

However, as you pointed out, I could not run the model because it could not generate pseudo-absences.

"Either you select by yourself pseudo-absences out of biomod2, and you include those points within the resp.var (1 for occurrences, and NA for pseudo-absences), expl.var and resp.xy and you use PA.strategy = 'user' and you give your pseudo-absences table to PA.user.table parameter"

-> Could I use the spsample(x, n, type, ...) function to generate random points in the study area and use them as pseudo-absences if I include them manually in the resp.var data frame?

And how would it work with projecting to the future?

MayaGueguen commented 3 weeks ago

Hello again,

Cool to see that we are making progress !

:warning: Just to be sure, as my first message could be misleading: make sure expl.var contains only the columns holding your variables (and not the coordinates or the sampling year, for example)

"Could I use the spsample(x, n, type, ...) function to generate random points in the study area and use them as pseudoabsences if I include them manually in the resp.var data frame?"

Yes :slightly_smiling_face: Let's say that you have 50 occurrences and that you want to use 2 sets of 1000 pseudo-absences, then here is an example of what you should give to BIOMOD_FormatingData :

resp.var : a vector of length 1256 containing first fifty 1 values and then 1206 NA values (1206 rather than 2000 because some points were selected for both PA1 and PA2, and others for only one of the 2 datasets)
resp.xy : a 2-column matrix or data.frame with 1256 rows, one per point, holding the coordinates in the same order as resp.var (so first the 50 occurrences, then the 1206 PA points)
expl.var : a matrix or data.frame with 1256 rows, one per point, holding the corresponding environmental values (for occurrences, values are extracted from their sampling year; for PA, you have to decide which year(s) to sample from)
PA.strategy = 'user.defined'
PA.user.table : a matrix or data.frame with 1256 rows and 2 columns (PA1 and PA2) of TRUE or FALSE values: the first 50 rows (your occurrence points) are always TRUE, and the remaining rows are TRUE or FALSE depending on which points were selected in which dataset (you should end up with 1050 TRUE values in each column)

So if you use this strategy, it is up to you how you select the PA points and spsample is one option :slightly_smiling_face:
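The layout above can be sketched in base R as follows. This is only an illustration under assumptions: the pool of 5000 candidate background points, the use of `sample()` to pick pseudo-absences, and all object names are hypothetical. The number of unique PA points depends on the random draw, so it will not necessarily be 1206.

```r
set.seed(42)

n_occ        <- 50    # occurrence points
n_pa         <- 1000  # pseudo-absences per PA dataset
n_candidates <- 5000  # hypothetical pool of random background points

# Draw two PA datasets from the same candidate pool (indices into the pool)
pa1 <- sample(n_candidates, n_pa)
pa2 <- sample(n_candidates, n_pa)

# Keep each candidate point once, even if both datasets selected it
pa_ids      <- sort(union(pa1, pa2))
n_unique_pa <- length(pa_ids)

# resp.var: occurrences first (1), then pseudo-absences (NA)
resp_var <- c(rep(1, n_occ), rep(NA, n_unique_pa))

# PA.user.table: occurrences are TRUE in every column,
# each PA point is TRUE only in the dataset(s) that selected it
pa_table <- data.frame(
  PA1 = c(rep(TRUE, n_occ), pa_ids %in% pa1),
  PA2 = c(rep(TRUE, n_occ), pa_ids %in% pa2)
)

# Each column should hold n_occ + n_pa TRUE values
colSums(pa_table)
```

The coordinates and environmental values then have to be stacked in the same order: the 50 occurrences first, followed by the candidate points indexed by `pa_ids`.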

And if you want to use biomod2 to do the selection, the principle is the same, but you provide as many NA points as you want (with the corresponding coordinates and environmental values), and the selection will be done among those points.

Maya

MayaGueguen commented 2 weeks ago

Hello Friederike :wave:

So, let's say for the example that you have 10 presences, and that you selected 100 points randomly over your area. You then associated each point with its coordinates and environmental values, which leads you to a table (let's call it obsTable) looking like this:

point	x	y	var1	var2
1	5	6	1	0.4
1	5	7	2	0.3
1	6	8	3	0.4
...	...	...	...	...
NA	7	11	11	0.9
NA	5	3	12	0.6
NA	6	6	13	0.8
...	...	...	...	...

Then your PA.table should contain the same number of rows as the previous table ( :warning: even if you don't plan on using all the points given, both tables should have the same number of rows so that points are associated with the right information) and as many columns as the number of PA datasets you want. For each dataset, you have to set to TRUE all presence points, and all NA points that you want to include in that PA dataset:

PA1	PA2
TRUE	TRUE
TRUE	TRUE
TRUE	TRUE
...	...
TRUE	FALSE
TRUE	TRUE
FALSE	FALSE
...	...

:information_source: Note that:

the same NA point can be used in several PA datasets.
there is an argument of the BIOMOD_Modeling function called CV.do.full.models, which creates a supplementary set containing all points that are used in at least one PA dataset ( :warning: so if a row is always set to FALSE in your PA.table, it will not be included in the full model)
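A quick way to spot such rows before formatting the data is sketched below in base R. The small table reuses the TRUE/FALSE pattern shown above, with 3 presences followed by 3 candidate points; it is illustrative only.

```r
# Hypothetical PA table: 3 presences followed by 3 candidate pseudo-absences
PA.table <- data.frame(
  PA1 = c(TRUE, TRUE, TRUE, TRUE,  TRUE, FALSE),
  PA2 = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Rows that are FALSE in every column are never used in any PA dataset
never_used <- rowSums(PA.table) == 0
which(never_used)

# Number of points used by each PA dataset (presences included)
colSums(PA.table)
```

Here only the last row is never selected, so it would be left out of every dataset, including the full one.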

Finally, here is how it looks when giving these arguments to the BIOMOD_FormatingData function:

myBiomodData <- BIOMOD_FormatingData(resp.var = obsTable[, "point"],
                                      expl.var = obsTable[, c("var1", "var2")],
                                      resp.xy = obsTable[, c("x", "y")],
                                      resp.name = myRespName,
                                      PA.strategy = 'user.defined',
                                      PA.user.table = PA.table)

Hope it helps, Maya

Friepei commented 2 weeks ago

Thank you, Maya! This helped a lot! I managed to run the model. With this approach, I get predictions for the points that I feed into the model, but not for any other points outside the data frame, is that correct?

MayaGueguen commented 2 weeks ago

Yes, this is correct :slightly_smiling_face: If you want to get predictions for a specific date and area, you will have to use the BIOMOD_Projection (and BIOMOD_EnsembleForecasting) functions after that!
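As a rough sketch of how the environment for one target date could be prepared for projection, the base R snippet below builds a table for a single year and renames its columns to the predictor names used in training. It assumes that the projection step matches predictors by name with the columns given to expl.var; the year-specific column name (`v1_2020`) and all values are hypothetical.

```r
# Predictor names used at training time (columns passed to expl.var)
train_vars <- c("var1", "var2")

# Hypothetical environment table for the target date: one row per location
# to predict, with values taken from the year (or scenario) of interest
new_env <- data.frame(
  x       = c(5, 6, 7),
  y       = c(3, 6, 11),
  v1_2020 = c(12, 13, 11),
  var2    = c(0.6, 0.8, 0.9)
)

# Rename the year-specific column back to the generic training name
names(new_env)[names(new_env) == "v1_2020"] <- "var1"

# Check that every training predictor is present before projecting
missing_vars <- setdiff(train_vars, names(new_env))
stopifnot(length(missing_vars) == 0)

# Split into predictors and coordinates for the projection step
proj_env <- new_env[, train_vars]
proj_xy  <- new_env[, c("x", "y")]
```

The same preparation would apply to a future scenario: the values change, but the predictor names stay those used when the model was built.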

biomodhub / biomod2

Help with environmental predictor raster input #470
