TheoreticalEcology / s-jSDM

Scalable joint species distribution modeling
https://cran.r-project.org/web/packages/sjSDM/index.html
GNU General Public License v3.0

Error when predicting to new data - error: Error : torch._C._LinAlgError: linalg.inv: The diagonal element 3 is zero #121

Closed agnes-duhamet closed 1 year ago

agnes-duhamet commented 1 year ago

Hi Maximilian, I would like to model the distribution of marine fish species as a function of distance to the coast, depth, and marine habitat, taking into account spatial autocorrelation and species interactions. I scaled the variables and fitted the following model:

env_var_scaled = env_var %>%
  mutate(dist_land = scale(dist_land),
         depth = scale(depth),
         habitat_principal = droplevels(habitat_principal))

model <- sjSDM(Y = Occ,
               env = linear(data = env_var_scaled, formula = ~dist_land+depth+habitat_principal),
               spatial = linear(data = SP %>% scale, formula = ~0+longitude_start_DD:latitude_start_DD),
               se = TRUE, family=binomial("probit"), sampling = 100L)

I would now like to predict fish species occurrence as a function of the environmental variables. I have a spatial grid in which each cell has a value for every environmental variable, and I would like to know the occurrence probability of each species in each cell. How can I do this?

p<- predict(model, newdata = env_var_scaled_grid, SP = SP_grid)

The problem is that I scaled the variables for the model and then, separately, scaled the variables for the grid cells. I think this is a problem because the same raw value, for example a depth of 10 meters, will not map to the same scaled value in env_var_scaled and env_var_scaled_grid. I tried scaling all the variables at once (both the data used to build the model and the grid data), but in that case I get this error: Error : torch._C._LinAlgError: linalg.inv: The diagonal element 3 is zero, the inversion could not be completed because the input matrix is singular.

How can I predict species occurrence without running into problems with scaling? Thanks in advance, Agnès
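The scaling mismatch described in this question can be reproduced with a few lines of base R. This is a minimal sketch with made-up depth values (depth_train and depth_grid are hypothetical stand-ins, not the data from this issue): when each data set is scaled with its own mean and sd, a depth of 10 meters ends up with a different scaled value in each.

```r
# Hypothetical depths in meters; both vectors contain a site at 10 m.
depth_train <- c(10, 35, 60, 120)   # stands in for the training data
depth_grid  <- c(10, 80, 200, 400)  # stands in for the prediction grid

# Scale each data set on its own, i.e. with its own mean and sd.
z_train <- as.numeric(scale(depth_train))
z_grid  <- as.numeric(scale(depth_grid))

# The same raw depth (10 m) now has two different scaled values,
# so a model fitted on z_train would misread z_grid.
same_raw_depth <- c(train = z_train[1], grid = z_grid[1])
print(same_raw_depth)
```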

florianhartig commented 1 year ago

Hi Agnès,

Generally, if you scale the variables for prediction in the same way (i.e. with the same values) as the variables you used for fitting, you should have no problem. One way to do this is to save the mean() and sd() values of the training data, scale by hand, and apply the same values to the test data. Alternatively, the applied mean and sd values are also included in the output of the scale() function in R.

It is not clear to me whether you do this; it sounds a bit as if you are scaling the training and prediction data separately.

That, however, doesn't explain to me why you get this error - maybe Max can help there!
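The scaling advice in this comment can be sketched in base R as follows. The data frames below are toy stand-ins with made-up values, and scale_with() is a hypothetical helper written for this illustration, not part of sjSDM. The second half uses the "scaled:center" and "scaled:scale" attributes that scale() attaches to its result.

```r
# Toy stand-ins for the training data and the prediction grid (made-up values).
env_var <- data.frame(dist_land = c(1.2, 5.0, 9.3, 14.1),
                      depth     = c(10, 35, 60, 120))
env_var_grid <- data.frame(dist_land = c(2.0, 7.5),
                           depth     = c(10, 80))

num_vars <- c("dist_land", "depth")

# 1) Compute the scaling parameters from the training data only.
train_mean <- sapply(env_var[num_vars], mean)
train_sd   <- sapply(env_var[num_vars], sd)

# 2) Hypothetical helper: scale the named columns of df with the
#    supplied centers and spreads, leaving all other columns untouched.
scale_with <- function(df, center, spread) {
  for (v in names(center)) {
    df[[v]] <- (df[[v]] - center[[v]]) / spread[[v]]
  }
  df
}

# 3) Apply the training parameters to both data sets.
env_var_scaled      <- scale_with(env_var, train_mean, train_sd)
env_var_scaled_grid <- scale_with(env_var_grid, train_mean, train_sd)

# Alternative: reuse the parameters that scale() stores as attributes.
sc <- scale(env_var[num_vars])
grid_alt <- scale(env_var_grid[num_vars],
                  center = attr(sc, "scaled:center"),
                  scale  = attr(sc, "scaled:scale"))
```

With either variant, a depth of 10 meters gets the same scaled value in the training data and in the grid. The scaled grid would then be passed to predict() as in the call shown in the question, with the spatial coordinates scaled by the same training-data approach.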

agnes-duhamet commented 1 year ago

Thanks. I tried merging the dataframe containing the training data with the dataframe containing the prediction data. I then applied the scale function to the environmental variables and the spatial coordinates, so everything was scaled at the same time. I then separated the two dataframes (training and prediction) again and tried to run the model, and I got this error: Error : torch._C._LinAlgError: linalg.inv: The diagonal element 3 is zero, the inversion could not be completed because the input matrix is singular

Is scaling all the variables together like this a correct approach?

agnes-duhamet commented 1 year ago

I saved the sd() and mean() values of the training data, scaled by hand, and applied the same values to the test data as you recommended, and it works. Thanks. Agnès

florianhartig commented 1 year ago

Hi,

OK, odd, it sounds to me as if you were doing it right in the first place, so probably there was some syntax error. OK, but if it works it works so I will close this.

Best F