TheoreticalEcology / s-jSDM

Scalable joint species distribution modeling
https://cran.r-project.org/web/packages/sjSDM/index.html
GNU General Public License v3.0
68 stars 14 forks source link

Random factors #135

Open fetafretka opened 8 months ago

fetafretka commented 8 months ago

Hi, this might be a dumb quastion, but is it possible to incloud random factors int the sjSDM model?

MaximilianPi commented 8 months ago

hi @fetafretka,

Not yet, we are still working on it. What kind of random factors would you like to include?

fetafretka commented 8 months ago

Im working on a hirerchial structured data set, where I have samples from the same sampling arya, but from 3 years, and the experiment was conducted in 5 different forests using the same treatments, so I wanted to make a model inclouding forests and years as random factors. I can sidestep the forest by just running 5 models, but the years are a bit of a problem for me

MaximilianPi commented 8 months ago

Okay, so how many sites do you have? If you have enough sites, you could run ~year*(env1+env2+...) + 0, which will give you independent slopes for env. Now, if you are mostly interested in the overall effects of environment (averaged over time), you could do a metaregression to do a weighted averaging of the env slope effects to get the grand mean (which is what a random effects model will give you for ~env1 + (env1|year)).

fetafretka commented 8 months ago

So I have 384 sampling sites, around 5k OTUs, 4 treatments (im using dummy variable for them, I dont really have the time to run a bayesian interface) and 3 years. That "~year*(env1+env2+...) + 0" aproach sounds interesting, ill try that, but why the 0 at the end?

MaximilianPi commented 8 months ago

The zero, because we do not want a reference level for year, actually also not for the env*year interactions, so it should rather be:

~year*(env1+env2+...) - 0 - env1 - env2-... or (which is equivalent to) ~0+year+year:env1+year:env2

fetafretka commented 8 months ago

So just to make sure i got this correct, if I have 3 years, say year 1 is considered the dummy variable, would I code it as ~year2(env1+env2) + year3(env1+env2) - 0 - 2env1 - 2evn2?

MaximilianPi commented 8 months ago

Hi, no: ~0+year+year:env1+year:env2, will create an interaction between year and your env variables, so that you get different env effects for each year.

fetafretka commented 8 months ago

okay thanks! On Wednesday, February 21, 2024 at 03:50:49 PM GMT+1, MaximilianPi @.***> wrote:

Hi, no: ~0+year+year:env1+year:env2, will create an interaction between year and your env variables, so that you get different env effects for each year.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

fetafretka commented 8 months ago

Okay so it will work like in other models, thank you very much!

fetafretka commented 8 months ago

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : ValueError: Expected parameter rate (Tensor of shape (2, 38, 4983)) of distribution Poisson(rate: torch.Size([2, 38, 4983])) to satisfy the constraint GreaterThanEq(lower_bound=0.0), but found invalid values: tensor([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]],

    [[nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     ...,
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan]]], grad_fn=<ExpBackward0>)

Run reticulate::py_last_error() for details. Timing stopped at: 0.68 0.08 0.19

Im sorry but im having more problems, im getting this error when im trying to add a location, simple X Y coordinates, call is: model <- sjSDM(Y = spec_mat, env = linear(data = factors, formula = ~ N + pH + C + C/N + percipitation + Heating + Bio + Fert),se = TRUE, family = poisson("log"),sampling = 2L,learning_rate = 0.001)

Im writing here because I dont want to spam too much @MaximilianPi

MaximilianPi commented 8 months ago

Hi @fetafretka, Have you scaled your coordinates, or the other variables? Optimization can go wrong if variables are on large scales, thus I would suggest to scale all of them.

fetafretka commented 7 months ago

Hei It's me again, with another problem, i managed to run the program without the year parameter, it looked very good and gave me a lot of interesting outcomes, but now im trying to run:

model <- sjSDM(Y = spec_mat, env = linear(data = factors, formula = ~ 0 + year + year:N + year:pH + year:C + year:C/N + year:percipitation + year:Heating + year:Bio + year:Fert),se = TRUE, family = poisson("log"),sampling = 100L,learning_rate = 0.001)

I'v coded the year parameter as having values "A","B","C". It gave me this error agian: Iter: 0/100 0%| | [00:00, ?it/s] Iter: 0/100 0%| | [00:08, ?it/s] Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: Expected parameter rate (Tensor of shape (100, 38, 4983)) of distribution Poisson(rate: torch.Size([100, 38, 4983])) to satisfy the constraint GreaterThanEq(lower_bound=0.0), but found invalid values: <... omitted ...>an, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]],

    [[nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     ...,
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan]],

    [[nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     ...,
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan]]], grad_fn=<ExpBackward0>)

See reticulate::py_last_error() for details Calls: sjSDM -> system.time -> -> py_call_impl Timing stopped at: 7.241 1.039 8.832 Execution halted

So I tried it with coding it 0,1,2 and the same thing happens: Iter: 0/100 0%| | [00:00, ?it/s] Iter: 0/100 0%| | [00:08, ?it/s] Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: Expected parameter rate (Tensor of shape (100, 38, 4983)) of distribution Poisson(rate: torch.Size([100, 38, 4983])) to satisfy the constraint GreaterThanEq(lower_bound=0.0), but found invalid values: <... omitted ...>an, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]],

    [[nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     ...,
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan]],

    [[nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     ...,
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan],
     [nan, nan, nan,  ..., nan, nan, nan]]], grad_fn=<ExpBackward0>)

See reticulate::py_last_error() for details Calls: sjSDM -> system.time -> -> py_call_impl Timing stopped at: 7.241 1.039 8.832 Execution halted

all my other variables are etiher 0/1 to have a dummy variable system, or transformed into a scale of between 0-1 when they were continious

MaximilianPi commented 7 months ago

hi @fetafretka,

Sorry, it is hard to say if it is because of the data or because of convergence problems in the model. If this is ok for you, could you please send me your code+data so I can take a look myself?

fetafretka commented 7 months ago

No problem, because im fixing one error and running into another the code is: library(sjSDM) factors <- read.csv2("factors.csv") spec <- read.csv2("OTU_table_no_reps_spec.csv") rownames(spec) <- spec$X spec <- spec[, -1] spec_mat <- as.matrix(spec) colnames(factors)<- c("percipitation","Heating","Bio","Fert","year","dry_weight","pH","N","C","C/N") print(factors) model <- sjSDM(Y = spec_mat, env = linear(data = factors, formula = ~ 0 + year + year:N + year:pH + year:C + year:C/N + year:percipitation + year:Heating + year:Bio + year:Fert),se = TRUE, family = poisson("log"),sampling = 100L,learning_rate = 0.001) summ <- summary(model) sink(file = "sjSDM_output.txt") summary(model) sink(file = NULL) pdf(file="test_sjsdm.pdf") plot(model) dev.off()

Im running it on a slurm script on an outside server with the following code:

!/bin/bash

SBATCH --account=nn9338k

SBATCH --job-name=sjSDM

SBATCH --time=5-0:0:0

SBATCH --partition=accel --gpus=3

SBATCH --mem-per-gpu=8G

SBATCH --ntasks=1

- all hail the god of statistics

Set up job environment:

set -o errexit # Exit the script on any error set -o nounset # Treat any unset variables as an error module purge module load Anaconda3/2022.10 module load R/4.2.1-foss-2022a OTU_table_no_reps_spec.csv

R < sjSDM.R --save OTU_table_no_reps_spec.csv

OTU table is the species matrix and factors are the enviormental factors after zero-skewed transformation, what im running now

fetafretka commented 7 months ago

Im very sorry, this seems to be an never ending story, i got through, i had to make my dataset smaller because of the amount of memmory it used up, now im getting: torch._C._LinAlgError: linalg.inv: The diagonal element 3 is zero, the inversion could not be completed because the input matrix is singular. Unfortenetly i have no experience with torch and google isent helping much

MaximilianPi commented 7 months ago

Hi @fetafretka , Sorry I somehow missed your post. I will look at your data asap.

fetafretka commented 7 months ago

Please dont appolagise, you have been an extream help so far

MaximilianPi commented 7 months ago

Could you please send me the factors.csv via email? The link above doesn't work maximilian.pichler@ur.de

fetafretka commented 6 months ago

hei, could i bother you with one more quastion?

On Tuesday, April 9, 2024 at 09:39:06 AM GMT+2, MaximilianPi ***@***.***> wrote:  

Could you please send me the factors.csv via email? The link above doesn't work @.***

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>