DaniGargya / dissertation

All work and code associated with my honors dissertation thesis
MIT License
0 stars 0 forks source link

Model failing to run #6

Closed DaniGargya closed 4 years ago

DaniGargya commented 4 years ago

Hey @IslaMS and @gndaskalova ,

I am having problems to run my model, because straight after I try to run it, I get this generic error:

Compiling the C++ model
recompiling to avoid crashing R session
Start sampling
Error in unserialize(socklist[[n]]) : error reading from connection
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection

after having run the model below yesterday without it compiling, (but also without error)

mo_tu5 <- brm(bf(Jtu ~ scaleacc_25*scalehpd_25 + duration_plot +
                   (1|TAXA) + (1|STUDY_ID), coi ~ 1, zoi ~ 1),
              family = zero_one_inflated_beta(), 
              data = data1,
              iter = 4000,
              warmup = 1000,
              inits = '0',
              control = list(adapt_delta = 0.85),
              cores = 2, chains = 2)

I wanted to run this model just now:

mo_tu6 <- brm(bf(Jtu ~ scaleacc_25*scalehpd_25 + duration_plot + TAXA +
                  (1|STUDY_ID), coi ~ 1, zoi ~ 1),
              family = zero_one_inflated_beta(), 
              data = data1,
              iter = 4000,
              warmup = 1000,
              inits = '0',
              control = list(adapt_delta = 0.85),
              cores = 2, chains = 2)

but I can't because of the error.

Can you see somewhere why that might be or what I can do differently?

Thanks, Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

Can you update the script so that the model is in it and the data file can load from a relative file path and push that and then I can try running that model on my end to trouble shoot.

Isla

DaniGargya commented 4 years ago

Hey @IslaMS ,

thanks for the quick response!! I just pushed all you need.

This is the script: https://github.com/DaniGargya/dissertation/blob/master/scripts/data_analysis.R

Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

I meant to run the model over night, but I forgot. I am running things now and saving the model outputs to the outputs folder so you can check them out. I ran a slightly simplified model just to check things were working fine first and that seemed to go well - no errors or warnings! I just pushed that model output. And now I am running pretty much the model above with a slight simplification. Then I will run the model above. I am also saving these models in the script. With the outputs all saved, you should be able to work with those, and may not have to run things on your end. I also find with Stan models like brms produces it can be helpful to do a full shutdown of the computer and restart if weird things start happening, because I think things can go wrong with the C++ compiler that don't resolve unless you do a restart of at least R and RStudio if not your computer as a whole.

Isla

DaniGargya commented 4 years ago

Hey @IslaMS ,

thank you very very much for your help!! I am really curious for the outputs of the model that is running now. I will definitely try and see whether shutting down the computer helps for running the other models!

Have a good Sunday, Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

This model (see below) is 25% finished on my computer at the moment. I think it has been running for 2 hours. So I might not have the model outputs for you today, but hopefully by tomorrow morning?

Isla

mo_tu_simp2 <- brm(bf(Jtu ~ scaleacc_25*scalehpd_25 + duration_plot + TAXA + (1|STUDY_ID)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

DaniGargya commented 4 years ago

That's completely fine! Thanks again for running it :)

IslaMS commented 4 years ago

Hi @DaniGargya,

I have run the more complex model and saved the outputs. The chains did not mix well - so the outputs do not look great and there were a fair few warnings. I think we have to simplify the model. The main simplification that seems to work is to not have an interaction. That model (see below) converged just fine when I ran it (and see outputs in the output folder). We could potentially run an interaction only model as a separate model. Let me know what model you want me to run next and I can get it going in the background here.

Isla

mo_tu_simp1 <- brm(bf(Jtu ~ scaleacc_25 + scalehpd_25 + duration_plot + TAXA + (1|STUDY_ID)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

DaniGargya commented 4 years ago

Hey @IslaMS ,

that's unfortunate that the model is too complex with the interaction!

If you don't mind could you please run the simplified interaction model below? I'm not too sure whether to keep the random study ID effect or leave that one out as well.

mo_tu_simp3 <- brm(bf(Jtu ~ scaleacc_25*scalehpd_25 + (1|STUDY_ID)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

I've already added it to the analysis script.

I am also going to try and run models again on my end tonight. Last night R did hang itself up though.. Let's hope I can do different sensitivity analysis with the models without interaction!

Have a good day! Dani

IslaMS commented 4 years ago

Hey @DaniGargya,

That model is now running. I also renamed some of the model output files to keep them consistent with our model names.

Isla

DaniGargya commented 4 years ago

Good morning @IslaMS !

Did the model with the simple interaction run? Could you share the model outputs please?

Dani

IslaMS commented 4 years ago

Hey @DaniGargya,

Sadly the model did not converge. I pushed the outputs so you can check them out. I think that the interaction is too difficult for the model to estimate. I am currently running this model (below) to check that it isn't the StudyID that is the problem. I think it would be worth you working with the mo_tu_simp1 outputs for now as all of your fixed effects are in there apart from the interaction. I would not recommend removing all of the random effects, because then you haven't controlled at all for the structure of your data and the results would not be reliable for that reason. The interaction models take about 20 hours to run on my computer. I might try to get this model running on SuperShrub this morning, so that you get the results a bit sooner.

Isla

mo_tu_simp4 <- brm(bf(Jtu ~ scaleacc_25*scalehpd_25 + (1|cell)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

DaniGargya commented 4 years ago

Hey @IslaMS

Alright I will work with the simple model for now then. I ran a simple model last night on my computer and it worked so thats good news!

Just now I discovered that I messed part of my data transformation up: I scaled hpd the wrong way round (I accidentally copied the code which I used for accessibility 1 - (scaling)), but I fixed it now and pushed the new dataset where scalehpd_25 should be correct now!

Sorry for the mess! Dani

IslaMS commented 4 years ago

Hey @DaniGargya,

Remember you don't need to run the models on your computer if I have already run them because you can just load the model outputs that I have been saving and work with those. My model runs have four chains so they are slightly better versions of the model runs. Do you know how to load the model outputs or do you want me to explain that to you? Would you like me to re-run the simple full model without interaction now? Should we explore putting the grid random effect back in now that we have potentially dropped the interaction?

Isla

DaniGargya commented 4 years ago

Hey @IslaMS

I am already using the model outputs from you to create my figures! It's just that I planned to run different sensitivity analysis, eg using different scales of extraction for hpd and accessibility and that's why I am running them on my end. If I also run them with 4 chains that should be comparable results shouldn't it?

I agree with trying to put the grid cell random effect back in!

So, yes I would really appreciate if you could try and run the interaction one with the grid random effect on super shrub and a simple model with both grid and study Id random (and the updated data) on your computer if you don't need to use R?

Thanks so much for your help! Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

Sounds good. I just wanted to confirm. What models do you want me to run over this evening? I have both SuperShrub and my own computer running now. If you tell me the models, I can get them running. I guess we have to re-run everything because of the scaling issue?

Isla

DaniGargya commented 4 years ago

Hey @IslaMS ,

do you have any results for the scaleacc*scalehpd + (1|cell) or

scaleacc_25 + scalehpd_25 + duration_plot + TAXA +
      (1|cell) + (1|STUDY_ID))

yet?

I guess depending whether the second model works with 2 random effects this will change how we will run the rest of the models?

In terms of rerunning, I think it would luckily just be mo_tu_simp1, but again if the second random effect is added, then there is no need for that.

I added some sensitivity analysis models to analysis script. For the one with the scale you will need to load in data1 again as I pushed the new variables for that.

One for only plants (as they are the only taxa that covers a range of accessibility scores)

mo_tu_simp5 <- brm(bf(Jtu ~ scaleacc_25 + scalehpd_25 + duration_plot + TAXA +
                        (1|STUDY_ID)), # or with (1|cell)?
                   family = zero_one_inflated_beta(), 
                   data = data1_plants,
                   iter = 4000,
                   warmup = 1000,
                   inits = '0',
                   control = list(adapt_delta = 0.85),
                   cores = 4, chains = 4)

sensitvity analysis with 1km extraction -> there will be loads of NAs so I'm not sure whether it is worth to run the model?

mo_tu_simp6 <- brm(bf(Jtu ~ scaleacc_1 + scalehpd_1 + duration_plot + TAXA +
                        (1|STUDY_ID)), # or with (1|cell)?
                   family = zero_one_inflated_beta(), 
                   data = data1,
                   iter = 4000,
                   warmup = 1000,
                   inits = '0',
                   control = list(adapt_delta = 0.85),
                   cores = 4, chains = 4)

sensitivity analysis with 50km extraction

mo_tu_simp7 <- brm(bf(Jtu ~ scaleacc_50 + scalehpd_50 + duration_plot + TAXA +
                        (1|STUDY_ID)), # or with (1|cell)?
                   family = zero_one_inflated_beta(), 
                   data = data1,
                   iter = 4000,
                   warmup = 1000,
                   inits = '0',
                   control = list(adapt_delta = 0.85),
                   cores = 4, chains = 4)

I hope this makes sense to you?

Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

I can't get SuperShrub to work as RTools will not install. So I am stuck with my computer for now. I have run and saved the outputs for:

mo_tu_simp1 <- brm(bf(Jtu ~ scaleacc_25 + scalehpd_25 + duration_plot + TAXA + (1|STUDY_ID)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

Next I will run:

mo_tu_simp4 <- brm(bf(Jtu ~ scaleacc_25*scalehpd_25 + (1|cell)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

Unless you would prefer a different model. Then I can go back and try two random effects for the first model and then the sensitivity models - can you prioritize those for me. I am trying to make sure you have some outputs to work with so that you can begin your writing. Each model will take about 10 hours to run, so there is no rush on choosing the next model as I have the simp4 model running now.

Isla

gndaskalova commented 4 years ago

Hi @DaniGargya and @IslaMS

Nice to see all the modeling action! I would have chipped into the model running if I had enough internet to upload the model files later, sorry that I can't help out.

Good luck with the models! Gergana

DaniGargya commented 4 years ago

Hey @IslaMS

let me know when you are done running the model with both random effects please. Till then I might have run the rest of the sensitivity analysis myself.

@gndaskalova thanks for the good wishes :)

Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

The simp4 model (see code below) is 65% done and was started at 9am this morning. Can you post the model code that you want me to run next as a message just to confirm? I will run that one over night assuming simp4 finishes before then.

Isla

mo_tu_simp4 <- brm(bf(Jtu ~ scaleacc_25*scalehpd_25 + (1|cell)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

DaniGargya commented 4 years ago

Hey @IslaMS ,

that's the one that you could run, if the other finishes. (It's already on the data analysis script)

mo_tu_simp6 <- brm(bf(Jtu ~ scaleacc_25 + scalehpd_25 + duration_plot + TAXA + (1|cell) + (1|STUDY_ID)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

IslaMS commented 4 years ago

Hey @DaniGargya, Cool. I just didn't want to run the wrong one!
Thanks! Isla

IslaMS commented 4 years ago

Hi @DaniGargya,

The model mo_tu_simp4 finished - it did not converge well. The outputs are too large to push to your repo, so here is a google Drive link to them. You can keep the file in your repo, but just list it in the git ignore so that you don't push it yourself and run into issues! I have started running model mo_tu_simp6 now.

https://drive.google.com/drive/folders/1w9if0zovBaKWp806IZ2pkXM5v4CachtY?usp=sharing

Isla

IslaMS commented 4 years ago

Hi @DaniGargya,

The model mo_tu_simp6 ran. No errors or warnings! I have uploaded the outputs to the google drive folder above. What is the next model you would like me to run?

Isla

DaniGargya commented 4 years ago

Hey @IslaMS,

I guess that means that models with an interaction just don't converge in my case.. So, do I just leave this question out then (or mention it in the methods that it didn't converge and not discuss it after)? Maybe I could briefly talk about turnover~hpd, but the CI overlap zero for that I think..

For the model with 2 random effects: does this mean I should use that one for presenting my results? It does account more for spatial autocorrelation. However, I was also wondering how do I account for the random effects when visualising and presenting my results? With ggpredict, I don't think I can really take them into account.. I've also tried the add_predicted_draws function that Gergana used, but it doesn't work with models that have more than 1 fixed effect. If the model with 2 random effects is "better" does this mean I should also run all the sensitivity analysis with 2 random effects?

For now this would be the last one to run, depends on the answer to the question above with 1 or 2 random effects:

mo_tu_simp8 <- brm(bf(Jtu ~ scaleacc_1 + scalehpd_1 + duration_plot + TAXA + (1|STUDY_ID)), # or with (1|cell)? family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

Do you want to chat through these decisions? I will be available on Skype this afternoon.

Isla

DaniGargya commented 4 years ago

Hey @IslaMS ,

if you are still up, could you run this model please:

mo_tu_simp8 <- brm(bf(Jtu ~ scaleacc_1 + scalehpd_1 + duration_plot + TAXA + (1|cell) + (1|STUDY_ID)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

i just pushed the script.

Have a good night! Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

I am still up, though about to go to bed. I did chat to Gergana earlier. She says that tidybayse at least should do the logit transformation automatically, she wasn't sure about ggpredicts, and she provided some tidybayse code and I adapted it for your model in your code (line 213) and it here is the result:

image

So I guess what this means is that your ggpredicts code was probably working okay, and that your relationship is driven by random effect things like certain studies really driving the overall relationship with really high accessibility? and there is a lot of uncertainty around the main effect at lower values of accessibility?

@gndaskalova also said that it would be possible to calculate the zoi and coi elements of the relationship, but that it was a difficult or confusing calculation, then we got distracted by other topics in our chat. So perhaps Gergana you could advise further?

I'll get that model running now here.

Isla

gndaskalova commented 4 years ago

Hey @DaniGargya and @IslaMS

I think the current model structure assumes that the effect of accessibility is the same for turnover 0, 1 and everything in between. See how in the outputs you don't get separate intercepts or separate slopes. The estimates for phi, coi and zoi at the bottom of the table I think refer to just turnover, not the effect of accessibility on turnover, i.e., for example the probability that turnover is zero.

I was trying to find more details and there is a bit here https://bookdown.org/ajkurz/Statistical_Rethinking_recoded/monsters-and-mixtures.html#zero-inflated-outcomes

and this is a new tutorial on tidybayes https://www.stats.bris.ac.uk/R/web/packages/tidybayes/vignettes/tidybayes.html

So I think, but I could be wrong, that your current models account for the distribution of the data (turnover in all its peculiarities), but it doesn't assume different effects on accessibility for turnover 0, 1 and in between, thus you can't extract that info from the model output...

DaniGargya commented 4 years ago

Hey @IslaMS and @gndaskalova ,

thanks for the input and making the tidybayes predict work! It's reassuring that at least the overall relationship seems to be the same as with ggpredict.

I tried to adjust the tidybayes code for predicting taxa and I also get similar results as with ggpredict. However, I struggle to bring together the model outputs and the graph.

1) why do 95% intervals overlap the whole graph? (same is true for the graph for accessiblity)

2) from the mo_tu_simp6 model output it seems like terrestrial plants' effect size is zero and the CI are also overlapping zero. I guess same is true for birds when taking them as the intercept. Does this mean only mammals and terrestrial invertebrates have significant effects and we cannot say anything about birds and plants based on the visualisation?

taxa_predict_add

IslaMS commented 4 years ago

Hi @DaniGargya,

I have a meeting coming up here shortly, but quickly here, I just wanted to say that I uploaded the mo_tu_simp8 output to the Google Drive folder. Let me know if you want me to run another model.

I think your figure above and the summary table suggest that mammals and invertebrates have higher amounts of turnover relative to birds and plants and that invertebrates have the highest turnover. Because there are no random slopes, maybe just plotting the taxa intercepts as a bar graph with error or something similar would be sufficient. They all have to have the same relationship with accessibility because of how we set up the model - random intercepts only. Now that we have dropped the interaction, I guess we could try adding the random slopes back in? That might be interesting. What do you think?

Are there any other road blocks to your progress at the moment? Or are you good to keep analysing and writing.

Isla

DaniGargya commented 4 years ago

Hey @IslaMS ,

makes sense what you are saying! Here is the next model to run then:

mo_tu_simp9 <- brm(bf(Jtu ~ scaleacc_25 + scalehpd_25 + duration_plot + (scaleacc_25|TAXA) + (1|cell) + (1|STUDY_ID)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

For now I am good to keep analysing and writing :)

Good night! Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

I just started mo_tu_simp9 so that should run over night here.

Isla

IslaMS commented 4 years ago

Hi @DaniGargya,

Sigh. The mo_tu_simp9 did not converge well. I have uploaded the outputs to the Google Drive folder. Let me know if you want me to run another model.

Isla

DaniGargya commented 4 years ago

Hey @IslaMS ,

oh no I guess then I'm stuck with analyzing the relationship between turnover and taxa only.

Could you please run this model with richness change? (at this point I'm not sure I will include it but might still be interesting to see!)

If you could check that richness_change has the right data structure before running it please. When writing the csv file it seems to lose that information sometimes.

mo_tu_simp_ri <- brm(bf(richness_change ~ scaleacc_25 + scalehpd_25 + duration_plot + TAXA + (1|cell) + (1|STUDY_ID)), family = zero_one_inflated_beta(), data = data1, iter = 4000, warmup = 1000, inits = '0', control = list(adapt_delta = 0.85), cores = 4, chains = 4)

Thanks! Dani

IslaMS commented 4 years ago

Hi @DaniGargya,

I am currently running a richness change model. I don't think you want or can run a zero_one_inflated_beta() for richness change, as richness change can be positive or negative and can go above one and below zero. I changed over to a Gaussian model, but let me know if you want to do something else - technically richness data are count data, but once you go to richness change, then I think Gaussian is an okay approximation. See code for the current version of the model.

I am not sure exactly what you mean about losing information when writing to csv, though if there were an issue then you can always use .RData files. I have never had a problem like that though. The richness_change column looked like reasonable numbers, but you should check that to know for sure if everything makes sense.

Isla

DaniGargya commented 4 years ago

Hello @IslaMS ,

Ups, makes sense to run it with a gausian model!

I don't think I explained myself well with my previous comment about the data structure. But I think in the end, it should be fine.

Also, I think for now I am done with all the model running :) Thanks for all the running on your side!!

Hope you get to enjoy the sun today, Dani

IslaMS commented 4 years ago

Hey @DaniGargya,

I uploaded the Gaussian model results - that model runs super quickly in minutes rather than hours (it is the zero-one inflation that really slows things down), if you want to run it or variations on your own computer.

Great that you have all of your models run now!

Isla