AileneKane / bayes4cons


finalize the code for the box on NHT versus posterior #1

Open lizzieinvancouver opened 2 months ago

lizzieinvancouver commented 2 months ago

We finally have some code running! @wangxm-forest will update the end of datasim_explore.R (https://github.com/AileneKane/bayes4cons/blob/main/bayesvfisherbox/datasimulation/datasim_explore.R) to go through a few potential seeds and we will finalize the plots soon.

After that we can add the posterior and I will work on the text for the box ... hopefully done in maybe two weeks? We hope.

wangxm-forest commented 2 months ago

@lizzieinvancouver @DeirdreLoughnan I just pushed the plots using different seeds. I think for most of the seeds, we might still need to delete at least two populations to make them look good. Or we could try to simulate more populations, but I think 5 populations might be good enough. seeds.pdf

lizzieinvancouver commented 2 months ago

@wangxm-forest Nice work! I think 5 populations would be good enough. Can you tell me your top 2-3 options that you were thinking of (seed x noise and then what plots to drop)?

I think maybe Seed: 1546 Noise: 7000 but before I commit ... can you remind me what the given slopes were across the 7 plots (just paste in the code here)?

lizzieinvancouver commented 2 months ago

@wangxm-forest Just a reminder about this so we don't forget to finish it.

DeirdreLoughnan commented 2 months ago

@lizzieinvancouver @wangxm-forest I would choose

  1. Seed 3069 and drop the 2nd population (noise 7000).
  2. Seed 3545 and drop the 1st population (noise 8600).

But I also started playing around with Seed 1546 and getting the Stan code running. The code is here and a plot comparing the Stan to the lm trends is here.

wangxm-forest commented 1 month ago

@lizzieinvancouver @DeirdreLoughnan Sorry, I have trouble accessing Google on my phone while I am in China, so I missed the notification for this issue! I also think 3545 dropping the 1st population could be a good option, and also 1546 dropping the 2nd population. For 3069, the 1st population has a p-value smaller than 0.05; don't we want the population that declines the most to have a p-value greater than 0.05, though?

The given slopes are: a <- c(-2000, -1600, -1400, 600, 1400, 1600, 2000)
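
For readers following along, here is a minimal Python sketch of the kind of simulation being discussed: one linear trend per population using the slopes quoted above, Gaussian noise added, and an ordinary least squares fit per population. It is not the repo's R code. The number of years, the intercept, the reading of "noise" as a standard deviation, and the function names are all assumptions made for illustration, and Python's random number generator will not reproduce the R seeds.

```python
import math
import random

# Slopes quoted in the thread, one per simulated population.
SLOPES = [-2000, -1600, -1400, 600, 1400, 1600, 2000]

# Two-sided 5% critical value of Student's t with 18 degrees of freedom.
# Only valid for the assumed 20 time points (df = 20 - 2).
T_CRIT_DF18 = 2.101


def simulate_counts(slope, noise_sd, rng, n_years=20, intercept=50000.0):
    """Linear trend plus Gaussian noise for one population.

    n_years and intercept are hypothetical placeholders, not repo values.
    """
    years = list(range(n_years))
    counts = [intercept + slope * t + rng.gauss(0.0, noise_sd) for t in years]
    return years, counts


def ols_slope(x, y):
    """Return the OLS slope, its standard error, and the t statistic."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, se, t_stat


def fit_all(seed, noise_sd, slopes=SLOPES):
    """Simulate and fit every population with one seeded generator."""
    rng = random.Random(seed)
    return [ols_slope(*simulate_counts(s, noise_sd, rng)) for s in slopes]


# Seed and noise values are taken from the thread; results will differ from R.
for true_slope, (est, se, t_stat) in zip(SLOPES, fit_all(seed=1546, noise_sd=7000)):
    verdict = "p < 0.05" if abs(t_stat) > T_CRIT_DF18 else "p >= 0.05"
    print(f"true {true_slope:>6}  estimated {est:>9.1f}  t = {t_stat:>6.2f}  {verdict}")
```

Looping over a few seeds and noise levels with `fit_all` is one way to look for a combination where the steepest decline still lands above the 0.05 threshold, which is the pattern the box is meant to illustrate.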

lizzieinvancouver commented 1 month ago

@wangxm-forest and @DeirdreLoughnan Thanks! I was thinking that we need to drop TWO of the trends (e.g., would it seem weird to just drop -1600 and not drop 1600?) but I guess we could.... If we want to drop two, then I like 1546 and drop -1600 and 1600. This one also has the benefit of the slopes then going in relatively good order (-1700, -1300, 650, 1450, 1850).

@DeirdreLoughnan Nice work on the code! I suggest we keep plots in one column (so it looks like north to south) and also do not plot the lines where the lm p-value is > 0.05 ... I am not sure how we should plot the Stan output but we should show the posteriors somehow (overlay draws as lighter lines?). For now, it would be cool to have an overlay histogram of the posterior slope for the first and last population and show how they compare (similar to the Wade paper ... though we have this positive/negative thing, perhaps we show the absolute value? That is weird, but otherwise I don't think people will get the comparison).

We want a figure that highlights different inferences you might make ... where in the text we can say something like "for the most strongly declining population, NHT with a p<0.05 threshold would lead you to conclude no change, but the posterior highlights it changing the most of the declining populations -- at a rate almost as fast as the most increasing population."

wangxm-forest commented 1 month ago

@lizzieinvancouver @DeirdreLoughnan I joined the meeting today and asked if they have any suggestions on how we should plot our results. They suggested that maybe we could make a plot like this showing the confidence intervals, credible intervals (and p-values for frequentist?) as well.

lizzieinvancouver commented 1 month ago

@wangxm-forest Thanks for attending the meeting! I don't love these types of plots as they just pressure towards NHT -- they focus attention on what crosses zero, and the Bayesian interval becomes more NHT (and which interval matters). I suggest instead we show the data, then mark significant p-values the way folks usually do (* for <0.05 but >0.01, ** for <0.01, etc.) and show the posteriors of each next to the data. To me that would highlight how people often treat NHT with p-values versus how people often treat Bayesian. Maybe you and @DeirdreLoughnan could work on a draft of this and I can add text and then we can see what the rest of the group thinks?
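
As an illustration of the star convention mentioned in this comment, a small Python sketch of a p-value-to-label mapping follows. The function name is hypothetical, and the "***" tier for p < 0.001 is an assumption filling in the "etc."; the thread only specifies the 0.05 and 0.01 cutoffs.

```python
def p_stars(p):
    """Map a p-value to the usual significance stars.

    The 0.05 and 0.01 cutoffs come from the thread; the 0.001 tier is assumed.
    """
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level: no label


# Hypothetical p-values, one per population panel.
for p in [0.0004, 0.008, 0.03, 0.2]:
    print(f"p = {p:<7} label: '{p_stars(p)}'")
```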

Also, do you know why they added issue #9? The data are time series population counts ... they don't really feel like a Poisson to me, but if someone wants to do that and compare to Gaussian then it seems fine (I just do not have time to help with that).

wangxm-forest commented 1 month ago

@lizzieinvancouver I think they suggested this plot saying that both frequentist and Bayesian will cross zero, but the Bayesian one will have a narrower interval. I personally think we definitely should have plots showing the data and p-values for frequentist, as we want to demonstrate that the population declining the fastest is statistically nonsignificant. I will talk to Deirdre and see how we should work on that! And about issue 9, Ailene said she wants to try the Poisson distribution and she will work on that, so I think we don't need to worry about it.

lizzieoverleaf commented 1 month ago

> I think they suggested this plot saying that both frequentist and Bayesian will cross zero, but the Bayesian one will have a narrower interval.

Thanks @wangxm-forest ! I do not know if Bayesian would have a narrower interval (for 95%? I am really NOT a fan of 95% intervals) and do not think of that as a reason to use Bayesian (to get narrower intervals). I think our point is Bayesian helps move away from NHT so I think we should focus on figures that show that, and anything that highlights 0 and whether it is crossed for Bayesian and Frequentist will just frame both as NHT.

DeirdreLoughnan commented 1 month ago

@lizzieinvancouver I met with @wangxm-forest last week and we discussed the figure. She had a sketch you had done previously that seemed like what you were describing here. I have made a first draft of the figure here. I will fix the axis in the next draft so the lines meet.

I know you mentioned only showing the histogram for the first and last population, what was your reasoning for this?

lizzieinvancouver commented 1 month ago

@DeirdreLoughnan Nice plots!

> I know you mentioned only showing the histogram for the first and last population, what was your reasoning for this?

I was thinking we should simplify, to draw the reader to only the few things we want them to see, so that it would work with the text. It might work though to show all of them; I can see it could seem weird not to ... I would suggest only drawing the slope when the line is significant (and maybe dashed for p <0.1 but >0.05?) and making the numbers in sci. notation. I wonder if layering all of them onto ONE plot would be too much? Then it would be easy to see the posteriors shifting ... We also should name or otherwise note the populations (most north, north, middle, south, most south? Also, I think the plots should go in reverse order for our story of the trailing edge declining).

I think the current text would be something like (in case it helps guide the figure):

Using frequentist statistics on these populations (left) would suggest that several are not changing (both southern and the middle), whereas a Bayesian approach more clearly highlights that all but the middle seem to have shifting population sizes (increasing at the northern edge, declining at the southern) ..[but more variance in southern populations pushes them above significance.]

I can work on the text when you tell me to!

DeirdreLoughnan commented 1 month ago

@lizzieinvancouver I have pushed a revised version of the figure and a draft of a figure where all points and histograms are layered respectively. But I do feel this figure is a bit busy.

I think for the five row figure, I will switch the p-value from being the title to text within the plot and the reference to population location to the main title.

lizzieinvancouver commented 1 month ago

> I have pushed a revised version of the figure and a draft of a figure where all points and histograms are layered respectively. But I do feel this figure is a bit busy.

@DeirdreLoughnan These are great! I really like the two panel figure personally as I think it shows how you could end up thinking only two things change with NHT, but the trends and uncertainty look much more similar in the posteriors. Do others much prefer the 5 panel?

Smaller tweaks:

DeirdreLoughnan commented 1 month ago

@lizzieinvancouver @wangxm-forest We discussed our figures at this week's meeting. The general consensus was that the two panel version was most effective. But others also liked the idea of including points on the right-hand panel showing the mean and 90% UI.
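
To make the "mean and 90% UI" summary concrete, here is a hedged Python sketch of computing both from a set of posterior draws of a slope. The draws below are simulated from a normal distribution purely as a stand-in; they are not output from the Stan model in the repo, and the helper names are hypothetical.

```python
import math
import random


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_vals[lower] * (1 - frac) + sorted_vals[upper] * frac


def summarize_draws(draws, interval=0.90):
    """Return (mean, lower, upper) for a central uncertainty interval."""
    ordered = sorted(draws)
    tail = (1.0 - interval) / 2.0
    mean = sum(ordered) / len(ordered)
    return mean, quantile(ordered, tail), quantile(ordered, 1.0 - tail)


# Stand-in "posterior" for one slope: NOT real Stan output.
rng = random.Random(1)
fake_draws = [rng.gauss(-1700.0, 600.0) for _ in range(4000)]

mean, lo, hi = summarize_draws(fake_draws)
print(f"posterior mean {mean:.0f}, 90% UI [{lo:.0f}, {hi:.0f}]")
```

A point at the mean with a segment from the lower to the upper bound could then be drawn for each population on the right-hand panel.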

@lizzieinvancouver regarding the scientific notation, do you mean having all the p-values formatted as 1.0e^-1 for example?

The latest version can be found here and the related code here. This code generates the data, runs the Stan model, and makes the figure.

@AileneKane What would be easiest in terms of adding figure letters? Do you usually do this in the main text document or prefer to add them in R?

lizzieinvancouver commented 1 month ago

> The general consensus was that the two panel version was most effective.

Me too! I also really like the current figure. I might just make the slope lines solid (I was thinking of dashed lines for things in between 0.1 and 0.05 but we do not have that).

> regarding the scientific notation, do you mean having all the p-values formatted as 1.0e^-1 for example?

I meant the abundance -- make it 2^4 or such as the actual numbers are not so important.

We should make code that generates EXACTLY these data if we have not already.

@DeirdreLoughnan @wangxm-forest Do you want me to work on editing the text to match the current figure?

DeirdreLoughnan commented 1 month ago

@lizzieinvancouver Great! The code that I link to above generates this exact data, if you have any issues running it, let me know.

I will update the figure with values as scientific notation on the y-axis and change the lines to solid. But perhaps will wait to hear whether @AileneKane would like me to include the figure letters in the image.

It would be great if you are able to start editing the text to match the figure @lizzieinvancouver, let me know if there is anything I can do to help.

lizzieinvancouver commented 2 weeks ago

@DeirdreLoughnan Did the axes (y and x) get updated to scientific notation? I may have been looking at the wrong figure though.

For now @AileneKane I added text for the box to go with the figure.

AileneKane commented 1 week ago

This is great! I think it would be great to include the figure letters in the image. Thanks!

lizzieinvancouver commented 1 week ago

@lizzieinvancouver will work up text a little more:

DeirdreLoughnan commented 1 week ago

@lizzieinvancouver @AileneKane

I pushed the revised version of the figure with the figure letters and updated code.

@lizzieinvancouver let me know if you would like help revising the text for the box!

wangxm-forest commented 1 week ago

@lizzieinvancouver @DeirdreLoughnan Sorry that I didn't attend the meeting today because I had an appointment I couldn't reschedule for another time. Please let me know if there is anything else I can do to help!