Closed coreytcallaghan closed 5 years ago
@wcornwell I just pushed a script (https://github.com/coreytcallaghan/optimize_citizen_science_obs/commit/ffff050ff027d8ceff7cda37926e49602f21d2a6) to look at some sort of rarefaction-type thing... only doing it for 3 species to start with. I want to make sure we have everything, and then we can scale up to all species.
Been going down rabbit holes of goodness of fit for GLMs etc.
Still not sure what to do. Can you have a look at this script? In the current test case there are sometimes models with a ginormous standard error around the date term, so it is hard to see patterns... Thoughts?
Thanks!
Which section of the reviewers' comments are you referring to?
Reviewer 1:
"I have one substantive comment on this contribution:
The authors use bird checklists collected in the greater Sydney area as a case study. In addition to being a relatively well studied group of organisms, this is also an area with both a high density of active citizen scientists and a large number of species. In considering the wider utility of the approach they are suggesting, it begs the question—how representative is this region / this data-set for global citizen science initiatives? Further, is there a minimum number of checklists / active observers for reliable trends to be inferred. Rather than a criticism of their work, this is about being more circumspect about those taxa / regions best suited to their approach.
They could address this in several ways. One would be to use a rarefaction-based approach to remove checklists and explore the effect of sampling density / frequency / number of observers on trend analysis to evaluate the shape of the relationship and identify any critical thresholds. They could also compare their Sydney dataset with somewhere else. Australia is the ideal place to explore this, with very high population density in a handful of coastal capital cities, and very low population densities elsewhere. Thus, rather than optimizing where data are collected within a particular region to detect trends, it may be that simply acquiring more data ANYWHERE in the region is the priority, attracting more effort from adjacent, better sampled areas. By expanding the scope of their work, not only will they be better placed to evaluate the broader utility of their approach, it will also integrate their work with the extensive literature on sampling effort determination "
OK let's start at the end and draw the figure that answers that comment... I feel like it's just one species....
Maybe y-axis = best-fit trend line, x-axis = percent of checklists removed. Three treatments: (1) checklists removed in order of leverage, (2) checklists removed in reverse order of leverage, and (3) checklists removed in random order.
Should be three funnel plots...
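A minimal sketch of those three treatments, not the script from the repo. It assumes a binomial GLM of presence on date and uses simulated checklists; the data frame `checklists`, its columns `date` and `present`, and the helper `slope_after_removal()` are all hypothetical stand-ins.

```r
# Simulated stand-in for one species' checklists (hypothetical columns).
set.seed(1)
n <- 400
checklists <- data.frame(date = sort(runif(n, 0, 3650)))  # days since start
checklists$present <- rbinom(n, 1, plogis(-1 + 0.0003 * checklists$date))

# Fit on all checklists and take each checklist's leverage (hat value).
full_mod <- glm(present ~ date, family = binomial, data = checklists)
lev <- hatvalues(full_mod)

# Row orderings: which checklists get removed first under each treatment.
removal_orders <- list(
  high_leverage_first = order(lev, decreasing = TRUE),
  low_leverage_first  = order(lev),
  random              = sample(n)
)

# Refit after dropping the first `pct` percent of rows in `ord`
# and return the slope of the date term.
slope_after_removal <- function(ord, pct) {
  n_drop <- floor(n * pct / 100)
  keep <- if (n_drop > 0) ord[-seq_len(n_drop)] else ord
  mod <- glm(present ~ date, family = binomial, data = checklists[keep, ])
  coef(mod)[["date"]]
}

pcts <- seq(0, 90, by = 10)
res <- do.call(rbind, lapply(names(removal_orders), function(tr) {
  data.frame(
    treatment       = tr,
    percent_removed = pcts,
    slope_date      = sapply(pcts, function(p) {
      slope_after_removal(removal_orders[[tr]], p)
    })
  )
}))
```

`res` then has one row per treatment and removal level, which is the shape needed to draw the three panels. The random treatment is a single draw here, so it would need repeating to show a funnel.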
Okay. I'm envisioning a figure that shows number of samples versus some goodness-of-fit measure or standard error... Hopefully, as samples increase, the standard error decreases or goodness of fit increases. We could then show the same figure for number of observers, and potentially for number of spaced-out checklists (but I don't know how to do that part).
I think this (combined with our other plan), plus a few lines in the discussion, should respond well to this reviewer.
Goodness of fit and standard error are pretty tough to interpret in a rarefaction context?! Maybe the trend slope estimate, or its deviation from the best trend slope estimate, will be easier to interpret.
Doesn't seem to be much going on... Unless I repeat the random sample at each percentage level 100 times or something? Then maybe the pattern will be a bit stronger?
Then this might be helpful: https://stats.stackexchange.com/questions/5195/how-to-draw-funnel-plot-using-ggplot2-in-r
yeah should be stochastic so repeat each a bunch of times. can get this on katana if you want.
yeah. let me tweak the code and run another test or two, then you can katana it :)
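A rough sketch of the repeated random subsampling discussed above, recording each run's deviation from the full-data slope as suggested earlier in the thread. It assumes a binomial GLM of presence on date; the simulated `checklists` data, the helper `fit_slope()`, and the choice of 100 repeats per level are illustrative assumptions, not the code that was run on katana.

```r
# Simulated stand-in for one species' checklists (hypothetical columns).
set.seed(42)
n <- 500
checklists <- data.frame(date = runif(n, 0, 3650))
checklists$present <- rbinom(n, 1, plogis(-1 + 0.0003 * checklists$date))

# Slope of the date term from a binomial GLM fitted to data frame `d`.
fit_slope <- function(d) {
  coef(glm(present ~ date, family = binomial, data = d))[["date"]]
}
best_slope <- fit_slope(checklists)  # "best" estimate: all checklists

percents <- seq(10, 100, by = 10)
n_reps <- 100

# One row per (repeat, sampling level); each row gets a fresh random subsample.
runs <- expand.grid(run = seq_len(n_reps), percent_of_sample = percents)
runs$slope_date <- mapply(function(i, pct) {
  rows <- sample(n, size = round(n * pct / 100))
  fit_slope(checklists[rows, ])
}, runs$run, runs$percent_of_sample)
runs$deviation <- runs$slope_date - best_slope

# Spread of the deviation at each sampling level.
spread <- aggregate(deviation ~ percent_of_sample, data = runs, FUN = sd)
```

Plotting `deviation` against `percent_of_sample` should narrow towards zero as the subsample approaches the full dataset, since at 100% every run refits the same checklists.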
Getting there...
this is 100 runs for the top thirty species that finished overnight on my machine.
This is as far as I've gotten. Still can't figure out how to make nice changing 95% confidence intervals. The code from the Stack Exchange funnel plot link above didn't seem to work for me; it just makes really small confidence intervals.
# packages
library(ggplot2)
library(dplyr)
library(purrr)

# read every .RDS result file from the three species batches
# (paths are relative to the project root, so no setwd() is needed)
result_dirs <- file.path("Data/permutation_sensitivity",
                         c("top_ten_species",
                           "eleven_to_twenty_species",
                           "twentyone_to_thirty_species"))
data <- list.files(result_dirs, pattern = "\\.RDS$", full.names = TRUE) %>%
  map(readRDS) %>%
  bind_rows()

# remove all models which did not converge, then drop slope estimates
# that are 2 or more standard deviations from the species mean
data2 <- data %>%
  dplyr::filter(as.logical(mod_converged)) %>%
  group_by(COMMON_NAME) %>%
  mutate(upper_quartile = quantile(slope_date, 0.75),
         lower_quartile = quantile(slope_date, 0.25),
         mean_slope = mean(slope_date),
         sd_slope = sd(slope_date),
         within_2_sd = abs(slope_date - mean_slope) < 2 * sd_slope) %>%
  dplyr::filter(within_2_sd) %>%
  ungroup()

# one panel per species: slope of the date term against percent of sample
ggplot(data2, aes(x = percent_of_sample, y = slope_date)) +
  geom_jitter() +
  facet_wrap(~COMMON_NAME, scales = "free")
[image: image.png]
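One possible way to get intervals that change with sampling level, sketched under assumptions: instead of analytic funnel limits, summarise the empirical 2.5% and 97.5% quantiles of the slope across runs at each level and draw them as a ribbon. The column names `COMMON_NAME`, `percent_of_sample` and `slope_date` follow the code above; the simulated data frame `sim` and the objects `bands` and `funnel` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for data2: 100 runs per species and sampling level,
# with noisier slope estimates at smaller samples.
set.seed(7)
sim <- expand.grid(COMMON_NAME = c("Species A", "Species B"),
                   percent_of_sample = seq(10, 100, by = 10),
                   run = 1:100)
sim$slope_date <- rnorm(nrow(sim), mean = 0.001,
                        sd = 0.01 / sqrt(sim$percent_of_sample))

# Empirical 95% band of the slope at each sampling level, per species.
bands <- sim %>%
  group_by(COMMON_NAME, percent_of_sample) %>%
  summarise(lower  = quantile(slope_date, 0.025),
            median = median(slope_date),
            upper  = quantile(slope_date, 0.975),
            .groups = "drop")

# Raw runs as jittered points, with the band and median drawn on top.
funnel <- ggplot(bands, aes(x = percent_of_sample)) +
  geom_jitter(data = sim, aes(y = slope_date),
              colour = "grey", alpha = 0.2, width = 1, height = 0) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  geom_line(aes(y = median)) +
  facet_wrap(~COMMON_NAME, scales = "free") +
  theme_classic()
```

Printing `funnel` draws the figure. These are percentile bands of the run-to-run variation, not model-based confidence intervals, so they get more reliable with more runs per level.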
This isn't perfect, but is it good enough?
or:
or wiggly:
Yes, those I like! Probably the straight one...
Then we just need some sort of definition of where the data converge, and to calculate that for each species. We can then present the mean "number of checklists", which I think is what the reviewer is after, and compare that number with the 10,000 from Horns et al. 2018.
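One candidate definition, purely as an assumption to react to: a species has "converged" at the smallest number of checklists from which at least 95% of runs, at that level and every larger level, fall within a tolerance `tol` of the best slope estimate. The simulated data, the column `n_checklists`, the value of `tol`, and the 95% cut-off are all hypothetical choices.

```r
library(dplyr)

# Simulated stand-in: 100 runs per species and number of checklists,
# with Species B noisier than Species A.
set.seed(3)
sim <- expand.grid(COMMON_NAME = c("Species A", "Species B"),
                   n_checklists = seq(500, 5000, by = 500),
                   run = 1:100)
noise <- ifelse(sim$COMMON_NAME == "Species A", 0.1, 0.3)
sim$slope_date <- rnorm(nrow(sim), mean = 0.001, sd = noise / sim$n_checklists)

tol <- 0.0002  # assumed tolerance around the best slope estimate

# Proportion of runs within `tol` of the best slope (mean slope at the
# largest sample size) for each species and number of checklists.
prop_within <- sim %>%
  group_by(COMMON_NAME) %>%
  mutate(best_slope = mean(slope_date[n_checklists == max(n_checklists)])) %>%
  group_by(COMMON_NAME, n_checklists) %>%
  summarise(prop = mean(abs(slope_date - best_slope) <= tol), .groups = "drop")

# Smallest number of checklists from which every level meets the 95% cut-off.
convergence <- prop_within %>%
  arrange(COMMON_NAME, n_checklists) %>%
  group_by(COMMON_NAME) %>%
  mutate(ok_from_here = rev(cummin(rev(as.integer(prop >= 0.95)))) == 1) %>%
  summarise(checklists_needed = if (any(ok_from_here)) {
    min(n_checklists[ok_from_here])
  } else {
    NA_real_
  })

mean_needed <- mean(convergence$checklists_needed, na.rm = TRUE)
```

`mean_needed` is the across-species mean that could be set against the Horns et al. 2018 figure. The result depends heavily on `tol`, so that choice would need justifying.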
less wiggly:
maybe should change x-axis to raw number of checklists?
also need to think about units for y-axis. Is there a way to make those intuitive?
code:
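library(dplyr)
library(ggplot2)
library(quantreg)  # provides rqss() and qss() used by geom_quantile(method = "rqss")

# drop extreme negative slope estimates, then overlay smoothed 5% and 95%
# quantile curves (a 90% band) on the raw slope estimates
data2 %>%
  dplyr::filter(slope_date > -10^9) %>%
  ggplot(aes(x = percent_of_sample, y = slope_date)) +
  geom_point(col = "grey", alpha = 0.1) +
  geom_quantile(quantiles = c(0.05, 0.95), method = "rqss",
                formula = y ~ qss(x, lambda = 10)) +
  # geom_density_2d() +
  theme_classic() +
  facet_wrap(~COMMON_NAME, scales = "free")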
Going round in circles on goodness of fit for GLMs, but comparing models using AIC for now? Will push code shortly.
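For reference, a small sketch of what an AIC comparison of candidate GLMs could look like. The candidate set, the simulated data, and the effort covariate `duration` are hypothetical and are not the models in the repo.

```r
# Simulated checklists where detection depends on date and on effort.
set.seed(11)
n <- 1000
d <- data.frame(date = runif(n, 0, 3650),
                duration = rexp(n, rate = 1 / 60))  # minutes, hypothetical
d$present <- rbinom(n, 1, plogis(-1.5 + 0.0003 * d$date + 0.01 * d$duration))

# Candidate binomial GLMs of increasing complexity.
candidates <- list(
  null        = glm(present ~ 1, family = binomial, data = d),
  date        = glm(present ~ date, family = binomial, data = d),
  date_effort = glm(present ~ date + duration, family = binomial, data = d)
)

# AIC table sorted from best to worst, with differences from the best model.
aic_table <- data.frame(model = names(candidates),
                        AIC = sapply(candidates, AIC))
aic_table$delta_AIC <- aic_table$AIC - min(aic_table$AIC)
aic_table <- aic_table[order(aic_table$AIC), ]
```

AIC only ranks the candidates against each other. It says nothing about whether the best one fits well in absolute terms, so it does not replace a goodness-of-fit check.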