coreytcallaghan / optimizing-citizen-science-sampling

Paper is here: https://doi.org/10.1098/rspb.2019.1487

Plot the relationship between # of checklists and some gof for a GLM #17

Closed coreytcallaghan closed 5 years ago

coreytcallaghan commented 5 years ago

Going round in circles on goodness of fit for GLMs, but comparing models using AIC for now? Will push code shortly.

coreytcallaghan commented 5 years ago

@wcornwell I just pushed a script to look at some sort of rarefaction type thing... only doing it for 3 species to start out with. Want to make sure we have everything and then can scale up for all species.

Been going down rabbit holes of Goodness of Fit for GLMs etc. etc...

Still not sure what to do. Can you have a look at this script? In the current test case there are sometimes models with ginormous standard errors around the date term, so it is hard to see patterns... Thoughts?

Thanks!

wcornwell commented 5 years ago

Which section of the reviewers' comments are you referring to?

coreytcallaghan commented 5 years ago

Reviewer 1:

"I have one substantive comment on this contribution:

The authors use bird checklists collected in the greater Sydney area as a case study. In addition to being a relatively well studied group of organisms, this is also an area with both a high density of active citizen scientists and a large number of species. In considering the wider utility of the approach they are suggesting, it begs the question—how representative is this region / this data-set for global citizen science initiatives? Further, is there a minimum number of checklists / active observers for reliable trends to be inferred. Rather than a criticism of their work, this is about being more circumspect about those taxa / regions best suited to their approach.

They could address this in several ways. One would be to use a rarefaction-based approach to remove checklists and explore the effect of sampling density / frequency / number of observers on trend analysis to evaluate the shape of the relationship and identify any critical thresholds. They could also compare their Sydney dataset with somewhere else. Australia is the ideal place to explore this, with very high population density in a handful of coastal capital cities, and very low population densities elsewhere. Thus, rather than optimizing where data are collected within a particular region to detect trends, it may be that simply acquiring more data ANYWHERE in the region is the priority, attracting more effort from adjacent, better sampled areas. By expanding the scope of their work, not only will they be better placed to evaluate the broader utility of their approach, it will also integrate their work with the extensive literature on sampling effort determination."

wcornwell commented 5 years ago

OK, let's start at the end and draw the figure that answers that comment... I feel like it's just one species...

wcornwell commented 5 years ago

Maybe y-axis = best-fit trend slope, x-axis = percent of checklists removed. Three treatments: (1) checklists removed in order of leverage, (2) in reverse order of leverage, and (3) in random order.

Should be three funnel plots...
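
A minimal sketch of that design for a single species (the data frame species_df, its n_observed and numeric date columns, the Poisson family, and the percentage grid are all placeholders, not the actual analysis code):

library(purrr)
library(tibble)

# Fit the full GLM once and rank checklists by leverage
full_mod <- glm(n_observed ~ date, family = poisson, data = species_df)
leverage <- hatvalues(full_mod)

# Three removal orderings: highest leverage first, lowest leverage first, random
orderings <- list(
  high_leverage_first = order(leverage, decreasing = TRUE),
  low_leverage_first  = order(leverage, decreasing = FALSE),
  random              = sample(nrow(species_df))
)

percents_removed <- seq(0, 90, by = 10)

# Remove an increasing fraction of checklists in the given order,
# refit the GLM, and record the date slope each time
rarefy_one <- function(ord, treatment) {
  map_dfr(percents_removed, function(p) {
    drop_n <- floor(nrow(species_df) * p / 100)
    keep   <- species_df[setdiff(seq_len(nrow(species_df)), ord[seq_len(drop_n)]), ]
    m      <- glm(n_observed ~ date, family = poisson, data = keep)
    tibble(treatment       = treatment,
           percent_removed = p,
           slope_date      = coef(m)[["date"]])
  })
}

results <- imap_dfr(orderings, rarefy_one)
# one funnel-style panel per treatment: slope_date vs. percent_removed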

coreytcallaghan commented 5 years ago

Okay. I'm envisioning a figure that shows the number of samples versus some goodness of fit or standard error... and hopefully, as samples increase, the standard error decreases or the goodness of fit increases. We could then show the same figure for the number of observers, and potentially for the number of spaced-out checklists (but I don't know how to do that part).

I think this (together with our other plan, plus a few lines in the discussion) should respond well to this reviewer.

wcornwell commented 5 years ago

Goodness of fit or standard error are pretty tough to interpret in a rarefaction context?! Maybe the trend slope estimate, or the deviation from the best trend slope estimate, will be easier to interpret.

coreytcallaghan commented 5 years ago

Rplot

coreytcallaghan commented 5 years ago

Doesn't seem to be much going on... Unless I repeat the random sample at each percentage level 100 times or something? Then maybe the pattern will be a bit stronger?

coreytcallaghan commented 5 years ago

Then this might be helpful: https://stats.stackexchange.com/questions/5195/how-to-draw-funnel-plot-using-ggplot2-in-r

wcornwell commented 5 years ago

Yeah, it should be stochastic, so repeat each one a bunch of times. Can get this on katana if you want.
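
A minimal sketch of that repetition step, with the same placeholder names as before (species_df, n_observed, date) and output columns chosen to line up with the plotting code further down the thread:

library(dplyr)
library(purrr)
library(tibble)

n_reps            <- 100
percents_retained <- seq(10, 100, by = 10)

# At each retained percentage, draw many random subsamples of checklists,
# refit the GLM, and keep the date slope and convergence flag from each fit
random_runs <- map_dfr(percents_retained, function(p) {
  map_dfr(seq_len(n_reps), function(rep) {
    keep_n <- ceiling(nrow(species_df) * p / 100)
    sub    <- slice_sample(species_df, n = keep_n)
    m      <- glm(n_observed ~ date, family = poisson, data = sub)
    tibble(percent_of_sample = p,
           rep               = rep,
           slope_date        = coef(m)[["date"]],
           mod_converged     = m$converged)
  })
})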

coreytcallaghan commented 5 years ago

yeah. let me tweak the code and run another test or two, then you can katana it :)

coreytcallaghan commented 5 years ago

Getting there...

This is 100 runs for the top thirty species, which finished overnight on my machine.

Rplot

coreytcallaghan commented 5 years ago

This is as far as I've gotten. I still can't figure out how to make nicely changing 95% confidence intervals. The code from that Stack Exchange post didn't seem to work for me, and just makes really small confidence intervals.

# packages
library(ggplot2)
library(dplyr)
library(purrr)

# read in the permutation results, saved as .RDS files in three batches
setwd("Data/permutation_sensitivity/top_ten_species")

df_one_to_ten <- list.files(pattern = ".RDS") %>%
  map(readRDS) %>%
  bind_rows()

setwd("..")

setwd("eleven_to_twenty_species")

df_eleven_to_twenty <- list.files(pattern = ".RDS") %>%
  map(readRDS) %>%
  bind_rows()

setwd("..")

setwd("twentyone_to_thirty_species")

df_twentyone_to_thirty <- list.files(pattern = ".RDS") %>%
  map(readRDS) %>%
  bind_rows()

setwd("..")

# combine the three batches into one data frame
data <- bind_rows(df_one_to_ten,
                  df_eleven_to_twenty,
                  df_twentyone_to_thirty)

# remove all models which did not converge,
# then drop slope estimates more than 2 SD from each species' mean slope
data2 <- data %>%
  dplyr::filter(mod_converged == "TRUE") %>%
  group_by(COMMON_NAME) %>%
  mutate(upper_quartile = quantile(slope_date, 0.75)) %>%
  mutate(lower_quartile = quantile(slope_date, 0.25)) %>%
  mutate(mean_slope = mean(slope_date)) %>%
  mutate(sd_slope = sd(slope_date)) %>%
  mutate(greater_than_2_sd_top = ifelse(slope_date >= mean_slope + 2 * sd_slope, "TRUE", "FALSE")) %>%
  mutate(lesser_than_2_sd_bottom = ifelse(slope_date <= mean_slope - 2 * sd_slope, "TRUE", "FALSE")) %>%
  mutate(within_2_sd = ifelse(greater_than_2_sd_top == "TRUE" | lesser_than_2_sd_bottom == "TRUE", "FALSE", "TRUE")) %>%
  dplyr::filter(within_2_sd == "TRUE")

# date slope vs. percentage of checklists retained, one panel per species
ggplot(data2, aes(x = percent_of_sample, y = slope_date)) +
  geom_jitter() +
  facet_wrap(~COMMON_NAME, scales = "free")
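
Not sure this is what you're after, but one way to get interval bands that actually narrow as the sample grows might be empirical quantiles of the replicate slopes at each percentage level, drawn as a ribbon, rather than model-based confidence intervals. A sketch building on the data2 object above:

# Empirical 95% band of replicate slopes at each percentage level, per species
bands <- data2 %>%
  group_by(COMMON_NAME, percent_of_sample) %>%
  summarise(lower        = quantile(slope_date, 0.025),
            upper        = quantile(slope_date, 0.975),
            median_slope = median(slope_date),
            .groups = "drop")

ggplot(data2, aes(x = percent_of_sample, y = slope_date)) +
  geom_jitter(alpha = 0.2, colour = "grey50") +
  geom_ribbon(data = bands,
              aes(x = percent_of_sample, ymin = lower, ymax = upper),
              inherit.aes = FALSE, alpha = 0.3) +
  geom_line(data = bands,
            aes(x = percent_of_sample, y = median_slope),
            inherit.aes = FALSE) +
  facet_wrap(~COMMON_NAME, scales = "free")
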
wcornwell commented 5 years ago

[image: image.png]

wcornwell commented 5 years ago

image

This isn't perfect, but is it good enough?

wcornwell commented 5 years ago

or:

image

wcornwell commented 5 years ago

or wiggly:

image

coreytcallaghan commented 5 years ago

Yes, those I like! Probably the straight one...

Then, we just need some sort of definition of where the data converge, and to calculate that for each species. We can then present the mean "number of checklists", which I think is what the reviewer is after, and compare that number with the 10,000 from Horns et al. 2018.
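
One possible (and fairly arbitrary) definition, sketched below, building on data2: call the slope "converged" at the smallest percentage where the spread of replicate slopes is within some tolerance of the spread at 100%, then translate that percentage into a raw checklist count. The 1.5x tolerance and the total_checklists lookup table (checklists per species in the full dataset) are assumptions, not anything we've settled on.

# Per species: smallest percentage at which the SD of replicate slopes is
# within 1.5x the SD at the full sample
spread_by_percent <- data2 %>%
  group_by(COMMON_NAME, percent_of_sample) %>%
  summarise(sd_slope = sd(slope_date), .groups = "drop_last") %>%
  mutate(sd_at_full = sd_slope[percent_of_sample == max(percent_of_sample)])

thresholds <- spread_by_percent %>%
  filter(sd_slope <= 1.5 * sd_at_full) %>%
  summarise(min_percent_needed = min(percent_of_sample), .groups = "drop") %>%
  left_join(total_checklists, by = "COMMON_NAME") %>%   # assumed lookup: COMMON_NAME, n_checklists
  mutate(checklists_needed = n_checklists * min_percent_needed / 100)

# mean number of checklists needed across species, to compare against
# the 10,000 of Horns et al. 2018
mean(thresholds$checklists_needed)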

wcornwell commented 5 years ago

less wiggly:

image

wcornwell commented 5 years ago

Maybe we should change the x-axis to the raw number of checklists?

We also need to think about units for the y-axis. Is there a way to make those intuitive?
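
One rough option, heavily caveated: if the underlying GLMs use a log link with date measured in days (an assumption on my part), the date slope could be rescaled to an annual percent change, and the x-axis percentage could be scaled by each species' total checklist count (using the same assumed total_checklists lookup as above):

# Rescale axes: raw number of checklists used, and annual % change
# (the % change conversion is valid only under a log link with date in days)
data3 <- data2 %>%
  left_join(total_checklists, by = "COMMON_NAME") %>%
  mutate(n_checklists_used = n_checklists * percent_of_sample / 100,
         annual_pct_change = 100 * (exp(slope_date * 365.25) - 1))

ggplot(data3, aes(x = n_checklists_used, y = annual_pct_change)) +
  geom_point(alpha = 0.1, colour = "grey") +
  facet_wrap(~COMMON_NAME, scales = "free")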

wcornwell commented 5 years ago

code:

# continues from the data2 object in the script above
library(quantreg)   # needed for method = "rqss" in geom_quantile()

# drop a handful of wildly negative slope estimates, then plot the replicate
# slopes with smoothed 5th and 95th quantile regression curves per species
filter(data2, slope_date > -10^9) %>%
  ggplot(aes(x = percent_of_sample, y = slope_date)) +
  geom_point(col = "grey", alpha = 0.1) +
  geom_quantile(quantiles = c(0.05, 0.95), method = "rqss", formula = y ~ qss(x, lambda = 10)) +
  # geom_density_2d() +
  theme_classic() +
  facet_wrap(~COMMON_NAME, scales = "free")