Subsetting data in comparing results

arisp99 commented 2 years ago

Comparison of coiaf and The REAL McCOIL

In comparing the results of {coiaf} and The REAL McCOIL, we remove some patients in the following lines:

https://github.com/bailey-lab/coiaf-real-data/blob/172cb99984213d553dab5b25b342bf8a00ccec40/analysis/pf6_analysis.Rmd#L75-L81

This is done because, in some cases, The REAL McCOIL did not return a prediction. We also remove two samples for which The REAL McCOIL's prediction was clearly wrong (a predicted COI of 25). This filtering step is justified when comparing the two software packages.
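The filtering step described above can be sketched as follows. This is not the code from `pf6_analysis.Rmd` (the linked lines are R and are not reproduced here); it is a minimal Python illustration that assumes each sample is a record with hypothetical fields `coiaf_coi` and `trmcc_coi`, where a missing REAL McCOIL prediction is stored as `None`.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical sample record; the field names are illustrative only.
@dataclass
class Sample:
    sample_id: str
    region: int
    coiaf_coi: float
    trmcc_coi: Optional[float]  # None when The REAL McCOIL gave no prediction


# Prediction treated as clearly wrong, per the discussion (a COI of 25).
IMPLAUSIBLE_TRMCC_COI = 25


def filter_for_comparison(samples: list[Sample]) -> list[Sample]:
    """Keep only samples usable for comparing the two packages.

    Drops samples with no REAL McCOIL prediction and samples whose
    REAL McCOIL prediction equals the implausible value of 25.
    """
    return [
        s for s in samples
        if s.trmcc_coi is not None and s.trmcc_coi != IMPLAUSIBLE_TRMCC_COI
    ]


# Made-up samples for illustration.
samples = [
    Sample("A", 1, 1.2, 1.0),
    Sample("B", 1, 2.5, None),   # missing REAL McCOIL prediction
    Sample("C", 2, 1.8, 25.0),   # implausible REAL McCOIL prediction
    Sample("D", 11, 3.1, 3.0),
]
comparison = filter_for_comparison(samples)
print([s.sample_id for s in comparison])  # ['A', 'D']
```

The function returns a new list and leaves `samples` untouched, so analyses that do not involve The REAL McCOIL can still draw on every sample.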

COI across the world

When we then use our results to plot the COI across the world, we continue to use our subsetted data.

https://github.com/bailey-lab/coiaf-real-data/blob/172cb99984213d553dab5b25b342bf8a00ccec40/analysis/pf6_analysis.Rmd#L265-L272

The variable patient_lat_long is what we base the rest of our results on. Since this data is a subset of our results, and at this point we are no longer comparing the two software packages, it would be better to use our estimates for the complete patient population we had access to.
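To illustrate the suggestion, here is a minimal sketch that averages {coiaf} COI estimates per location over every sample rather than over the comparison subset. The row layout `(lat, lon, coiaf_coi, trmcc_coi)` and the helper `mean_coi_by_location` are hypothetical stand-ins for whatever patient_lat_long actually holds, and the coordinates are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional

# Hypothetical row: (latitude, longitude, coiaf COI, REAL McCOIL COI or None).
Row = tuple[float, float, float, Optional[float]]


def mean_coi_by_location(rows: list[Row]) -> dict[tuple[float, float], float]:
    """Mean {coiaf} COI per (lat, lon), using every row passed in.

    The REAL McCOIL column is ignored, so rows without a REAL McCOIL
    prediction still contribute to the map.
    """
    by_location: dict[tuple[float, float], list[float]] = defaultdict(list)
    for lat, lon, coiaf_coi, _trmcc_coi in rows:
        by_location[(lat, lon)].append(coiaf_coi)
    return {loc: mean(values) for loc, values in by_location.items()}


all_rows: list[Row] = [
    (10.0, 20.0, 1.0, 1.0),
    (10.0, 20.0, 2.0, None),   # would be dropped from the comparison subset
    (-5.0, 30.0, 1.5, 25.0),   # would be dropped from the comparison subset
]
subset = [r for r in all_rows if r[3] is not None and r[3] != 25]

print(mean_coi_by_location(all_rows))  # {(10.0, 20.0): 1.5, (-5.0, 30.0): 1.5}
print(mean_coi_by_location(subset))    # {(10.0, 20.0): 1.0}
```

With the subset, one location disappears entirely and the other's mean shifts, which is the kind of distortion the comment is pointing at.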

Prevalence and FwS

For examining the relationship of COI to malaria prevalence and FwS, we can continue to use the subsetted data set, since there we compare {coiaf} to The REAL McCOIL. However, the ridge plot of prevalence in each of our 24 regions should use the entire data set, especially because our subsetted data does not cover region 11.
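A quick coverage check before plotting makes the region 11 problem concrete: compare the regions present in the full data with those left in the subset. The sketch below uses made-up `(region, trmcc_coi)` pairs and a hypothetical helper name. In this toy data every region 11 sample happens to be filtered out, which mirrors the situation described above.

```python
from typing import Optional

# Hypothetical row: (region, REAL McCOIL COI or None).
Row = tuple[int, Optional[float]]


def regions_missing_from_subset(full: list[Row], subset: list[Row]) -> set[int]:
    """Return the regions that appear in the full data but not in the subset."""
    return {region for region, _ in full} - {region for region, _ in subset}


full = [(1, 1.0), (1, 2.0), (11, None), (11, 25.0)]
subset = [row for row in full if row[1] is not None and row[1] != 25]

print(regions_missing_from_subset(full, subset))  # {11}
```

A per-region ridge plot built from `subset` here would silently omit region 11, whereas one built from `full` keeps it.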

JeffAndBailey commented 2 years ago

I think these issues with TRMcC should still be mentioned in the text, though.

(Email reply to Aris Paschalidis's comment of 1/27/22 18:01.)


arisp99 commented 2 years ago

Agreed, this should likely be mentioned.
