biocore / American-Gut

American Gut open-access data and IPython notebooks
Other
108 stars 81 forks source link

Statistical power #127

Closed jwdebelius closed 9 years ago

jwdebelius commented 9 years ago

This is a cleaned up power notebook, loosely related to what was presented in #121

The rendered notebook can be found here.

The text is in Google docs, which may be a better place for text editing.

This notebook relies on code generated in #125 and on the data files generated when #126 is run.

wasade commented 9 years ago

@JWDebelius, there are merge conflicts

wasade commented 9 years ago

I can't link directly into the notebook, so below are the initial pass comments on the notebook itself. Note, it does not appear that the power_plots code or the diversity_analysis code is included in this PR, and which appear to be necessary for this code to work.

https://github.com/JWDebelius/American-Gut/blob/statistical_power/ipynb/Power.ipynb#L16

https://github.com/JWDebelius/American-Gut/blob/statistical_power/ipynb/Power.ipynb#L27

https://github.com/JWDebelius/American-Gut/blob/statistical_power/ipynb/Power.ipynb#L113

function imports, american_gut.power_plots does not appear to exist nor does americangut.diversity_analysis.

general: it is QIIME

alpha diversity parameters section, the beta diversity link is 404

In parameters for fecal samples, the ANTIBIOTIC_SELECT values are "In the past month" and "Not in the past year". Earlier in the document it indicates that "in the past week" will also included?

In the files and directories section, the download links to dropbox. Are these data not in the repo? Ah, or these precomputed distance matrices?

In data download, we should not source from what look like random dropbox accounts. If the volume is such that it doesn't make sense to host on github, we can drop the data on ftp.microbio.me.

In power calculation for fecal samples, the first two lines are not preallocating a list but creating an empty list.

In power calculation for fecal samples,

# Removes the category from the control cats, if necessary
if cat in fecal_control_cats:        
    ctrl_cats = copy.deepcopy(fecal_control_cats)
    ctrl_cats.remove(cat)

can be replaced with

# Removes the category from the control cats, if necessary
if cat in fecal_control_cats:        
    ctrl_cats = fecal_control_cats[:]
    ctrl_cats.remove(cat)

but since order does not appear to matter, it would be more efficient if fecal_control_cats was a set. This is likely in a block of code that is not a bottleneck, so just a minor comment.

effect size estimation, the capitalization of a lot of items in the table is wrong (e.g., Ibd)

Beta diversity section has a "[beta diversity discussion]", is that a place holder?

Unweighted unifrac is quite powerful? The text previously mentioned weighted unifrac as well, should there be a plot for that too?

jwdebelius commented 9 years ago

In the past week is a subset of in the past month. I will fix the text, but they're merged in the metadata section. But, I will update the text to reflect that.

The tables are precomputed distance matrices. I've been trying to find a better place to host them than my dropbox. My understanding was that I was to wait for Qiita to be released so the processed data could be hosted there. I'd be very happy to transfer them to some better location.

Re: fecal cats as a set: right now, the power code accepts a list because it goes into a pandas group by-object. ...Not sure if its more expensive to cast to a list every time than to copy and remove the item.

I've tried to address the capitalization by switching to labels. Im still waiting for the power notebook to run (I forgot my precomputed effect size matrices are also on the computer being repaired.)

Sorry, I will remove that placeholder.

Given the time constraints and the fact that most down stream analyses I've done have focused on unweighted UniFrac, I'd rather remove the reference to weighted UniFrac beyond a vague mention that it exists and you can use it, if you want to spend the compute. I can cite the Turnabugh paper suggesting that the presence/absence of pathway branches matters more than the core pathways.

wasade commented 9 years ago

Thanks @JWDebelius, will go back over again shortly.

wasade commented 9 years ago

I gotta take a break from this. Will try to get back to this soon -- halfway through.

wasade commented 9 years ago

i believe i'm done with the static analysis

ElDeveloper commented 9 years ago

I cannot review Alph Diversity.ipynb the web app is not displaying this file. So, I am not going to review that file, sorry.

wasade commented 9 years ago

@JWDebelius, can you please fix that notebook?

On Wed, Mar 4, 2015 at 12:30 PM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

I cannot review Alph Diversity.ipynb the web app is not displaying this file. So, I am not going to review that file, sorry.

— Reply to this email directly or view it on GitHub https://github.com/biocore/American-Gut/pull/127#issuecomment-77228341.

ElDeveloper commented 9 years ago

I don't think this is a problem with the notebook itself, but with the fact that there's a maximum number of diffs that GitHub will render for a given PR. In this case I think that maximum was exceeded. Really what would have needed to happen is for this PR to have been split in more manageable groups of changesets. In addition that would have also made the reviewing process easier; it is usually advised to not have large PRs.

On (Mar-04-15|12:08), Daniel McDonald wrote:

@JWDebelius, can you please fix that notebook?

On Wed, Mar 4, 2015 at 12:30 PM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

I cannot review Alph Diversity.ipynb the web app is not displaying this file. So, I am not going to review that file, sorry.

— Reply to this email directly or view it on GitHub https://github.com/biocore/American-Gut/pull/127#issuecomment-77228341.


Reply to this email directly or view it on GitHub: https://github.com/biocore/American-Gut/pull/127#issuecomment-77235820

jwdebelius commented 9 years ago

Both the ways i've tried to view in nbviewer have rendered fine. Is the issue with Git and not the viewer?

This was originally split into two pull requests (see #125).

wasade commented 9 years ago

Both were still extremely large, and I did not know that the inclusion of the necessary code for #127 to be complete would double its size

On Wed, Mar 4, 2015 at 2:00 PM, J W Debelius notifications@github.com wrote:

Both the ways i've tried to view in nbviewer http://nbviewer.ipython.org/github/JWDebelius/American-Gut/blob/statistical_power/ipynb/Alpha%20diversity%20notebook.ipynb have rendered fine.

@ElDeveloper https://github.com/ElDeveloper: This was originally split into two pull requests (see #125 https://github.com/biocore/American-Gut/pull/125). I was asked to combine them.

— Reply to this email directly or view it on GitHub https://github.com/biocore/American-Gut/pull/127#issuecomment-77246198.

ElDeveloper commented 9 years ago

What I mean is that github is not rendering the notebook and thus I cannot comment on the file.

On (Mar-04-15|13:00), J W Debelius wrote:

Both the ways i've tried to view in nbviewer have rendered fine.

@ElDeveloper: This was originally split into two pull requests (see #125). I was asked to combine them.


Reply to this email directly or view it on GitHub: https://github.com/biocore/American-Gut/pull/127#issuecomment-77246198

jwdebelius commented 9 years ago

Would it work to comment on #125 and then include the changes here? I realize that may not be an appropriate use of the technology, but it might get around the size problem.

ElDeveloper commented 9 years ago

The problem is that once the modifications are made in one place, there's not a good way to verify/comment on subsequent changes.

On (Mar-04-15|13:16), J W Debelius wrote:

Would it work to comment on #125 and then include the changes here? I realize that may not be an appropriate use of the technology, but it might get around the size problem.


Reply to this email directly or view it on GitHub: https://github.com/biocore/American-Gut/pull/127#issuecomment-77249777

jwdebelius commented 9 years ago

@wasade, I've updated the code as per your suggestions and added the new text to the notebook.

wasade commented 9 years ago

Still a few typos I saw with 'significance', and the trace_bounds method has some method calls that are currently not doing anything. Going over the rendered notebook now

wasade commented 9 years ago

@ElDeveloper, were you able to go over the other notebook by chance?

ElDeveloper commented 9 years ago

No, I wasn't able to go over the other notebook.

On (Mar-05-15|15:57), Daniel McDonald wrote:

@ElDeveloper, were you able to go over the other notebook by chance?


Reply to this email directly or view it on GitHub: https://github.com/biocore/American-Gut/pull/127#issuecomment-77479197

wasade commented 9 years ago

Just on a quick pass through, the rendered notebook still has grammar and spelling issues to be resolved. Where the suggestions that @cuttlefishh and @EmbrietteH made to the google doc merged in?

wasade commented 9 years ago

Possible to do so?

On Thu, Mar 5, 2015 at 5:00 PM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

No, I wasn't able to go over the other notebook.

On (Mar-05-15|15:57), Daniel McDonald wrote:

@ElDeveloper, were you able to go over the other notebook by chance?


Reply to this email directly or view it on GitHub: https://github.com/biocore/American-Gut/pull/127#issuecomment-77479197

— Reply to this email directly or view it on GitHub https://github.com/biocore/American-Gut/pull/127#issuecomment-77479607.

jwdebelius commented 9 years ago

I tested the code by running it through both notebooks, and the test suite. The suggestions were merged in last night. Will run through spell check again.

wasade commented 9 years ago

From what I'm seeing, the notebook text does not appear to be in sync with the google doc. I mashed refresh a few times, so it shouldn't be a cached copy

On Thu, Mar 5, 2015 at 5:12 PM, J W Debelius notifications@github.com wrote:

I tested the code by running it through both notebooks, and the test suite. The suggestions were merged in last night. Will run through spell check again.

— Reply to this email directly or view it on GitHub https://github.com/biocore/American-Gut/pull/127#issuecomment-77481062.

jwdebelius commented 9 years ago

That is the updated notebook and text. What I'm seeing lines up with the google doc, when I click both links.

On Thu, Mar 5, 2015 at 4:17 PM, Daniel McDonald notifications@github.com wrote:

From what I'm seeing, the notebook text does not appear to be in sync with the google doc. I mashed refresh a few times, so it shouldn't be a cached copy

On Thu, Mar 5, 2015 at 5:12 PM, J W Debelius notifications@github.com wrote:

I tested the code by running it through both notebooks, and the test suite. The suggestions were merged in last night. Will run through spell check again.

— Reply to this email directly or view it on GitHub <https://github.com/biocore/American-Gut/pull/127#issuecomment-77481062 .

— Reply to this email directly or view it on GitHub https://github.com/biocore/American-Gut/pull/127#issuecomment-77481747.

ElDeveloper commented 9 years ago

This is the reason why I can't comment on the notebook: https://github.com/biocore/American-Gut/pull/127/files#diff-c6b18319a5fc61e394924c6a0a3ff2d3

GitHub will not show the interface for me to make comments on it.

On (Mar-05-15|16:10), Daniel McDonald wrote:

Possible to do so?

On Thu, Mar 5, 2015 at 5:00 PM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

No, I wasn't able to go over the other notebook.

On (Mar-05-15|15:57), Daniel McDonald wrote:

@ElDeveloper, were you able to go over the other notebook by chance?


Reply to this email directly or view it on GitHub: https://github.com/biocore/American-Gut/pull/127#issuecomment-77479197

— Reply to this email directly or view it on GitHub https://github.com/biocore/American-Gut/pull/127#issuecomment-77479607.


Reply to this email directly or view it on GitHub: https://github.com/biocore/American-Gut/pull/127#issuecomment-77480879

wasade commented 9 years ago

@JWDebelius, please see the screenshots below. There are others as well

@ElDeveloper, able to comment separate from inline on github? It's not ideal, but we need to review the material somehow

screen shot 2015-03-05 at 5 26 48 pm screen shot 2015-03-05 at 5 27 00 pm

jwdebelius commented 9 years ago

I've updated. I can ask someone else to check through spelling again before you give another pass.

Also, I processed the data for the alpha diversity on a pre-computed tables, since the current maps are missing preceding zeros. This skips the download step, since the notebook requires the github directory, but assumes the kernel is running in the ipnyb directory.

wasade commented 9 years ago

Thanks. I'm going to do one more pass through today. I think we're nearly or already there, so it should go quick. And its not like we can't issue future PRs to update if necessary anyway. Thank you for all the hard work on this!!

jwdebelius commented 9 years ago

I asked an extra person to look through it. I thought they'd be done tonight, but they sent me a couple more edits that I'm working on now. Sorry for the lengthy process.

jwdebelius commented 9 years ago

Okay, I got the new stuff in, too. Should be good.

wasade commented 9 years ago

Sorry, didn't see the comment until now. Reviewing

wasade commented 9 years ago

I think this is good, thanks @JWDebelius!!!!