biocore / American-Gut

American Gut open-access data and IPython notebooks
Other
108 stars 81 forks source link

Power #121

Closed jwdebelius closed 9 years ago

jwdebelius commented 9 years ago

The power analysis notebook. Expanded notebook text is on google docs.

wasade commented 9 years ago

If the plotting code, which is the bulk of this PR, is generally useful, why not push them to skbio?

wasade commented 9 years ago

The google doc text is not insync with the notebook, which should we comment on?

jwdebelius commented 9 years ago

Please comment on the google doc, that's a later version.

jwdebelius commented 9 years ago

Would anyone else find the plotting code useful?

wasade commented 9 years ago

Okay, thanks.

If its useful to you, it's probably useful to someone else too.

I can't comment inline on the notebook code as github would let it expand out so I'll just start adding comments here.

wasade commented 9 years ago

in collate_effect_size:

    if isinstance(powers, np.ndarray):
        powers = [powers]

    num_powers = len(powers)

num_powers will always be 1 if an array is passed in, which is the expected type according to the doc string

wasade commented 9 years ago
                if np.isnan(pwr[id2]):
                    eff[id2] = np.nan
                    continue
                try:
                    eff[id2] = ft.solve_power(effect_size=None,
                                              nobs=cnt,
                                              alpha=alpha,
                                              power=pwr[id2])
                except:
                    eff[id2] = np.nan

Why not:


                eff[id2] = np.nan
                try:
                    eff[id2] = ft.solve_power(effect_size=None,
                                              nobs=cnt,
                                              alpha=alpha,
                                              power=pwr[id2])
                except:
                    pass

Also, what is the expected exception?

wasade commented 9 years ago

Why is plot_effects in the notebook, whereas all the other plotting code is in modules?

wasade commented 9 years ago

How do the downloaded biom tables differ from what is actually provided by the repo?

jwdebelius commented 9 years ago

Downloaded biom tables have been run through the preprocessing notebook (#120)

jwdebelius commented 9 years ago

Plotting effects is only used here. I originally wrote it for skbio, but it relies on statsmodels.

wasade commented 9 years ago
    all_ids = np.array([])
    for id_ in ids:
        all_ids = np.hstack((all_ids, id_))

Why not just do all_ids = array(ids)?

wasade commented 9 years ago

two ~5k PRs? these are titanic and look to overlap a lot. Which one should be reviewed and merged first?

jwdebelius commented 9 years ago

They're independent? I'd argue pre-processing relies on power at this point. The original plan was that the diversity analysis code was supposed to be its own pull request, but I forgot to delete them when I forked from my branch with all my notebooks.

wasade commented 9 years ago

Please only include code in the PR that is relevant. It is extremely difficult to review a 5k line PR

ElDeveloper commented 9 years ago

Agreed, there's no point on having code that doesn't belong to the pull request, even if it will eventually be merged. Having extra code makes the review process a lot harder.

josenavas commented 9 years ago

Can this notebook be renderer somewhere? It will make the review way easier.