biocore / American-Gut

American Gut open-access data and IPython notebooks
Other
113 stars 81 forks source link

[WIP] Filtering notebook [WIP] #50

Closed adamrp closed 10 years ago

adamrp commented 10 years ago

Ready for merge!

adamrp commented 10 years ago

Needs test code and and parallel implementation

wasade commented 10 years ago

notebook seems reasonable on a quick pass, but would be good for at least one other person to eyeball

ElDeveloper commented 10 years ago

I met with @amnona today and we went over how the tests should look and the first few changes that need to happen before this can be merged.

@amnona mentioned that the IPython notebook was mainly under @adamrp's authorship and watch (friendly ping).

adamrp commented 10 years ago

Yes, I have the notebook, thanks for the reminder. For amnon's code, the test code can be included relatively easily since it's modular. However, did we settle on a good way to include test code for the actual ipython notebook itself?

On Mon, Jan 6, 2014 at 1:01 PM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

I met with @amnona https://github.com/amnona today and we went over how the tests should look and the first few changes that need to happen before this can be merged.

@amnona https://github.com/amnona mentioned that the IPython notebook was mainly under @adamrp https://github.com/adamrp's authorship and watch (friendly ping).

— Reply to this email directly or view it on GitHubhttps://github.com/qiime/American-Gut/pull/50#issuecomment-31681607 .

ElDeveloper commented 10 years ago

Oh I completely forgot about that. My two cents on this: it seems like the code in there is doing mostly what a workflow script interface would be doing. I would be ok with this remaining as it is as there's nothing in there that should be factored out (aside from steps 3 and 4 in the second to last chunk of code). If someone else agrees with this then have this added to a python module that can be tested (should be relatively simple). Otherwise I'm ok with everything the way it is right now. Alternatively writing a workflow script would then be the solution.

On Jan 6, 2014, at 1:08 PM, adamrp notifications@github.com wrote:

Yes, I have the notebook, thanks for the reminder. For amnon's code, the test code can be included relatively easily since it's modular. However, did we settle on a good way to include test code for the actual ipython notebook itself?

On Mon, Jan 6, 2014 at 1:01 PM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

I met with @amnona https://github.com/amnona today and we went over how the tests should look and the first few changes that need to happen before this can be merged.

@amnona https://github.com/amnona mentioned that the IPython notebook was mainly under @adamrp https://github.com/adamrp's authorship and watch (friendly ping).

— Reply to this email directly or view it on GitHubhttps://github.com/qiime/American-Gut/pull/50#issuecomment-31681607 .

— Reply to this email directly or view it on GitHub.

wasade commented 10 years ago

@adamrp are you aware of any reasons this should not be merged?

ElDeveloper commented 10 years ago

Would it be possible to move the function declared in the script to it's own module under americangut? If that's done, then this should be good to go.

adamrp commented 10 years ago

Sure, I can do that, and that will allow me to make test code, too. I will hopefully get to this today.

Adam

On Mon, Jan 13, 2014 at 9:45 AM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

Would it be possible to move the function declared in the script to it's own module under americangut? If that's done, then this should be good to go.

— Reply to this email directly or view it on GitHubhttps://github.com/qiime/American-Gut/pull/50#issuecomment-32186822 .

ElDeveloper commented 10 years ago

Great, thanks!

Yoshiki.

On Jan 13, 2014, at 10:28 AM, adamrp notifications@github.com wrote:

Sure, I can do that, and that will allow me to make test code, too. I will hopefully get to this today.

Adam

On Mon, Jan 13, 2014 at 9:45 AM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

Would it be possible to move the function declared in the script to it's own module under americangut? If that's done, then this should be good to go.

— Reply to this email directly or view it on GitHubhttps://github.com/qiime/American-Gut/pull/50#issuecomment-32186822 .

— Reply to this email directly or view it on GitHub.

wasade commented 10 years ago

Good to go?

adamrp commented 10 years ago

No not yet -- been swamped with AG help account stuff all week, but I finally answered all the emails!! At least for now... I'll try to get to this tomorrow.

Adam

On Wed, Jan 22, 2014 at 3:42 AM, Daniel McDonald notifications@github.comwrote:

Good to go?

— Reply to this email directly or view it on GitHubhttps://github.com/qiime/American-Gut/pull/50#issuecomment-33010523 .

wasade commented 10 years ago

Thanks

On Wed, Jan 22, 2014 at 4:23 PM, adamrp notifications@github.com wrote:

No not yet -- been swamped with AG help account stuff all week, but I finally answered all the emails!! At least for now... I'll try to get to this tomorrow.

Adam

On Wed, Jan 22, 2014 at 3:42 AM, Daniel McDonald notifications@github.comwrote:

Good to go?

— Reply to this email directly or view it on GitHub< https://github.com/qiime/American-Gut/pull/50#issuecomment-33010523> .

— Reply to this email directly or view it on GitHubhttps://github.com/qiime/American-Gut/pull/50#issuecomment-33080880 .

ElDeveloper commented 10 years ago

Friendly ping on this.

adamrp commented 10 years ago

Some of the steps in the filtering process are somewhat time consuming (there are several runs of closed reference OTU picking). I know @wasade added cluster_utils.ipy to the repository. Is it okay to refactor the notebook to use these utilities, or would it be wrong to assume that whoever might be running the notebook has access to a cluster using torque/qsub?

ElDeveloper commented 10 years ago

I don't think it would be wrong to assume access to a cluster.

Yoshiki Vázquez-Baeza

On Feb 4, 2014, at 2:39 PM, adamrp notifications@github.com wrote:

Some of the steps in the filtering process are somewhat time consuming (there are several runs of closed reference OTU picking). I know @wasade added cluster_utils.ipy to the repository. Is it okay to refactor the notebook to use these utilities, or would it be wrong to assume that whoever might be running the notebook has access to a cluster using torque/qsub?

— Reply to this email directly or view it on GitHub.

wasade commented 10 years ago

Please use them if you can.

On Tuesday, February 4, 2014, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

I don't think it would be wrong to assume access to a cluster.

Yoshiki Vázquez-Baeza

On Feb 4, 2014, at 2:39 PM, adamrp notifications@github.com<javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

Some of the steps in the filtering process are somewhat time consuming (there are several runs of closed reference OTU picking). I know @wasade added cluster_utils.ipy to the repository. Is it okay to refactor the notebook to use these utilities, or would it be wrong to assume that whoever might be running the notebook has access to a cluster using torque/qsub?

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHubhttps://github.com/qiime/American-Gut/pull/50#issuecomment-34108413 .

ElDeveloper commented 10 years ago

@adamrp wzo1c

adamrp commented 10 years ago

Need to refactor to use the cluster utils. Will do this to process rounds 6 and 7. On Feb 16, 2014 8:28 PM, "Yoshiki Vázquez Baeza" notifications@github.com wrote:

@adamrp https://github.com/adamrp [image: wzo1c]https://f.cloud.github.com/assets/375307/2182280/81921126-9783-11e3-8a05-264473f1359b.gif

Reply to this email directly or view it on GitHubhttps://github.com/qiime/American-Gut/pull/50#issuecomment-35226340 .

ElDeveloper commented 10 years ago

Cool, thanks!

adamrp commented 10 years ago

I followed the model in module2.ipynb for parallel support, and I think it should be ready, but I need to test. @wasade, if you can have a look and make some comments, I can test and rework if necessary.

wasade commented 10 years ago

Thanks, @adamrp. Checking righ tnow

wasade commented 10 years ago

Seems good, thanks for doing this so fast. Assuming it tests out then I think its good to go

adamrp commented 10 years ago

Thanks! Should I change those calls? Not passing as a list relies on a line in cluster_utils.ipy labeled "fragile"...

On Mon, Feb 17, 2014 at 3:40 PM, Daniel McDonald notifications@github.comwrote:

Seems good, thanks for doing this so fast. Assuming it tests out then I think its good to go

Reply to this email directly or view it on GitHubhttps://github.com/biocore/American-Gut/pull/50#issuecomment-35329788 .

wasade commented 10 years ago

hahaha... meh, need

On Mon, Feb 17, 2014 at 4:17 PM, adamrp notifications@github.com wrote:

Thanks! Should I change those calls? Not passing as a list relies on a line in cluster_utils.ipy labeled "fragile"...

On Mon, Feb 17, 2014 at 3:40 PM, Daniel McDonald notifications@github.comwrote:

Seems good, thanks for doing this so fast. Assuming it tests out then I think its good to go

Reply to this email directly or view it on GitHub< https://github.com/biocore/American-Gut/pull/50#issuecomment-35329788>

.

Reply to this email directly or view it on GitHubhttps://github.com/biocore/American-Gut/pull/50#issuecomment-35332184 .

adamrp commented 10 years ago

Added filtering based on 100-nt sequences instead of the full length 150nt sequences. Lowered abundance threshold to 85%. Revised layout of code. Improved efficiency of main filtering step. Ready for review/merge!