biocore / American-Gut

American Gut open-access data and IPython notebooks

All new PCoAs and legends for bottom half of Module 2 report, includi… #150

Closed: cuttlefishh closed this 9 years ago

cuttlefishh commented 9 years ago

All new PCoAs and legends for the bottom half of the Module 2 report, including LaTeX and matplotlib code. Double-check that the pipeline-generated macros files still work (I had to change the dummy macros_gut.tex file, but I think this was due to changes already in the repo). I still need to tweak a few pieces of the LaTeX and the Python script, and to add the actual rendering of the PCoAs via the pipeline, making sure the right options and file paths are invoked.

ElDeveloper commented 9 years ago

Minor, but we should change the script to use spaces instead of tabs for indentation.

On (May-15-15|18:31), Luke Thompson wrote:

You can view, comment on, or merge this pull request online at:

https://github.com/biocore/American-Gut/pull/150

-- Commit Summary --

  • All new PCoAs and legends for bottom half of Module 2 report, including latex and matplotlib code

-- File Changes --

M latex/macros_gut.tex (12)
M latex/pdfs-gut/figure1.pdf (0)
A latex/pdfs-gut/figure1_legend.ai (416)
M latex/pdfs-gut/figure1_legend.pdf (0)
M latex/pdfs-gut/figure2.pdf (0)
A latex/pdfs-gut/figure2_legend.ai (365)
M latex/pdfs-gut/figure2_legend.pdf (0)
M latex/pdfs-gut/figure3.pdf (0)
A latex/pdfs-gut/figure3_legend.ai (451)
M latex/pdfs-gut/figure3_legend.pdf (0)
A latex/pdfs-gut/figure3_legend_double_gradient.ai (451)
A latex/pdfs-gut/figure3_legend_double_gradient.pdf (0)
M latex/template_gut.tex (33)
A scripts/mod2_pcoa.py (276)

-- Patch Links --

https://github.com/biocore/American-Gut/pull/150.patch
https://github.com/biocore/American-Gut/pull/150.diff


cuttlefishh commented 9 years ago

Cool. I will include spaces (not tabs) when I push the next set of edits.


wasade commented 9 years ago

The spaces/tabs change is required; it plays hell with the codebase if it's not consistent. There are a large number of PEP 8 failures, btw.
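A minimal sketch of the tabs-to-spaces conversion, assuming the script is indented with tabs only (no mixed tabs and spaces). The helper names retab and tab_indented_lines are hypothetical; in practice an editor setting does the conversion, and flake8 itself flags tab indentation (as W191). Only leading whitespace is rewritten, which is why this avoids str.expandtabs: that would also change tabs inside string literals.

```python
def retab(source, width=4):
    """Replace each tab in a line's leading indentation with `width` spaces.

    Tabs after the first non-whitespace character (e.g. inside string
    literals) are left alone. Assumes tab-only indentation; mixed tabs
    and spaces would need tab-stop arithmetic instead.
    """
    fixed = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[:len(line) - len(body)]
        fixed.append(indent.replace("\t", " " * width) + body)
    return "".join(fixed)


def tab_indented_lines(source):
    """Return 1-based numbers of lines whose indentation contains a tab,
    roughly what flake8 reports as W191."""
    numbers = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            numbers.append(number)
    return numbers


example = "def f():\n\treturn 1\n"
print(tab_indented_lines(example))  # [2]
print(retab(example), end="")
```

Running the check before and after the conversion is a quick way to confirm nothing tab-indented is left.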

wasade commented 9 years ago

Able to address the flake8 issues?

cuttlefishh commented 9 years ago

I might need some assistance with the PEP 8/flake8 issues. Maybe @ElDeveloper can help this morning.

ElDeveloper commented 9 years ago

I can help you out!


cuttlefishh commented 9 years ago

I'm working on it. I didn't know what flake8 was, but Zech told me; I pip installed that shiznit and am working through it now. Cool way to clean up code!

cuttlefishh commented 9 years ago

OK, the latest updates should fix all known issues:

Thanks to @ElDeveloper and @amnona for assistance and suggestions. @JWDebelius suggested using alpha diversity instead of Firmicutes for Figure 3, but that may have to wait (I couldn't find it in any mapping file).

wasade commented 9 years ago

Two flake8 issues still, FYI.

wasade commented 9 years ago

This looks great though. Getting the processing notebook set up to support it now.

cuttlefishh commented 9 years ago

I don't get any flake8 issues when I run it on my laptop. Is there some specific version of flake8 that GitHub runs?

wasade commented 9 years ago

Should be using the latest version as no version is specified. Can you try doing:

    pip install -U flake8

...and then rerun?

cuttlefishh commented 9 years ago

I ran that update (flake8 went from 2.4.0 to 2.4.1), but it still identifies no issues. Are you just running with default options?


josenavas commented 9 years ago

Just a note: I've seen this behavior too, where I can't reproduce the flake8 errors locally. I think it's because the flake8 version installed with conda is different from the one in pip, but this is just an idea; I haven't tested it.

wasade commented 9 years ago

@cuttlefishh, just look at the Travis output in the PR. It'll show the reason the build fails.

cuttlefishh commented 9 years ago

OK, below is the line (used twice) creating the error. The error is "line break before binary operator". If I put it on one line, it's more than 80 characters, and I'm not sure where I can legally break it. The other option: I'm sure I could make the line shorter using a logical "or", as in "Malawi or Venezuela", but I am still new to Python logical structures. Any suggestions from the experts?

    if (mf.loc[sample]['COUNTRY'] != 'Malawi') & (mf.loc[sample]['COUNTRY']
                                                  != 'Venezuela'):
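On the "or" idea: by De Morgan's law, "not Malawi and not Venezuela" is the same as "not (Malawi or Venezuela)", which Python spells as a membership test that fits on one line and avoids the line-break warning altogether. A sketch, assuming mf is the mapping-file DataFrame indexed by unique sample ID with a COUNTRY column; keep_sample, kept_samples and EXCLUDED_COUNTRIES are hypothetical names. For scalar booleans, and is the idiomatic keyword (& happens to work on bools but is really meant for element-wise pandas masks), and the second function shows that vectorized form, selecting every qualifying sample at once instead of testing them one by one in a loop.

```python
import pandas as pd

# Countries excluded by the condition discussed above.
EXCLUDED_COUNTRIES = ('Malawi', 'Venezuela')


def keep_sample(mf, sample):
    """Equivalent to the two != tests joined with &, but fits on one line."""
    return mf.loc[sample, 'COUNTRY'] not in EXCLUDED_COUNTRIES


def kept_samples(mf):
    """Vectorized form: IDs of all samples whose COUNTRY is not excluded."""
    return mf.index[~mf['COUNTRY'].isin(EXCLUDED_COUNTRIES)]


mf = pd.DataFrame(
    {'COUNTRY': ['USA', 'Malawi', 'Venezuela', 'United Kingdom']},
    index=['s1', 's2', 's3', 's4'])
print(keep_sample(mf, 's1'), keep_sample(mf, 's2'))  # True False
print(list(kept_samples(mf)))  # ['s1', 's4']
```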


josenavas commented 9 years ago

    if ((mf.loc[sample]['COUNTRY'] != 'Malawi') &
            (mf.loc[sample]['COUNTRY'] != 'Venezuela')):

cuttlefishh commented 9 years ago

Thanks Jose, but that gives me an "invalid syntax" error.


cuttlefishh commented 9 years ago

This does not give me an error, but I still don't know how it will go on GitHub.

    if (mf.loc[sample]['COUNTRY'] != 'Malawi') & (
            mf.loc[sample]['COUNTRY'] != 'Venezuela'):


josenavas commented 9 years ago

That should do it. Note that I updated my comment just after I sent it, because I saw the syntax error (I just wrapped everything in parentheses), but your current solution shouldn't be an issue.

cuttlefishh commented 9 years ago

Cool, thanks! Still learning :)


wasade commented 9 years ago

Ah, damn. Merged this prematurely. When looking at integrating, there are two things missing. First is that it needs an output directory. Second, it needs the ability to specify a prefix for the output files. I'll issue a separate PR on that shortly though.
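One possible shape for the two missing pieces, sketched with argparse. The option names --output-dir and --prefix and the helper output_path are assumptions for illustration, not the interface the follow-up PR actually added to mod2_pcoa.py.

```python
import argparse
import os


def build_parser():
    # Hypothetical options; the real script's interface may differ.
    parser = argparse.ArgumentParser(
        description='Render Module 2 PCoA figures (sketch).')
    parser.add_argument('--output-dir', default='.',
                        help='directory in which to write the figures')
    parser.add_argument('--prefix', default='',
                        help='string prepended to every output file name')
    return parser


def output_path(output_dir, prefix, filename):
    """Create output_dir if needed and return output_dir/prefix+filename."""
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, prefix + filename)
```

For example, parsing ['--output-dir', 'figs', '--prefix', '000001_'] and then calling output_path(args.output_dir, args.prefix, 'figure1.pdf') would give figs/000001_figure1.pdf on a POSIX system, so per-participant figures from different runs cannot overwrite each other.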

cuttlefishh commented 9 years ago

Yeah, I discussed that with @ElDeveloper, but we decided you would figure it out, as we didn't know how it would fit in with the rest of the processing pipeline. Thanks!


wasade commented 9 years ago

Please describe incomplete PRs as a work in progress and explicitly enumerate what needs to be done in these instances.

What version of skbio did you write this against? The methods used do not exist.

cuttlefishh commented 9 years ago

scikit-bio==0.2.3

wasade commented 9 years ago

How difficult would it be to specify what samples are printed when called? Issue is that this script is slow and cannot be parallelized in its current form (at least, specifying a reduced mapping file causes it to break). I'm currently projecting 16 hours to produce the results for figure 1.
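A sketch of specifying which samples get processed and splitting them into independent jobs, assuming the sample IDs come from the mapping-file index. select_samples and chunk are hypothetical helpers, not necessarily how the eventual fix does it. Each chunk could be handed to a separate cluster job, which is what would make the run parallelizable while the full mapping file is still loaded, so the script's existing assumptions about the mapping file are untouched.

```python
def select_samples(all_ids, requested=None):
    """Return the requested sample IDs in their original order, or all IDs
    when nothing is requested. Unknown IDs raise a ValueError."""
    if not requested:
        return list(all_ids)
    known = set(all_ids)
    missing = [s for s in requested if s not in known]
    if missing:
        raise ValueError('unknown sample IDs: %s' % ', '.join(missing))
    wanted = set(requested)
    return [s for s in all_ids if s in wanted]


def chunk(ids, n_jobs):
    """Split ids into n_jobs contiguous chunks whose sizes differ by at most 1."""
    if n_jobs < 1:
        raise ValueError('n_jobs must be at least 1')
    size, extra = divmod(len(ids), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        end = start + size + (1 if i < extra else 0)
        chunks.append(ids[start:end])
        start = end
    return chunks


ids = ['s1', 's2', 's3', 's4', 's5']
print(chunk(select_samples(ids, ['s5', 's1', 's3']), 2))  # [['s1', 's3'], ['s5']]
```

Passing a handful of known sample IDs this way also doubles as a quick sanity check before a full run.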

wasade commented 9 years ago

nm, I'll adjust it

cuttlefishh commented 9 years ago

Yeah, we realized that it wasn't necessary to generate reports for HMP and GG, but didn't institute any way to filter those out. Sounds like you got it working.


wasade commented 9 years ago

No, still can't get the code to actually run yet...


cuttlefishh commented 9 years ago

If speed of Figure 1 is the slowest part, you can speed that up by changing "50" in lines 60 and 63 to "20". That will slice each category into 20 slices instead of 50, so there are fewer sets to plot.
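The slicing trade-off, sketched: each group's points are split into n_slices pieces and drawn round-robin across groups, so a group drawn later only ever covers a fraction of the others. In this sketch every piece would be one scatter call, which is why runtime grows with the slice count, and why with a single slice one group (e.g. HMP fecal) is drawn entirely on top of another (e.g. AG fecal). The interleaved_slices helper is hypothetical and illustrates the idea; it is not the code at lines 60 and 63 of the script.

```python
import numpy as np


def interleaved_slices(groups, n_slices):
    """Return (group_name, indices) pairs in drawing order.

    groups maps a group name (e.g. 'AG fecal') to the row indices of its
    points. Each group's indices are split into n_slices pieces, and the
    pieces are emitted round-robin across groups, so no group is drawn
    entirely on top of another. Empty pieces are skipped.
    """
    pieces = {name: np.array_split(np.asarray(idx), n_slices)
              for name, idx in groups.items()}
    order = []
    for i in range(n_slices):
        for name in groups:
            if len(pieces[name][i]):
                order.append((name, pieces[name][i]))
    return order


order = interleaved_slices({'AG fecal': [0, 1, 2, 3], 'HMP fecal': [4, 5]}, 2)
print([(name, idx.tolist()) for name, idx in order])
# [('AG fecal', [0, 1]), ('HMP fecal', [4]), ('AG fecal', [2, 3]), ('HMP fecal', [5])]
```

With matplotlib, each pair would then become something like ax.scatter(coords[idx, 0], coords[idx, 1], color=colors[name]), where coords, ax and colors are assumed to exist already.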


wasade commented 9 years ago

If that decreases the runtime by 10x then it would be acceptable, what's your take there?


cuttlefishh commented 9 years ago

Let me test that quickly on my laptop (I'm assuming it will scale on pando/compy). We can reduce the number of slices even lower, but as we approach 1 we start to get points from one group (e.g. HMP fecal) covering up those from another group (e.g. AG fecal).


cuttlefishh commented 9 years ago

If we set it to 10 slices, it takes ~1.5 seconds per plot on my laptop. Compare that to ~3 seconds per plot for 50 slices. It still looks OK but not quite as well intercalated, and there's more plot-to-plot variation. That's a 2x gain, not 10x. Is that enough to be worth it?


wasade commented 9 years ago

Drops it to ~8 hours by present estimates. Figured out the juggling necessary to specify samples though, issuing a PR momentarily


cuttlefishh commented 9 years ago

The challenge with using a reduced mapping file, I think, is the script assumes the sample IDs are the same (and in the same order) in the mapping file and PC file.
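One way to drop the same-IDs, same-order assumption is to align the two tables by sample ID before plotting. A sketch, assuming both the mapping file and the PC coordinates have already been loaded into pandas DataFrames indexed by sample ID; align_by_sample_id is a hypothetical helper.

```python
import pandas as pd


def align_by_sample_id(mapping, coords):
    """Restrict both tables to their shared sample IDs, in one order.

    mapping: DataFrame indexed by sample ID (the mapping file).
    coords:  DataFrame indexed by sample ID (one column per PC).
    Returns (mapping, coords) with identical indexes, so row i of each
    refers to the same sample however either file was ordered or filtered.
    """
    shared = coords.index.intersection(mapping.index)
    return mapping.loc[shared], coords.loc[shared]


mapping = pd.DataFrame({'COUNTRY': ['USA', 'Malawi', 'United Kingdom']},
                       index=['s1', 's2', 's3'])
coords = pd.DataFrame({'PC1': [0.3, 0.1], 'PC2': [0.0, 0.2]},
                      index=['s3', 's1'])
m, c = align_by_sample_id(mapping, coords)
print(list(m.index), list(c.index))
```

Because both tables are indexed by the same shared IDs, a reduced mapping file simply shrinks the set of plotted samples instead of silently pairing the wrong rows.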


cuttlefishh commented 9 years ago

Great. In that case, I would prefer to keep it at 50 slices because it looks much better. Or compromise and set it to 25.


rob-knight commented 9 years ago

If we’re talking about a 1 day difference, it’s worth it to make them nicer as this is already so much later than we wanted…


wasade commented 9 years ago

Just issued #152, which allows this to be done in parallel. Walltime is now manageable.

cuttlefishh commented 9 years ago

Looks good. One nice thing about passing sample IDs as a parameter is we can quickly test known samples as a sanity check. Probably worth doing before we deploy for the whole AG.


wasade commented 9 years ago

That is a definite benefit as well

wasade commented 9 years ago

...hooking it up now though