biocore / emp

Code repository of the Earth Microbiome Project.
http://www.earthmicrobiome.org
BSD 3-Clause "New" or "Revised" License

visualization of the environmental parameters gradients #13

Closed gregcaporaso closed 7 years ago

gregcaporaso commented 12 years ago

From Jack:

I would also like to have a visual representation of the environmental gradients we have for each ecosystem, i.e. I can imagine a figure like the attached (sorry, I'm in my hotel room) where we represent, on a scale of 0-100, the coverage of the gradients we have already surveyed. 0 would be the lowest possible (sensible) limit for that variable and 100 the highest. So for temp we would go from -56C to +120C, and for pH from 1 to 14 - or something like that. I could have someone start creating this if everyone agrees it is a good idea.
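The 0-100 scaling described above could be sketched roughly as follows. This is only an illustration in Python, not code from the project: the function names and the 10-bin default are assumptions, and the limits are the "something like that" values from Jack's message rather than agreed ranges.

```python
# Illustrative limits taken from the message above; they are not agreed values.
TEMP_LIMITS_C = (-56.0, 120.0)
PH_LIMITS = (1.0, 14.0)


def to_percent_of_range(value, low, high):
    """Map a value from [low, high] onto a 0-100 scale, clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    pct = 100.0 * (value - low) / (high - low)
    return max(0.0, min(100.0, pct))


def covered_bins(values, low, high, n_bins=10):
    """Return sorted indices of the equal-width bins holding at least one value.

    n_bins is a hypothetical resolution chosen for this sketch.
    """
    bins = set()
    for v in values:
        pct = to_percent_of_range(v, low, high)
        # A value at exactly 100 is counted in the last bin.
        bins.add(min(int(pct * n_bins / 100.0), n_bins - 1))
    return sorted(bins)
```

For example, `covered_bins(temps, *TEMP_LIMITS_C)` would list which tenths of the temperature range already contain at least one surveyed sample.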

dansmith01 commented 12 years ago

How's something like this? The different colors represent different environments: Red = Animal-Associated, Green = Sea Water, Blue = Fresh Water, Orange = Soil, Black = Other.

Env. Gradients V1

gregcaporaso commented 12 years ago

This is cool Dan.

Would it be possible to add something like a mouseover to see what projects/samples contribute to what bars in the bar chart?

Also, it's a bit of a problem that some of the bars are obscuring other bars. Would having the different environments side-by-side work better?

dansmith01 commented 12 years ago

Mouseovers would be cool - I'd need to rework my script quite a bit though to track projects/samples.

The bars are stacked, so there's no need to worry about visual obstruction :)

dansmith01 commented 12 years ago

I should also mention that although the combined height of each bar is plotted on a log scale, the color-by-color breakdown within a bar is a simple (linear) percentage of that height.
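As a rough sketch of that scheme (Dan's actual script is not shown in this thread, so everything here is an assumption): the total bar height is taken on a log scale and then divided among environments in proportion to their raw counts. The function name and the `+ 1` inside the logarithm are hypothetical choices for illustration.

```python
import math


def stacked_segments(counts_by_env):
    """Split one log-scaled bar into linearly proportioned color segments.

    counts_by_env maps an environment name to its sample count in one bin.
    Returns (total_height, {environment: segment_height}).
    """
    total = sum(counts_by_env.values())
    if total <= 0:
        return 0.0, {env: 0.0 for env in counts_by_env}
    # Assumption: log10(total + 1) so that an empty bin has height 0.
    height = math.log10(total + 1)
    # Each color gets the same share of the bar as its share of the samples.
    segments = {env: height * n / total for env, n in counts_by_env.items()}
    return height, segments
```

One consequence worth keeping in mind when reading such a figure: a color that fills half of a bar means half of the samples in that bin, but a bar that is twice as tall means far more than twice as many samples.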

gregcaporaso commented 12 years ago

OK, that makes sense, thanks! What do others think? Is mouseover to show project/sample ids in Dan's plot worth the development effort on this part?

Also, Dan, could you modify the figure so the titles are not abbreviated (e.g., 'tot org carb' becomes 'Total Organic Carbon')?

dansmith01 commented 12 years ago

I've updated the figure to display full title descriptions as you suggested - you may need to click on the above image and hit refresh to see the changes. Do you happen to know what the units of measurement are for the values in EMP data, e.g. um vs nm?

I've also put together this graphic showing the geographical distribution of EMP samples: Geographic Distribution

gregcaporaso commented 12 years ago

Cool, this looks better. If @gilbertjack agrees that this is what he's looking for I think we can close the issue.

As for the geographic distribution, this overlaps with issue #2 so @dansmith01 and @douginator2000 should connect about this.

gilbertjack commented 12 years ago

I am very happy with this.

gilbertjack commented 12 years ago

Hey Dan,

Can you remove the gradient bars between the major divisions - i.e. just keep the gradient bars for 10, 100, 1000, 10,000, etc.

Cheers

Jack

dansmith01 commented 12 years ago

Sure thing! I've updated the figure. You can also view it at this link: http://img.dnasmith.com/histograms_static1.png

I left the tick marks though to help key in the viewer that it's on a log scale. If you'd like them removed as well, just let me know.
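For illustration only, the split between the major divisions (which keep their gradient bars) and the in-between tick marks could be computed like this in Python. The function name and its exponent arguments are assumptions, not part of the script used for the figure.

```python
def log_axis_ticks(lowest_exp, highest_exp):
    """Return (major, minor) tick positions for a base-10 log axis.

    major: the decades 10**lowest_exp .. 10**highest_exp (grid lines kept).
    minor: 2x..9x of each decade in between (tick marks only, no grid line).
    Exponents are assumed to be non-negative integers.
    """
    major = [10 ** e for e in range(lowest_exp, highest_exp + 1)]
    minor = [m * 10 ** e
             for e in range(lowest_exp, highest_exp)
             for m in range(2, 10)]
    return major, minor
```

Calling `log_axis_ticks(1, 4)` gives major positions at 10, 100, 1000 and 10000, with the minor ticks left in as the visual cue that the axis is logarithmic.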

dansmith01 commented 12 years ago

I'm thinking it'd be cool to add a fourth color for human-associated samples. What do you think?

gilbertjack commented 12 years ago

sure sounds good, and yes i am fine with the marks

dansmith01 commented 12 years ago

Ok, updated. And some of them now have a log scale on the x-axis.

gregcaporaso commented 12 years ago

@dansmith01, this looks much better! Thanks!

Two last issues:

Once this is done I think we're ready to commit these files to the data repository and close this issue.

dansmith01 commented 12 years ago

Here's a PDF of the histograms: http://img.dnasmith.com/histograms.pdf I'll put together a legend shortly.

gregcaporaso commented 12 years ago

@dansmith01 - what was the source of the data in this plot? I'm realizing that we still don't have the full mapping file together, so I'm just wondering if this is comprehensive.

gregcaporaso commented 12 years ago

@dansmith01 - just wanted to check on the source of this data. We'll need the plot generated from the latest mapping file (see issue #24) which we're hoping will be ready tonight. Sorry, I hope that's not too much extra effort!

dansmith01 commented 12 years ago

@gregcaporaso - I'm using the metadata files downloaded from the EMP GESD.

gregcaporaso commented 12 years ago

OK, there is going to be a new "official" metadata file coming soon (issue #24). Will you be able to easily regenerate the plot with that one? It will be the same format as the ones you're downloading.

Greg

On Fri, Aug 10, 2012 at 1:06 PM, dansmith01 wrote:

@gregcaporaso - I'm using the metadata files downloaded from the EMP GESD.

dansmith01 commented 12 years ago

Yep - I think that'll be simple enough.

dansmith01 commented 12 years ago

I've got the legend for this figure ready to go: http://img.dnasmith.com/histograms-legend.pdf

Legend

gregcaporaso commented 12 years ago

Perfect, thanks!

dansmith01 commented 12 years ago

@gregcaporaso - The official metadata file has 6,541 samples, whereas the one I compiled from GESD has 14,176 samples. For example, the GESD dataset named "sample_template_2012-06-14 13_54_16.486850" is missing. Do you know why so many samples were excluded from the official compilation, and would you still like me to regenerate the above histograms using the reduced dataset?

jistombaugh commented 12 years ago

The official metadata file contains only the samples that were sequenced and then subsequently processed and loaded into the QIIME-DB. The EMP portal contains all samples including those samples which haven't been sequenced yet, which is why there is a large discrepancy in the numbers.
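A discrepancy like this can be checked directly by comparing sample IDs between the two files. The Python sketch below assumes both are tab-delimited with a header row and the sample ID in the first column; that layout and the function names are assumptions for illustration, not a description of the actual files.

```python
import csv


def sample_ids(lines):
    """Collect first-column sample IDs from tab-delimited lines.

    `lines` can be an open text file or any iterable of strings.
    The first row is assumed to be a header and is skipped.
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row and row[0]}


def compare_sample_sets(official, portal):
    """Summarize the overlap between two sets of sample IDs."""
    return {
        "shared": len(official & portal),
        "portal_only": len(portal - official),
        "official_only": len(official - portal),
    }
```

If the explanation above holds, `portal_only` should account for the samples that have not been sequenced yet, and `official_only` should be zero or close to it.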

gregcaporaso commented 12 years ago

For this analysis I think we want to go with what has been sequenced already as that's what we're including for the other analyses. @gilbertjack and @rob-knight, do you agree?

gilbertjack commented 12 years ago

Yes

rob-knight commented 12 years ago

yes, definitely

On Aug 13, 2012, at 6:05 PM, gilbertjack wrote:

Yes

gregcaporaso commented 12 years ago

@dansmith01, let me know if you need anything else to get this done.

dansmith01 commented 12 years ago

@gregcaporaso, could you take a quick look at the master mapping file (issue #24)? The last 104 lines don't seem to mesh with the columns in the lines above.
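A quick way to confirm this kind of problem is to count the tab-separated fields on every line and report those that differ from the header. The Python sketch below assumes a tab-delimited file whose first non-empty line is the header; the function names are hypothetical.

```python
import gzip


def mismatched_rows(lines):
    """Find rows whose tab-separated field count differs from the header's.

    Returns (expected_field_count, [(line_number, field_count), ...]).
    Blank lines are ignored; line numbers are 1-based.
    """
    expected = None
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        n_fields = len(line.split("\t"))
        if expected is None:
            expected = n_fields  # first non-empty line sets the column count
        elif n_fields != expected:
            bad.append((lineno, n_fields))
    return expected, bad


def check_gzipped_mapping_file(path):
    """Run the same check on a gzip-compressed mapping file."""
    with gzip.open(path, "rt") as handle:
        return mismatched_rows(handle)
```

An empty list in the second element of the result would mean every row lines up with the header.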

gregcaporaso commented 12 years ago

@dansmith01 are you sure your Dropbox is syncing? That was an issue with an older version, but I fixed that a couple of days ago. The version here:

https://github.com/EarthMicrobiomeProject/isme14/blob/master/master_mapping_file.txt.gz?raw=true

also has the fix.

dansmith01 commented 12 years ago

Not sure about dropbox, but that link works great! Thanks

jairideout commented 12 years ago

It might be an issue with Dropbox - Greg and I ran into the same problem a couple of days ago, where my shared Dropbox folder wasn't updating.

On Wed, Aug 15, 2012 at 12:46 PM, dansmith01 wrote:

Not sure about dropbox, but that link works great! Thanks

dansmith01 commented 12 years ago

Here are the PowerPoint Slides: https://www.dropbox.com/s/891ggfz00b1ddgh/Coverage%20of%20Environmental%20Parameters.pptx

alexdthomas commented 11 years ago

Hello all,

I made a mock-up demonstration along similar lines to this thread for my first meeting with Dr. Jansson; you can see it here: https://www.dropbox.com/s/wd68fpdlakw83c2/AThomas_EMP_DemoAnalysis.pptx

Jack Gilbert liked the maps and wanted to use them. However, I am unsure about the quality of the data I used. I threw this together by copying the Lat/Long coordinates out of the map here: http://www.microbio.me/emp/

I'd like to make myself useful, so let me know what you think. I have some questions and concerns about the metadata I added to https://github.com/EarthMicrobiomeProject/isme14/issues/17#issuecomment-17695847

Thanks, Alex

cuttlefishh commented 8 years ago

@rob-knight suggested redoing graphs of environmental parameters for EMP 20k analysis.