biocore / American-Gut

American Gut open-access data and IPython notebooks
Other

Updating shared_otus notebook... #136

Closed josenavas closed 9 years ago

josenavas commented 9 years ago

to point to the correct filepaths and to use the latest biom

Fixes #135

I've just updated the notebook. Is there anything else I need to update? Also, are there tests for the functions in the IPython notebook?

wasade commented 9 years ago

Thanks, @josenavas. The TYPES_OF_PLANTS plot is wrong; it looks like there is an issue with the metadata for a single sample somewhere. The quick fix for the notebook is to ignore sample sets in which only one sample is represented.

No tests; I've found it difficult to test inline methods in notebooks, though I'm not opposed. The output looks correct except for the TYPES_OF_PLANTS plot.

wasade commented 9 years ago

That field and the surrounding metadata don't look obviously wrong. The issue is with sample 000001078.1075961. I suspect this was a manual update? Weird...

josenavas commented 9 years ago

I just put a check in place to drop sample sets in which only one sample is represented. I'm not sure what is going on with the metadata, but I hope a space character in that mapping file is not the issue...
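A minimal sketch of that kind of check, assuming the metadata are in a pandas DataFrame indexed by sample ID and that a "sample set" is the group of samples sharing one value of a category. The helper name `drop_singleton_sets` and the toy values are hypothetical, not the notebook's actual code:

```python
import pandas as pd


def drop_singleton_sets(metadata, category, min_samples=2):
    """Return {value: [sample IDs]} for one metadata category, keeping
    only values represented by at least ``min_samples`` samples.

    Hypothetical helper: assumes ``metadata`` is a DataFrame indexed by
    sample ID with one column per metadata category.
    """
    # groupby(...).groups maps each category value to the Index of
    # sample IDs carrying that value (NaN values are dropped by default).
    sets = metadata.groupby(category).groups
    return {value: list(ids) for value, ids in sets.items()
            if len(ids) >= min_samples}


# Toy metadata; sample IDs and values are made up for illustration.
metadata = pd.DataFrame(
    {"TYPES_OF_PLANTS": ["A", "A", "B", "C"]},
    index=["s1", "s2", "s3", "s4"],
)
print(drop_singleton_sets(metadata, "TYPES_OF_PLANTS"))
# {'A': ['s1', 's2']}
```

Keeping `min_samples` as a parameter makes it easy to raise the threshold later if sets of two or three samples also turn out to produce misleading plots.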

wasade commented 9 years ago

It isn't. The metadata look sane; I think this was a manual edit, but I'm not sure how. I can't recall what is possible via the old admin interface.

(Replying by email to josenavas's comment of Fri, Mar 20, 2015 at 4:17 PM.)

josenavas commented 9 years ago

Also, as a note, I had to include another check to ignore empty metadata values (empty strings), as there were a few of those in all plots. So besides no_data and NA, it also ignores empty strings. I don't know whether that is a bug in another package or expected behavior...
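A rough sketch of such a filter, assuming each metadata value arrives as a string. The names `MISSING_VALUES`, `is_missing`, and `usable_values` are hypothetical, and the sentinel list only covers the three values mentioned in this thread:

```python
# Sentinel values treated as "no information" (assumption: the notebook's
# exact list may differ).
MISSING_VALUES = {"no_data", "NA", ""}


def is_missing(value):
    """True if a metadata value should be ignored when building plots."""
    # Strip surrounding whitespace so a stray " " is treated like "".
    return value.strip() in MISSING_VALUES


def usable_values(values):
    """Drop missing/sentinel entries, keeping the original order."""
    return [v for v in values if not is_missing(v)]


print(usable_values(["A", "", "no_data", "NA", " ", "B"]))
# ['A', 'B']
```

Stripping whitespace is a design choice here: it also catches values that are just a space, which is the kind of mapping-file problem suspected above.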

jwdebelius commented 9 years ago

I usually pass na_values=['NA', 'no_data', 'unknown', ''] to pd.read_csv.
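For reference, a minimal sketch of that approach on a made-up, tab-separated, QIIME-style mapping file (the column name and rows are illustrative only). `na_values` adds to pandas' default missing markers, which already include `""` and `"NA"`. `dtype=str` is an extra assumption so that IDs like 000001078.1075961 are not parsed as floats:

```python
import io

import pandas as pd

# A tiny tab-separated mapping file; columns and values are made up.
mapping = io.StringIO(
    "#SampleID\tTYPES_OF_PLANTS\n"
    "s1\tA\n"
    "s2\tno_data\n"
    "s3\tunknown\n"
    "s4\t\n"
    "s5\tNA\n"
)

# "no_data" and "unknown" become NaN alongside pandas' defaults.
# dtype=str keeps every column (and the sample-ID index) as text.
metadata = pd.read_csv(mapping, sep="\t", index_col="#SampleID", dtype=str,
                       na_values=["NA", "no_data", "unknown", ""])

print(metadata["TYPES_OF_PLANTS"].isna().sum())      # 4
print(metadata["TYPES_OF_PLANTS"].dropna().tolist())  # ['A']
```

After this, downstream code can rely on `dropna()` or `isna()` instead of comparing against each sentinel string by hand.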

wasade commented 9 years ago

This is using the old MetadataMap that didn't take off.

It isn't a bug, just that they need to be handled appropriately.

I think this is good to go unless there are any other issues?

josenavas commented 9 years ago

Not that I'm aware of!