biocore / calour

exploratory and interactive microbiome analyses based on heatmaps
BSD 3-Clause "New" or "Revised" License

allow io.read() to read bioms and metadata files in memory #89

Open tanaes opened 6 years ago

tanaes commented 6 years ago

It would be nice to be able to operate on bioms and metadata files that are in memory, rather than just reading from a file
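To make the request concrete, here is a minimal sketch of a reader that accepts either a file path or an in-memory text stream. It uses only the Python standard library; `read_metadata` is a hypothetical helper written for illustration, not part of calour's API.

```python
import csv
import io


def read_metadata(source):
    """Hypothetical helper: parse a tab-separated metadata table.

    `source` may be a path (str) or any object with a .read() method,
    such as io.StringIO holding a table that is already in memory.
    """
    if hasattr(source, "read"):
        # Already an open text stream: parse it directly.
        return list(csv.DictReader(source, delimiter="\t"))
    # Otherwise treat it as a path on disk.
    with open(source, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# A made-up two-sample mapping table held entirely in memory.
in_memory = io.StringIO("#SampleID\tgroup\nS1\tcontrol\nS2\tcase\n")
rows = read_metadata(in_memory)
print(rows[0]["#SampleID"], rows[0]["group"])
```

Checking for a `.read()` method rather than for a specific class lets the same function accept open files, `io.StringIO` objects, and similar stream types.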

mortonjt commented 6 years ago

Big +1 on this


amnona commented 6 years ago

I think you can create a calour Experiment from a data matrix and a sample metadata pandas dataframe. Just do:

xx = calour.Experiment(data=data_matrix, sample_metadata=sample_metadata_dataframe)

or, better, if you are dealing with an amplicon experiment (this adds the default database and some taxonomy-related functions):

xx = calour.AmpliconExperiment(data=data_matrix, sample_metadata=sample_metadata_dataframe)

(NOTE: the data matrix should contain samples as rows (either a numpy 2d array or a scipy sparse matrix), with samples in the same order as the dataframe.) You can also add feature metadata (i.e. taxonomy etc.) with a second dataframe and supply it as:

feature_metadata=feature_metadata_dataframe

Let me know if it works or not / if you meant something else. Guess we should also add an example for this in the docs.

Thanks :)
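A small sketch of preparing in-memory inputs that satisfy the note above (samples as rows, metadata in the same order as the matrix). All table contents and variable names here are made-up toy values. The calour call is left as a comment and simply repeats the constructor call given in this thread, so the block runs without calour installed.

```python
import pandas as pd

# Toy count table with samples as rows and features as columns.
counts = pd.DataFrame(
    [[10, 0, 5], [3, 7, 0]],
    index=["S1", "S2"],
    columns=["featA", "featB", "featC"],
)

# Metadata tables; the sample metadata is deliberately in a different order.
sample_md = pd.DataFrame({"group": ["case", "control"]}, index=["S2", "S1"])
feature_md = pd.DataFrame(
    {"taxonomy": ["k__A", "k__B", "k__C"]},
    index=["featA", "featB", "featC"],
)

# Reorder both metadata tables to match the matrix rows and columns.
sample_md = sample_md.loc[counts.index]
feature_md = feature_md.loc[counts.columns]

# Plain 2d numpy array, samples as rows.
data = counts.values

print(data.shape, list(sample_md.index))

# With calour installed, the call from this thread would then be:
# import calour as ca
# exp = ca.AmpliconExperiment(data=data, sample_metadata=sample_md,
#                             feature_metadata=feature_md)
```

Reindexing with `.loc` raises a `KeyError` if a sample or feature is missing from the metadata, which surfaces mismatches before the experiment is built.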

tanaes commented 6 years ago

Doesn't seem to work:

>>> exp2 =  ca.read_amplicon(data,
                         sample_metadata_file=samples,
                         feature_metadata_file=features,
                         normalize=10000,
                         min_reads=1000)

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/genericpath.py in getsize(filename)
     48 def getsize(filename):
     49     """Return the size of a file, reported by os.stat()."""
---> 50     return os.stat(filename).st_size
     51 
     52 

TypeError: argument should be string, bytes or integer, not DataFrame

amnona commented 6 years ago

If you want to create an experiment from existing (in-memory) data, you should not use ca.read_amplicon() but rather:

xx = calour.AmpliconExperiment(data=data_matrix, sample_metadata=sample_metadata_dataframe)

For example, first load an existing experiment to have data and sample metadata, then create a new experiment from them:

>>> dat = ca.read_amplicon('./all.biom', './map.txt', normalize=10000, min_reads=1000)
>>> xx = ca.AmpliconExperiment(data=dat.data, sample_metadata=dat.sample_metadata)
>>> print(xx)
AmpliconExperiment with 171 samples, 2013 features

Is this what you need?

tanaes commented 6 years ago

ahh lemme try


tanaes commented 6 years ago

Experiencing an error when I do it this way and then try to filter samples out by sum of features:

>>> exp = ca.AmpliconExperiment(data=data.as_matrix(), sample_metadata=samples, feature_metadata=features)

>>> exp.filter_by_data('sum_abundance', cutoff=1000, axis='s')

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in _getbool_axis(self, key, axis)
   1390         try:
-> 1391             return self.obj._take(inds, axis=axis, convert=False)
   1392         except Exception as detail:

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/generic.py in _take(self, indices, axis, convert, is_copy)
   2149                                    axis=self._get_block_manager_axis(axis),
-> 2150                                    verify=True)
   2151         result = self._constructor(new_data).__finalize__(self)

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/internals.py in take(self, indexer, axis, verify, convert)
   4254         if convert:
-> 4255             indexer = maybe_convert_indices(indexer, n)
   4256 

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in maybe_convert_indices(indices, n)
   2102     if mask.any():
-> 2103         raise IndexError("indices are out-of-bounds")
   2104     return indices

IndexError: indices are out-of-bounds

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-133-e2fd6d9873ee> in <module>()
      1 exp = ca.AmpliconExperiment(data=data.as_matrix(), sample_metadata=samples, feature_metadata=features)
      2 
----> 3 exp.filter_by_data('sum_abundance', cutoff=1000, axis='s')

~/git_sw/calour/calour/util.py in inner(*args, **kwargs)
    132             raise ValueError('unknown axis `%r`' % v)
    133 
--> 134         return func(*ba.args, **ba.kwargs)
    135     return inner
    136 

~/git_sw/calour/calour/experiment.py in inner(*args, **kwargs)
    226             try:
    227                 logger.debug('Run func {}'.format(fn))
--> 228                 new_exp = func(*args, **kwargs)
    229                 if exp._log is True:
    230                     param = ['%r' % i for i in args[1:]] + ['%s=%r' % (k, v) for k, v in kwargs.items()]

~/git_sw/calour/calour/filtering.py in filter_by_data(exp, predicate, axis, negate, inplace, **kwargs)
    252 
    253     logger.info('After filtering, %s remaining' % np.sum(select))
--> 254     return exp.reorder(select, axis=axis, inplace=inplace)
    255 
    256 

~/git_sw/calour/calour/util.py in inner(*args, **kwargs)
    132             raise ValueError('unknown axis `%r`' % v)
    133 
--> 134         return func(*ba.args, **ba.kwargs)
    135     return inner
    136 

~/git_sw/calour/calour/experiment.py in reorder(self, new_order, axis, inplace)
    326         if axis == 0:
    327             exp.data = exp.data[new_order, :]
--> 328             exp.sample_metadata = exp.sample_metadata.iloc[new_order, :]
    329         else:
    330             exp.data = exp.data[:, new_order]

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1365             except (KeyError, IndexError):
   1366                 pass
-> 1367             return self._getitem_tuple(key)
   1368         else:
   1369             # we by definition only have the 0th axis

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1751                 continue
   1752 
-> 1753             retval = getattr(retval, self.name)._getitem_axis(key, axis=axis)
   1754 
   1755             # if the dim was reduced, then pass a lower-dim the next time

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1813         if is_bool_indexer(key):
   1814             self._has_valid_type(key, axis)
-> 1815             return self._getbool_axis(key, axis=axis)
   1816 
   1817         # a list of integers

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in _getbool_axis(self, key, axis)
   1391             return self.obj._take(inds, axis=axis, convert=False)
   1392         except Exception as detail:
-> 1393             raise self._exception(detail)
   1394 
   1395     def _get_slice_axis(self, slice_obj, axis=None):

IndexError: indices are out-of-bounds

Ring any bells?
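The thread does not establish the cause of this error. One thing worth ruling out, given that the traceback fails while indexing `sample_metadata` with a selection computed from `data`, is a shape mismatch between the matrix and the metadata (for example, a matrix with features as rows). The sketch below is a hypothetical diagnostic, not calour code; `check_alignment` and the toy inputs are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def check_alignment(data, sample_metadata, feature_metadata=None):
    """Return a list of shape mismatches; an empty list means shapes agree.

    Assumes the convention stated earlier in the thread: samples are rows
    of `data`, features are columns.
    """
    problems = []
    n_samples, n_features = data.shape
    if n_samples != len(sample_metadata):
        problems.append(
            "data has %d rows but sample_metadata has %d rows"
            % (n_samples, len(sample_metadata))
        )
    if feature_metadata is not None and n_features != len(feature_metadata):
        problems.append(
            "data has %d columns but feature_metadata has %d rows"
            % (n_features, len(feature_metadata))
        )
    return problems


# Toy inputs: the matrix is (features x samples), i.e. transposed.
data = np.zeros((3, 2))
samples = pd.DataFrame({"group": ["a", "b"]}, index=["S1", "S2"])
features = pd.DataFrame({"taxonomy": ["x", "y", "z"]}, index=["f1", "f2", "f3"])

problems = check_alignment(data, samples, features)
fixed = check_alignment(data.T, samples, features)
print(problems)
print(fixed)
```

If the check reports mismatches that disappear after transposing, passing `data.T` to the constructor would be the thing to try; if the shapes already agree, the cause lies elsewhere.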

amnona commented 6 years ago

Arghh. Sorry about these pesky bugs. Can you send me the (pickled) data, samples and features variables so I can recreate and hopefully solve it?