tanaes opened this issue 6 years ago
Big +1 on this
On Mar 20, 2018 9:13 PM, "Jon Sanders" notifications@github.com wrote:
> It would be nice to be able to operate on BIOM tables and metadata files that are in memory, rather than just reading from a file.
I think you can create a calour `Experiment` from a data matrix and a sample metadata pandas DataFrame. Just do:

```python
xx = calour.Experiment(data=data_matrix, sample_metadata=sample_metadata_dataframe)
```

or better, if you are dealing with an amplicon experiment (this adds the default database and some taxonomy-related functions):

```python
xx = calour.AmpliconExperiment(data=data_matrix, sample_metadata=sample_metadata_dataframe)
```

(NOTE: the data matrix should contain samples as rows (either a numpy 2D array or a scipy sparse matrix), with samples in the same order as the DataFrame.) You can also add feature metadata (i.e. taxonomy etc.) with a second DataFrame, supplied as `feature_metadata=feature_metadata_dataframe`. Let me know if it works or not / if you meant something else. Guess we should also add an example for this in the docs.
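As a sketch of the shapes involved (the sample IDs, feature IDs, and values here are made up for illustration; the commented-out constructor call assumes calour is installed):

```python
import numpy as np
import pandas as pd

# Toy data: 3 samples x 4 features, samples as ROWS.
data_matrix = np.array([[10, 0, 5, 2],
                        [ 3, 8, 0, 1],
                        [ 0, 4, 7, 9]])

# Sample metadata: one row per sample, in the SAME order as the matrix rows.
sample_metadata_dataframe = pd.DataFrame(
    {'group': ['control', 'control', 'treatment']},
    index=['S1', 'S2', 'S3'])

# Optional feature metadata: one row per feature (e.g. taxonomy).
feature_metadata_dataframe = pd.DataFrame(
    {'taxonomy': ['k__Bacteria'] * 4},
    index=['F1', 'F2', 'F3', 'F4'])

# The alignment the constructor relies on:
assert data_matrix.shape[0] == len(sample_metadata_dataframe)
assert data_matrix.shape[1] == len(feature_metadata_dataframe)

# With calour available, the construction described above would be:
# import calour
# xx = calour.AmpliconExperiment(data=data_matrix,
#                                sample_metadata=sample_metadata_dataframe,
#                                feature_metadata=feature_metadata_dataframe)
```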
Thanks :)
Doesn't seem to work:
```python
>>> exp2 = ca.read_amplicon(data,
                            sample_metadata_file=samples,
                            feature_metadata_file=features,
                            normalize=10000,
                            min_reads=1000)
```

```
~/miniconda3/envs/qiime2-2017.12/lib/python3.5/genericpath.py in getsize(filename)
     48 def getsize(filename):
     49     """Return the size of a file, reported by os.stat()."""
---> 50     return os.stat(filename).st_size
     51
     52

TypeError: argument should be string, bytes or integer, not DataFrame
```
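The `TypeError` above happens because `read_amplicon()` treats its arguments as file paths, so the DataFrame ends up in `os.path.getsize()`, which calls `os.stat()` on it. A minimal reproduction of that failure, independent of calour:

```python
import os

import pandas as pd

df = pd.DataFrame({'a': [1, 2]})

# os.path.getsize expects a filesystem path; handing it a DataFrame
# raises TypeError, which is the failure seen in the traceback above.
try:
    os.path.getsize(df)
    raised = False
except TypeError:
    raised = True

print(raised)  # True
```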
If you want to create an experiment from existing (in-memory) data, you should not use `ca.read_amplicon()` but rather:

```python
xx = calour.AmpliconExperiment(data=data_matrix, sample_metadata=sample_metadata_dataframe)
```

For example (first load an existing experiment to have data/sample metadata, then create a new experiment from it):

```python
dat = ca.read_amplicon('./all.biom', './map.txt', normalize=10000, min_reads=1000)
xx = ca.AmpliconExperiment(data=dat.data, sample_metadata=dat.sample_metadata)
print(xx)
# AmpliconExperiment with 171 samples, 2013 features
```

Is this what you need?
ahh lemme try
Experiencing an error when I do it this way and then try to filter out samples by their sum of features:
```python
>>> exp = ca.AmpliconExperiment(data=data.as_matrix(), sample_metadata=samples, feature_metadata=features)
>>> exp.filter_by_data('sum_abundance', cutoff=1000, axis='s')
```
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in _getbool_axis(self, key, axis)
   1390             try:
-> 1391                 return self.obj._take(inds, axis=axis, convert=False)
   1392             except Exception as detail:

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/generic.py in _take(self, indices, axis, convert, is_copy)
   2149                                    axis=self._get_block_manager_axis(axis),
-> 2150                                    verify=True)
   2151         result = self._constructor(new_data).__finalize__(self)

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/internals.py in take(self, indexer, axis, verify, convert)
   4254         if convert:
-> 4255             indexer = maybe_convert_indices(indexer, n)
   4256

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in maybe_convert_indices(indices, n)
   2102     if mask.any():
-> 2103         raise IndexError("indices are out-of-bounds")
   2104     return indices

IndexError: indices are out-of-bounds

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-133-e2fd6d9873ee> in <module>()
      1 exp = ca.AmpliconExperiment(data=data.as_matrix(), sample_metadata=samples, feature_metadata=features)
      2
----> 3 exp.filter_by_data('sum_abundance', cutoff=1000, axis='s')

~/git_sw/calour/calour/util.py in inner(*args, **kwargs)
    132                     raise ValueError('unknown axis `%r`' % v)
    133
--> 134         return func(*ba.args, **ba.kwargs)
    135     return inner
    136

~/git_sw/calour/calour/experiment.py in inner(*args, **kwargs)
    226         try:
    227             logger.debug('Run func {}'.format(fn))
--> 228             new_exp = func(*args, **kwargs)
    229             if exp._log is True:
    230                 param = ['%r' % i for i in args[1:]] + ['%s=%r' % (k, v) for k, v in kwargs.items()]

~/git_sw/calour/calour/filtering.py in filter_by_data(exp, predicate, axis, negate, inplace, **kwargs)
    252
    253     logger.info('After filtering, %s remaining' % np.sum(select))
--> 254     return exp.reorder(select, axis=axis, inplace=inplace)
    255
    256

~/git_sw/calour/calour/util.py in inner(*args, **kwargs)
    132                     raise ValueError('unknown axis `%r`' % v)
    133
--> 134         return func(*ba.args, **ba.kwargs)
    135     return inner
    136

~/git_sw/calour/calour/experiment.py in reorder(self, new_order, axis, inplace)
    326     if axis == 0:
    327         exp.data = exp.data[new_order, :]
--> 328         exp.sample_metadata = exp.sample_metadata.iloc[new_order, :]
    329     else:
    330         exp.data = exp.data[:, new_order]

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1365             except (KeyError, IndexError):
   1366                 pass
-> 1367             return self._getitem_tuple(key)
   1368         else:
   1369             # we by definition only have the 0th axis

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1751                 continue
   1752
-> 1753             retval = getattr(retval, self.name)._getitem_axis(key, axis=axis)
   1754
   1755             # if the dim was reduced, then pass a lower-dim the next time

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1813         if is_bool_indexer(key):
   1814             self._has_valid_type(key, axis)
-> 1815             return self._getbool_axis(key, axis=axis)
   1816
   1817         # a list of integers

~/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/pandas/core/indexing.py in _getbool_axis(self, key, axis)
   1391                 return self.obj._take(inds, axis=axis, convert=False)
   1392             except Exception as detail:
-> 1393                 raise self._exception(detail)
   1394
   1395     def _get_slice_axis(self, slice_obj, axis=None):

IndexError: indices are out-of-bounds
```
Ring any bells?
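One thing the traceback suggests (an assumption on my part, not confirmed against your data): `filter_by_data` computes a boolean `select` mask and applies it to `sample_metadata` via `.iloc`, so if the mask length does not match the number of metadata rows (e.g. because the data matrix is oriented with features as rows rather than samples), pandas raises exactly this kind of `IndexError`. A minimal pandas-only reproduction:

```python
import numpy as np
import pandas as pd

sample_metadata = pd.DataFrame({'group': ['a', 'b', 'c']})  # 3 samples

# A boolean mask of the WRONG length, as if it had been computed over
# 5 features instead of 3 samples (i.e. a transposed data matrix):
select = np.array([True, False, True, True, False])

try:
    sample_metadata.iloc[select, :]
    raised = False
except IndexError:
    raised = True

print(raised)  # True
```

If that is the cause, checking that `data.as_matrix()` has shape `(n_samples, n_features)` before constructing the experiment would be worth a try.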
Arghh, sorry about these pesky bugs. Can you send me the (pickled) `data`, `samples` and `features` variables so I can recreate it and hopefully solve it?