biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

IndexError: Issue importing data into songbird? #132

Closed bck243 closed 4 years ago

bck243 commented 4 years ago

Hello, I am trying to run songbird on four metatranscriptomic samples from a phytoplankton bloom to see the effect of nitrogen status on gene expression. I've tried to reformat my input data to match the .biom count table and metadata files from the Red Sea example dataset, but I am getting an error that leads me to believe the data are not being read in properly. I have attached my count data as a .biom file (converted using biom convert -i MB_exp1_counts.tsv -o MB_exp1_counts.biom --table-type "OTU table" --to-hdf5) and as a .tsv, along with a simple metadata .tsv outlining which samples are nitrogen replete and which are nitrogen deplete.

I am running:

songbird multinomial \
    --input-biom data/MB_exp1_counts.biom \
    --metadata-file data/MB_exp1_metadata.tsv \
    --formula "nitrogen_condition"

The error is:

Traceback (most recent call last):
  File "/usr/local/projdata/0568/projects/PLANKTON/illumina_aallen/bkolody/installations/Miniconda3/envs/songbird_env/bin/songbird", line 225, in <module>
    songbird()
  File "/usr/local/projdata/0568/projects/PLANKTON/illumina_aallen/bkolody/installations/Miniconda3/envs/songbird_env/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/projdata/0568/projects/PLANKTON/illumina_aallen/bkolody/installations/Miniconda3/envs/songbird_env/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/projdata/0568/projects/PLANKTON/illumina_aallen/bkolody/installations/Miniconda3/envs/songbird_env/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/projdata/0568/projects/PLANKTON/illumina_aallen/bkolody/installations/Miniconda3/envs/songbird_env/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/projdata/0568/projects/PLANKTON/illumina_aallen/bkolody/installations/Miniconda3/envs/songbird_env/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/projdata/0568/projects/PLANKTON/illumina_aallen/bkolody/installations/Miniconda3/envs/songbird_env/bin/songbird", line 180, in multinomial
    seed=random_seed,
  File "/usr/local/projdata/0568/projects/PLANKTON/illumina_aallen/bkolody/installations/Miniconda3/envs/songbird_env/lib/python3.7/site-packages/songbird/util.py", line 186, in split_training
    i = np.argsort(idx)[num_random_test_examples]
IndexError: index 5 is out of bounds for axis 0 with size 3

I have tried converting the Red Sea .biom table to .tsv and back, and the tutorial still works without issue, so I'm not sure what my problem is.

Thanks so much,
Bethany

Attachment: Archive.zip

mortonjt commented 4 years ago

Hi @bck243 , thanks for reporting.

I'm looking at your biom table, and it looks like there are only 4 samples. Is this correct?

In [1]: import biom
In [2]: from biom import load_table
In [3]: table = load_table('MB_exp1_counts.biom')
In [4]: table
Out[4]: 393094 x 4 <class 'biom.table.Table'> with 1452508 nonzero entries (92% dense)
In [5]: table.ids()
Out[5]: array(['      MB1', 'MB2', 'MB3', 'MB4'], dtype=object)

If this is the case, then songbird is expected to crash - 10 samples are required at minimum. We'll probably want to include a better error message in the future.
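For illustration, here is a minimal sketch of the kind of up-front check that could replace the IndexError with a clearer message. It assumes the 10-sample minimum stated above. The names `MIN_SAMPLES` and `check_min_samples` are hypothetical and are not part of songbird's API. The sample IDs are copied from the `table.ids()` output above, with the stray leading whitespace on the first ID stripped.

```python
# Hypothetical pre-flight check; not part of songbird.
# MIN_SAMPLES reflects the minimum stated in this thread (an assumption,
# not a value read from songbird's source code).
MIN_SAMPLES = 10


def check_min_samples(sample_ids, min_samples=MIN_SAMPLES):
    """Return the number of samples, or raise a descriptive error if too few."""
    n = len(sample_ids)
    if n < min_samples:
        raise ValueError(
            f"Found {n} samples, but at least {min_samples} are required."
        )
    return n


# Sample IDs as reported by table.ids() above; strip surrounding whitespace.
ids = [s.strip() for s in ['      MB1', 'MB2', 'MB3', 'MB4']]

try:
    check_min_samples(ids)
except ValueError as err:
    print(err)  # prints: Found 4 samples, but at least 10 are required.
```

With real data, the list passed in would come from `load_table(...).ids()` as in the session above, so the check can run before calling `songbird multinomial`.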

bck243 commented 4 years ago

Thanks for the quick reply! That makes sense, then. Cheers, Bethany