biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

songbird multinomial error: negative dimensions are not allowed #169

Closed sherlyn99 closed 8 months ago

sherlyn99 commented 1 year ago

Context:

I have previously run songbird successfully on the raw count data of a study. My collaborator wants to see what the results look like if we use RPKM instead of raw counts. The abstract of the publication states that songbird works with both raw counts and normalized counts, so I decided to give it a try. However, when I run songbird on the RPKM-normalized count table, I keep hitting the following error:

Command used:

qiime songbird multinomial \
  --i-table taxonomy.qza \
  --m-metadata-file md.txt \
  --p-formula "Label+Subject" \
  --p-epochs 1000 \
  --p-differential-prior 0.5 \
  --p-training-column Testing \
  --p-summary-interval 1 \
  --o-differentials differentials.qza \
  --o-regression-stats regression-stats.qza \
  --o-regression-biplot regression-biplot.qza

Error:

Plugin error from songbird:

negative dimensions are not allowed

Error Log:

Traceback (most recent call last):
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/q2cli/commands.py", line 329, in call
    results = action(arguments)
  File "", line 2, in multinomial
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in _callableexecutor
    output_views = self._callable(view_args)
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/songbird/q2/_method.py", line 43, in multinomial
    formula, min_sample_count, min_feature_count
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/songbird/util.py", line 178, in match_and_filter
    design = dmatrix(formula, metadata, return_type='dataframe')
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/patsy/highlevel.py", line 291, in dmatrix
    NA_action, return_type)
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/patsy/highlevel.py", line 165, in _do_highlevel_design
    NA_action)
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/patsy/highlevel.py", line 70, in _try_incr_builders
    NA_action)
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/patsy/build.py", line 721, in design_matrix_builders
    cat_levels_contrasts)
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/patsy/build.py", line 628, in _make_subterm_infos
    default=Treatment)
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/patsy/contrasts.py", line 602, in code_contrast_matrix
    return contrast.code_without_intercept(levels)
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/patsy/contrasts.py", line 183, in code_without_intercept
    eye = np.eye(len(levels) - 1)
  File "/home/y1weng/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/numpy/lib/twodim_base.py", line 201, in eye
    m = zeros((N, M), dtype=dtype, order=order)
ValueError: negative dimensions are not allowed

So far I have tried the following:

1) Scaling all numerical values by 1000 and forcing them into integers. I still get the same error.
2) Generating a null model by changing the formula to "1". This did not work: songbird aborted without any error message.
3) Searching online. It looks like no one has posted about this error in the specific context of songbird.

Please see the attached inputs if needed. GitHub does not support qza, so I am uploading the corresponding tsv files. The qza was obtained by converting the tsv to biom and then the biom to qza using qiime tools import. Attachments: md.txt, ft.txt

mortonjt commented 1 year ago

See https://github.com/biocore/songbird/issues/137

This error happens when all of the samples are being filtered out for some reason, either because sample ids don't match, or because the read count is too small (you can set --p-min-sample-count 1 to ignore this).
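One way to rule out the first cause before running songbird is to compare the sample IDs in the feature table header with those in the metadata. The following is a minimal sketch using only the Python standard library. It assumes both files are tab-separated, that the feature table has samples as columns with the feature ID in the first column, and that the first metadata column holds the sample IDs. The function names and the inline data are hypothetical and are not part of songbird or QIIME 2.

```python
import csv
import io


def table_sample_ids(handle):
    """Return sample IDs from the header row of a samples-as-columns TSV."""
    # QUOTE_NONE keeps any quote characters so that they show up in the report.
    header = next(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
    return header[1:]  # skip the feature-ID column


def metadata_sample_ids(handle):
    """Return the first-column values of a TSV metadata file, minus the header."""
    rows = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(rows)  # skip the header row
    return [row[0] for row in rows if row]


def compare_ids(table_ids, md_ids):
    """Report which IDs are shared and which appear on one side only."""
    table_set, md_set = set(table_ids), set(md_ids)
    return {
        "shared": sorted(table_set & md_set),
        "only_in_table": sorted(table_set - md_set),
        "only_in_metadata": sorted(md_set - table_set),
    }


# Hypothetical inline data; in practice, open the real files instead.
table_text = "feature\tS1\tS2\tS3\nf1\t0.01\t0.02\t0.03\n"
md_text = "sample-id\tLabel\nS1\tA\ns2\tB\n"

report = compare_ids(
    table_sample_ids(io.StringIO(table_text)),
    metadata_sample_ids(io.StringIO(md_text)),
)
print(report)
```

An empty "shared" list would mean that no sample survives the ID matching step, which is consistent with the situation described above. Comparing the raw strings exactly, without trimming or unquoting, is deliberate: it surfaces differences in case, whitespace, or stray characters.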

sherlyn99 commented 1 year ago

I apologize for missing the previous post on this and thanks so much for the prompt response!

I tried --p-min-sample-count 1. Songbird aborted without error messages. The RPKM-normalized counts I have are mostly around ~10e-2, so that could be the reason. I did try artificially scaling up my counts by 1000, though, and the error persists.

How do I find out if all of my samples have been filtered out?

mortonjt commented 1 year ago

Yeah, the RPKM ~10e-2 is probably the issue. Songbird was designed for sequencing read counts, so small values are a failure mode. But the fix is easy (rescale by a large number). The error hints that everything has been filtered out (we should probably have a more informative error).
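The rescaling suggested here can be sketched as follows, together with a check of which samples would still clear a minimum total count afterwards. This is an illustration only: the nested-dict table layout, the function names, the scale factors, and the threshold of 1000 are all assumptions, and the comparison is not a reproduction of songbird's own filtering code. The thresholds that songbird actually applies are listed in the output of `qiime songbird multinomial --help`.

```python
def rescale_to_counts(table, factor):
    """Multiply every value by `factor` and round to the nearest integer."""
    return {
        sample: {feat: int(round(value * factor)) for feat, value in feats.items()}
        for sample, feats in table.items()
    }


def samples_passing(table, min_sample_count):
    """Return samples whose total count is at least `min_sample_count`."""
    return sorted(
        sample
        for sample, feats in table.items()
        if sum(feats.values()) >= min_sample_count
    )


# Hypothetical RPKM-like values on the order of 1e-2.
rpkm = {
    "S1": {"f1": 0.012, "f2": 0.034},
    "S2": {"f1": 0.051, "f2": 0.002},
}

for factor in (1_000, 1_000_000):
    scaled = rescale_to_counts(rpkm, factor)
    # The threshold of 1000 is an assumed value chosen for illustration.
    print(factor, samples_passing(scaled, min_sample_count=1000))
```

With values this small, a factor of 1000 yields per-sample totals in the tens, so a larger factor may be needed for samples to pass a count-based filter.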

sherlyn99 commented 1 year ago

Thanks for getting back to me! I suspect so too. I tried rescaling and songbird multinomial aborted without error messages.

The sample_ids in the metadata were created by copying over the column names of ft.tsv, so they should be identical.

Any ideas about what else could cause this extreme filtering?

sherlyn99 commented 8 months ago

The issue was quotation marks around the column names, in case anyone else runs into something similar. Thanks so much for the help!
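For anyone hitting the same problem, a minimal sketch of stripping quotation marks from a tab-separated header line before importing the table is given below. The thread does not say which file carried the quotes or which quote character was involved, so the example header and both helper functions are hypothetical.

```python
def strip_quotes(identifier):
    """Remove surrounding whitespace and one pair of matching quotes."""
    cleaned = identifier.strip()
    if len(cleaned) >= 2 and cleaned[0] == cleaned[-1] and cleaned[0] in "\"'":
        cleaned = cleaned[1:-1]
    return cleaned


def clean_header(line):
    """Strip quotes from every tab-separated field of a header line."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(strip_quotes(field) for field in fields)


# Hypothetical header with quoted sample IDs.
raw_header = "#OTU ID\t\"S1\"\t\"S2\"\t'S3'"
print(clean_header(raw_header))
```

Only the header line needs this treatment if the quotes are limited to column names. Quoted IDs such as "S1" do not match unquoted S1 in the other file, which would explain every sample being dropped.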