biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis

ValueError: negative dimensions are not allowed #137

Status: Closed (mortonjt closed this issue 1 year ago)

mortonjt commented 3 years ago

Below is an example of how this error can arise:

songbird multinomial --formula "tshirt+host_subject_id" --input-biom table.biom --metadata-file metadata.txt --summary-dir /results

Traceback (most recent call last):
  File "/home/melin2012/miniconda3/envs/regression/bin/songbird", line 111, in <module>
    songbird()
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/melin2012/miniconda3/envs/regression/bin/songbird", line 76, in multinomial
    min_sample_count, min_feature_count)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/songbird/util.py", line 174, in match_and_filter
    design = dmatrix(formula, metadata, return_type='dataframe')
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/patsy/highlevel.py", line 291, in dmatrix
    NA_action, return_type)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/patsy/highlevel.py", line 165, in _do_highlevel_design
    NA_action)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/patsy/highlevel.py", line 70, in _try_incr_builders
    NA_action)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/patsy/build.py", line 721, in design_matrix_builders
    cat_levels_contrasts)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/patsy/build.py", line 628, in _make_subterm_infos
    default=Treatment)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/patsy/contrasts.py", line 602, in code_contrast_matrix
    return contrast.code_without_intercept(levels)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/patsy/contrasts.py", line 183, in code_without_intercept
    eye = np.eye(len(levels) - 1)
  File "/home/melin2012/miniconda3/envs/regression/lib/python3.6/site-packages/numpy/lib/twodim_base.py", line 186, in eye
    m = zeros((N, M), dtype=dtype, order=order)
ValueError: negative dimensions are not allowed

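The problem with this run was that every sample was accidentally filtered out, because the sample names in the biom table and the sample metadata didn't match. With no samples left, the categorical variables in the formula have no levels, so patsy ends up calling np.eye(len(levels) - 1) with a negative size, which produces the ValueError above.

We should raise a clearer error message when this happens.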
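
As a rough illustration of what such a check could look like, here is a minimal sketch that compares the two sets of sample IDs before any design matrix is built. The helper name `check_sample_overlap` and its arguments are hypothetical; this is not songbird's actual code, only one way the failure could be reported earlier.

```python
def check_sample_overlap(table_sample_ids, metadata_sample_ids):
    """Return the sorted sample IDs shared by the table and the metadata.

    Hypothetical helper (not part of songbird): raises a descriptive
    error instead of letting an empty design matrix fail downstream.
    """
    shared = set(table_sample_ids) & set(metadata_sample_ids)
    if not shared:
        raise ValueError(
            "No sample IDs are shared between the BIOM table and the "
            "sample metadata; check that the sample names match."
        )
    return sorted(shared)
```

Calling this with the sample IDs from the biom table and the metadata index would fail fast with a readable message when the two do not overlap at all.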

johannesbjork commented 3 years ago

Are they being filtered out by songbird?

mortonjt commented 3 years ago

Yes, there are a few filters in songbird, in particular:

  1. --min-feature-count : filters microbes according to the number of samples in which they are observed. By default, a microbe that appears in fewer than 10 samples is filtered out (since regression estimates will be poor for those microbes).
  2. --min-sample-count : filters samples according to the number of reads that they have. By default, a sample with fewer than 1000 reads is removed.
  3. Samples whose names don't agree between the sample metadata and the biom table are filtered out.

See the FAQ for more details: https://github.com/biocore/songbird#73-faqs-parameters-
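
The three filters above can be sketched in plain Python as follows. This is only an illustration of the logic as described in this comment, not songbird's implementation: the function name `filter_table`, the nested-dict table layout, and the exact boundary handling (`>=` versus `>`) are all assumptions.

```python
def filter_table(counts, metadata_ids, min_sample_count=1000,
                 min_feature_count=10):
    """Apply the three filters to a table of read counts.

    `counts` maps sample ID -> {feature ID -> read count}.
    Hypothetical sketch; thresholds mirror the defaults quoted above.
    """
    metadata_ids = set(metadata_ids)

    # Filters 2 and 3: keep samples that appear in the metadata and
    # have at least `min_sample_count` reads in total.
    kept = {
        sample: features
        for sample, features in counts.items()
        if sample in metadata_ids
        and sum(features.values()) >= min_sample_count
    }

    # Count in how many of the kept samples each feature is observed.
    prevalence = {}
    for features in kept.values():
        for feature, n in features.items():
            if n > 0:
                prevalence[feature] = prevalence.get(feature, 0) + 1

    # Filter 1: keep features observed in enough samples.
    return {
        sample: {
            feature: n
            for feature, n in features.items()
            if prevalence.get(feature, 0) >= min_feature_count
        }
        for sample, features in kept.items()
    }
```

If the result is an empty dict, every sample was removed, which is the situation that leads to the error reported in this issue.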

johannesbjork commented 3 years ago

--min-sample-count was indeed the problem. This raises another question: does songbird work on the relative abundance table produced by MetaPhlAn? Or would you need to convert it to read counts by multiplying each taxon's relative abundance by the total number of reads per sample? Thanks!

mortonjt commented 3 years ago

Getting the raw read counts would be ideal, but scaling the relative abundances to, say, 10k per sample should also be OK.
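
A minimal sketch of that scaling for a single sample is shown below. The function name `scale_to_counts` and the dict layout are hypothetical; the sketch normalizes by the sample's own total, so it should work whether the relative abundances sum to 1 or to 100.

```python
def scale_to_counts(rel_abund, depth=10_000):
    """Convert one sample's relative abundances to pseudo-counts.

    `rel_abund` maps taxon -> relative abundance for a single sample.
    Hypothetical helper; `depth` is the target total per sample.
    """
    total = sum(rel_abund.values())
    if total <= 0:
        raise ValueError("relative abundances must sum to a positive value")
    # Rounding means the counts may not sum to exactly `depth`.
    return {
        taxon: round(value / total * depth)
        for taxon, value in rel_abund.items()
    }
```

With a depth of 10k, each sample ends up well above the default --min-sample-count threshold of 1000 mentioned earlier in the thread.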
