biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

patsy.PatsyError: Error evaluating factor: NameError: name 'X' is not defined #138

Closed: johannesbjork closed this issue 3 years ago

johannesbjork commented 3 years ago

I'm getting the error "patsy.PatsyError: Error evaluating factor: NameError: name 'X' is not defined" for nominal categorical variables. The error seems to be reproducible with the redsea dataset after adding a nominal categorical variable called faky in R as follows:

readsea <- read.table("/songbird-master/data/redsea/redsea_metadata.txt", sep="\t", header=T)
readsea$faky <- sample(x=LETTERS[1:3], size=nrow(readsea), replace=T)
write.table(readsea, "/songbird-master/data/redsea/redsea_metadata2.txt", quote=F, row.names=F, sep="\t")
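For readers who prefer Python to R, the same modified metadata file can be produced with a short script. This is a minimal sketch, not part of songbird: the function names add_nominal_column and add_nominal_column_to_file are hypothetical, and the input is assumed to be a plain tab-separated file with a single header line. The header line is copied verbatim, so whatever precedes the sample ID column name in the original file (for example a leading #) is kept as-is.

```python
import random


def add_nominal_column(lines, name="faky", levels=("A", "B", "C"), seed=None):
    """Return TSV lines with one extra column of random category labels.

    The header line is kept verbatim, so a leading '#' on the sample ID
    column (for example '#SampleID') is not lost.
    """
    rng = random.Random(seed)
    header, *rows = [line.rstrip("\r\n") for line in lines]
    out = [f"{header}\t{name}"]
    for row in rows:
        if row:  # skip blank lines
            # Draw with replacement, like sample(..., replace=T) in R.
            out.append(f"{row}\t{rng.choice(levels)}")
    return out


def add_nominal_column_to_file(src_path, dst_path, **kwargs):
    """Read src_path, add the random column, and write the result to dst_path."""
    with open(src_path) as src:
        lines = src.readlines()
    with open(dst_path, "w") as dst:
        dst.write("\n".join(add_nominal_column(lines, **kwargs)) + "\n")
```

Usage would look like add_nominal_column_to_file("redsea_metadata.txt", "redsea_metadata2.txt", seed=0); passing a seed makes the random labels reproducible between runs.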

I then manually added a # in front of the sampleid column header in the new file.

Running songbird on the command line:

(songbird_env) user@user redsea % songbird multinomial \
        --input-biom redsea.biom \
        --metadata-file redsea_metadata2.txt \
        --formula "Temperature + C(faky, Diff, levels=["A","B","C"])" \
        --epochs 1 \
        --differential-prior 0.5 \
        --training-column Testing \
        --summary-interval 1 \
        --summary-dir results
Traceback (most recent call last):
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/compat.py", line 36, in call_and_wrap_exc
    return f(*args, **kwargs)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/eval.py", line 166, in eval
    + self._namespaces))
  File "<string>", line 1, in <module>
NameError: name 'A' is not defined

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/user/python/miniconda3/envs/songbird_env/bin/songbird", line 113, in <module>
    songbird()
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/user/python/miniconda3/envs/songbird_env/bin/songbird", line 76, in multinomial
    min_sample_count, min_feature_count)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/songbird/util.py", line 173, in match_and_filter
    design = dmatrix(formula, metadata, return_type='dataframe')
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/highlevel.py", line 291, in dmatrix
    NA_action, return_type)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/highlevel.py", line 165, in _do_highlevel_design
    NA_action)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/highlevel.py", line 70, in _try_incr_builders
    NA_action)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/build.py", line 696, in design_matrix_builders
    NA_action)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/build.py", line 443, in _examine_factor_types
    value = factor.eval(factor_states[factor], data)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/eval.py", line 566, in eval
    data)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/eval.py", line 551, in _eval
    inner_namespace=inner_namespace)
  File "/Users/user/python/miniconda3/envs/songbird_env/lib/python3.6/site-packages/patsy/compat.py", line 43, in call_and_wrap_exc
    exec("raise new_exc from e")
  File "<string>", line 1, in <module>
patsy.PatsyError: Error evaluating factor: NameError: name 'A' is not defined
    Temperature + C(faky, Diff, levels=[A,B,C])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Am I doing something obviously wrong here?

mortonjt commented 3 years ago

There were a couple of errors here. One is the error you reported in this issue; the other is the negative-dimensions error that inspired this issue: https://github.com/biocore/songbird/issues/137

Regarding issue https://github.com/biocore/songbird/issues/137, have you made sure that your table dimensions are correct? When converting between biom and R, it is easy to transpose the table by accident. If that happens, the table's sample IDs are actually feature names, which won't match the sample names in your metadata, so all of the samples will be filtered out.
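A quick way to check for the transposition problem is to compare both axes of the table against the metadata sample IDs. The helper below, check_orientation, is hypothetical and not part of songbird; it only takes plain lists of IDs. If the table is loaded with the biom-format Python package, those lists could come from table.ids(axis='sample') and table.ids(axis='observation').

```python
def check_orientation(table_sample_ids, table_feature_ids, metadata_ids):
    """Guess whether a feature table's axes line up with the sample metadata.

    Returns "ok" if the table's sample IDs overlap the metadata IDs,
    "transposed" if only the table's feature IDs do (the table was likely
    flipped when moving between R and biom), and "no match" otherwise.
    """
    metadata_ids = set(metadata_ids)
    if metadata_ids & set(table_sample_ids):
        return "ok"
    if metadata_ids & set(table_feature_ids):
        return "transposed"
    return "no match"
```

A "transposed" result suggests flipping the table (biom's Table.transpose() does this) before running songbird; "no match" points instead at ID formatting differences between the table and the metadata.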

Regarding this error, it looks like a patsy parsing issue. You may need to add quotes around your levels (the last line of the traceback shows that patsy received levels=[A,B,C], with no quotes). See this qiime2 forum post for an example: https://forum.qiime2.org/t/levels-in-songbird/16197
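The quotes are lost before patsy ever sees the formula: inside a double-quoted shell argument, each inner " closes and reopens the quoted string, so the shell passes levels=[A,B,C] to songbird. Single quotes inside double quotes are kept literally. The sketch below reproduces this with Python's shlex module, used here as an approximation of POSIX shell quoting (bash and zsh behave the same way for these strings); the helper name formula_seen_by_songbird is hypothetical.

```python
import shlex

# --formula arguments copied from the failing and the working command.
BROKEN = '--formula "Temperature + C(faky, Diff, levels=["A","B","C"])"'
FIXED = "--formula \"Temperature + C(faky, Diff, levels=['A','B','C'])\""


def formula_seen_by_songbird(args):
    """Tokenize args roughly as a POSIX shell would (via shlex) and
    return the string that follows --formula."""
    tokens = shlex.split(args)
    return tokens[tokens.index("--formula") + 1]


for args in (BROKEN, FIXED):
    print(formula_seen_by_songbird(args))
# Temperature + C(faky, Diff, levels=[A,B,C])
# Temperature + C(faky, Diff, levels=['A','B','C'])

# Without quotes, [A,B,C] is evaluated as Python code, so A, B and C are
# looked up as variable names; none of them exists, hence the NameError.
try:
    eval("[A,B,C]", {})
except NameError as err:
    print(err)  # name 'A' is not defined
```

Escaping the inner double quotes (levels=[\"A\",\"B\",\"C\"]) should also get them through the shell, but single quotes are the simpler fix.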

johannesbjork commented 3 years ago

Thanks Jamie (@mortonjt). Indeed, replacing the double quotes around the levels with single quotes solved the issue.

(songbird_env) user@user redsea % songbird multinomial \
        --input-biom redsea.biom \
        --metadata-file redsea_metadata2.txt \
        --formula "Temperature + C(faky, Diff, levels=['A','B','C'])" \
        --epochs 1 \
        --differential-prior 0.5 \
        --training-column Testing \
        --summary-interval 1 \
        --summary-dir results
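With the quotes intact, the levels list also fixes the order in which the Diff (backward difference) coding compares the categories. The sketch below recomputes that contrast matrix in plain Python instead of calling patsy; it follows patsy's backward difference coding as we understand it, so check it against patsy itself before relying on it. The helper names diff_contrast and level_means are hypothetical.

```python
from fractions import Fraction


def diff_contrast(levels):
    """Backward-difference contrast matrix for the given level order.

    Row levels[i] holds the coding for that level; column j compares
    levels[j + 1] with levels[j].
    """
    k = len(levels)
    return {
        level: [Fraction(-(k - 1 - j), k) if i <= j else Fraction(j + 1, k)
                for j in range(k - 1)]
        for i, level in enumerate(levels)
    }


def level_means(intercept, coefs, levels):
    """Predicted value for each level from an intercept and Diff coefficients."""
    contrast = diff_contrast(levels)
    return {lvl: intercept + sum(c * b for c, b in zip(row, coefs))
            for lvl, row in contrast.items()}


print(diff_contrast(["A", "B", "C"]))
# A: [-2/3, -1/3], B: [1/3, -1/3], C: [1/3, 2/3] (printed as Fractions)
```

With levels ['A','B','C'] the term therefore contributes two columns, read as B versus A and C versus B, while the intercept sits at the average over the three levels. Reordering the list, for example levels=['C','B','A'], reverses which level each comparison is relative to.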
mortonjt commented 3 years ago

awesome!!

On Wed, Aug 26, 2020 at 10:26 AM johannesbjork notifications@github.com wrote:

Thanks Jamie (@mortonjt). Indeed, changing double for single quotes around the levels solved the issue.
