Closed bck243 closed 3 years ago
interesting -- do you have any categorical variables that snuck into these columns?
Sometimes if you leave values blank, or if you have a value such as none, it will default to a categorical variable.
If you want, you can post your metadata file to help with debugging.
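One quick way to check for this before posting the file: the sketch below scans a single column of a tab-separated metadata file and lists every sample whose value is blank or does not parse as a finite number. It is only an illustration in plain Python, not part of songbird or QIIME 2. The names `is_number` and `find_non_numeric`, the toy sample IDs, and the assumption that the first column holds the sample ID are all made up for this example.

```python
import csv
import math


def is_number(text):
    """True if text parses as a finite float ('NA', 'none', '' and 'nan' are all False)."""
    try:
        return math.isfinite(float(text))
    except ValueError:
        return False


def find_non_numeric(lines, column):
    """Return (sample_id, value) pairs whose value in `column` is not a finite number.

    `lines` is any iterable of tab-separated text lines (an open file works).
    Assumes the first row is the header and the first column is the sample ID.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    col = header.index(column)  # raises ValueError if the column is missing
    offenders = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/directive rows
        value = row[col].strip() if col < len(row) else ""
        if not is_number(value):
            offenders.append((row[0], value))
    return offenders


# Toy metadata (made-up IDs and values, not taken from the real file).
example = [
    "sample-id\tCTDPRS\n",
    "S1\t10.5\n",
    "S2\tNA\n",
    "S3\t\n",
    "S4\t30.2\n",
]
print(find_non_numeric(example, "CTDPRS"))
```

On a real file you would pass an open handle instead of the toy list, for example `with open("metadata.tsv", newline="") as fh:` followed by `find_non_numeric(fh, "CTDPRS")`, where `metadata.tsv` stands in for your own file name.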
Thank you! Here is my metadata file. There is one "NA" in CTDPRS.
ok, if you see the following samples, they have a ton of NAs. I'd try to drop all samples that don't have continuous value measurements and see if you can get something reasonable
E_18A_16S_C_AGAGTCAC E_18A_16S_G_TAGCGAGT E_18B_16S_G_CTGCGTGT E_18B_16S_C_TACGAGAC
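Roughly what dropping those samples could look like, as a plain-Python sketch. It keeps only the rows where every listed column holds a finite number. The function name `drop_incomplete_samples` and the toy rows are made up for illustration, and it assumes a tab-separated file with the sample ID in the first column.

```python
import csv
import math


def is_number(text):
    """True if text parses as a finite float."""
    try:
        return math.isfinite(float(text))
    except ValueError:
        return False


def drop_incomplete_samples(lines, columns):
    """Return the header plus only those rows with numeric values in all `columns`."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    idx = [header.index(c) for c in columns]
    kept = [header]
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            kept.append(row)  # keep comment/directive rows untouched
            continue
        cells = [row[i].strip() if i < len(row) else "" for i in idx]
        if all(is_number(c) for c in cells):
            kept.append(row)
    return kept


# Toy metadata (made-up IDs and values).
example = [
    "sample-id\tCTDPRS\tconsensus_age\n",
    "S1\t10.5\t3\n",
    "S2\tNA\t4\n",
    "S3\t22.0\tNA\n",
    "S4\t30.2\t5\n",
]
for row in drop_incomplete_samples(example, ["CTDPRS", "consensus_age"]):
    print("\t".join(row))
```

Only S1 and S4 survive here, because S2 and S3 each have an NA in one of the two columns.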
iTAG_metadata_16S_all_for_q2_8_2020.txt https://github.com/biocore/songbird/files/5946119/iTAG_metadata_16S_all_for_q2_8_2020.txt
Ok, thanks!
More generally, is there a way to manually specify that a variable is continuous, or will I need to remove all NA's from metadata for the other variables that are causing me trouble? For example, I have more "NAs" in "consensus_age" than "CTDPRS".
I fixed the metadata to not have NA's for CTDPRS for those samples and am re-running qiime songbird multinomial. I'm expecting it to take a couple days, because it did last time.
I also tried subsetting my data to see if it works more quickly, but I come up against another error when I try that:
Get small subset:
source activate qiime2-2020.6
qiime feature-table filter-samples \
--i-table P18_16S_all_runs_raw_no_contam_or_control2.qza \
--m-metadata-file example_samples.txt \
--o-filtered-table example_samples.qza
Run test:
qiime songbird multinomial \
--i-table ./data/example_samples.qza \
--m-metadata-file ./data/iTAG_metadata_16S_all_for_q2_2_2021.tsv \
--p-formula "CTDPRS" \
--p-epochs 10000 \
--p-differential-prior 0.5 \
--p-summary-interval 1 \
--o-differentials CTDPRS_example_samples.qza \
--o-regression-stats CTDPRS_example_samples_regression-stats.qza \
--o-regression-biplot CTDPRS_example_samples_regression-biplot.qza
Error:
Plugin error from songbird:
initial_value must have a shape specified: Tensor("random_normal:0", shape=(6, ?), dtype=float32)
Debug info has been saved to /usr/local/scratch/path/tmp/qiime2-q2cli-err-gepubej9.log
Check that subsetting didn't go wrong:
qiime tools export \
--input-path example_samples.qza \
--output-path example_sample_subset
biom convert -i feature-table.biom -o feature-table.tsv --to-tsv
Subsetted data looks fine:
> head -n 3 feature-table.tsv
# Constructed from biom file
#OTU ID A_101S_16S_G_ACTATCTG B_10S_16S_G_GACACCGT C_100S_16S_G_ACTATCTG C_105S_16S_G_CTGCGTGT C_1S_16S_G_CTGCGTGT D_103S_16S_G_ACTATCTG D_104S_16S_G_CTGCGTGT
a0381498f3581ed0249c8a1cd28b6e3b 0.0 0.0 0.0 21.0 0.0 0.0 0.0
right, you'll need to drop those variables in order for it to be continuous (since NA will now be treated as a categorical variable).
we could probably handle it as missing data at some point, but that'll require a bit of thought on the underlying model.
Working after removing all NA's, thanks!
Hi @bck243, as @mortonjt mentioned, the NAs are the problem here. You don't need to drop the entire sample from your metadata file, though; simply remove the NA value from the cell so that the cell is left empty. Please see the QIIME 2 metadata documentation for more details on the metadata spec.
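A minimal sketch of that cleanup, assuming a tab-separated metadata file: it blanks out placeholder strings in the chosen columns and leaves every sample in place. The set of placeholder tokens, the function name `blank_missing`, and the toy rows are assumptions for illustration only; adjust the tokens to whatever your file actually contains.

```python
import csv

# Assumed placeholder strings; extend or trim to match your own file.
MISSING_TOKENS = {"NA", "N/A", "NaN", "nan", "none", "None"}


def blank_missing(lines, columns, tokens=MISSING_TOKENS):
    """Return all rows, with placeholder values in `columns` replaced by ''.

    `lines` is any iterable of tab-separated text lines (an open file works).
    Assumes the first row is the header and the first column is the sample ID.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    idx = [header.index(c) for c in columns]
    cleaned = [header]
    for row in reader:
        if not row:
            continue  # skip blank lines
        if not row[0].startswith("#"):  # leave comment/directive rows alone
            for i in idx:
                if i < len(row) and row[i].strip() in tokens:
                    row[i] = ""
        cleaned.append(row)
    return cleaned


# Toy metadata (made-up IDs and values).
example = [
    "sample-id\tCTDPRS\n",
    "S1\t10.5\n",
    "S2\tNA\n",
    "S3\t30.2\n",
]
for row in blank_missing(example, ["CTDPRS"]):
    print("\t".join(row))
```

All three samples are kept; only the NA cell for S2 becomes empty. The result can be written back out with `csv.writer(handle, delimiter="\t")`.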
Hello!
I was wondering how to explicitly specify that my input variables are numeric (continuous) rather than categorical?
In the examples, songbird multinomial defaults to variables being numeric (e.g. with --formula "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate"), but when I run it on my data, it defaults to categorical and creates a column for each possible value of the variable.
For example, when I run qiime songbird multinomial on pressure:
I get this resulting table:
This is also happening when I have formulas with multiple numeric variables, e.g. "CTDPRS+CTDTMP+LATITUDE+consensus_age_interpolated"
Thank you!
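For anyone landing here with the same question, the behaviour described above can be imitated with a small sketch. It is a simplified stand-in for treatment (dummy) coding in a patsy-style formula, not songbird's actual code. The function `design_columns` is hypothetical, the values are made up, and column names such as `CTDPRS[T.NA]` follow the patsy naming convention as I understand it.

```python
def design_columns(name, values):
    """Return the design-matrix column names a formula term would roughly produce.

    If every value parses as a number, the term stays a single numeric column.
    Otherwise the whole column is treated as categorical: the first sorted level
    becomes the reference and every other distinct value gets its own column.
    """
    try:
        for value in values:
            float(value)
    except ValueError:
        levels = sorted(set(values))
        return [f"{name}[T.{level}]" for level in levels[1:]]
    return [name]


# All values numeric: one continuous column.
print(design_columns("CTDPRS", ["10.5", "20.1", "30.2"]))
# → ['CTDPRS']

# A single "NA" string turns the whole column categorical.
print(design_columns("CTDPRS", ["10.5", "NA", "30.2"]))
# → ['CTDPRS[T.30.2]', 'CTDPRS[T.NA]']
```

The second call shows why one stray "NA" is enough to produce a column per distinct value, which matches what was reported in this thread.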