Closed Jigyasa3 closed 5 years ago
First, I do not recommend using DEICODE on taxonomically collapsed data; I suggest running it at the lowest level possible. In your case (i.e. shotgun data), that would be species, MAGs, or functional calls. The number of features should be on the order of hundreds to tens of thousands.
The minimum-sum cutoffs are meant to prevent under-characterized features and samples from being included. We have not benchmarked the best cutoffs for shotgun data (it also depends on whether the sequencing is shallow or deep). If you are unsure, I would use no cutoff (i.e. zero); otherwise, use your best judgement about what would separate well-characterized from under-characterized features/samples.
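To make the idea of a minimum-sum cutoff concrete, here is a minimal sketch in plain Python. It is not DEICODE's actual code: `filter_by_min_sum` and the toy counts are hypothetical, and the strict `>` comparison is an assumption chosen so that all-zero features and samples are dropped even with a cutoff of zero.

```python
def filter_by_min_sum(counts, min_sample_count=0, min_feature_count=0):
    """Drop features, then samples, whose total count does not exceed a cutoff.

    counts: {feature_id: {sample_id: count}} (hypothetical layout).
    The strict '>' is an assumption, not confirmed DEICODE behavior.
    """
    # Keep features whose total across all samples exceeds the feature cutoff.
    kept = {
        feature: row
        for feature, row in counts.items()
        if sum(row.values()) > min_feature_count
    }
    # Sum the remaining features within each sample.
    sample_ids = sorted({s for row in kept.values() for s in row})
    totals = {s: sum(row.get(s, 0) for row in kept.values()) for s in sample_ids}
    # Keep samples whose total exceeds the sample cutoff.
    keep = [s for s in sample_ids if totals[s] > min_sample_count]
    return {f: {s: row.get(s, 0) for s in keep} for f, row in kept.items()}


# Toy table: taxon_c and sample s2 are all zeros.
counts = {
    "taxon_a": {"s1": 12, "s2": 0, "s3": 3},
    "taxon_b": {"s1": 5, "s2": 0, "s3": 0},
    "taxon_c": {"s1": 0, "s2": 0, "s3": 0},
}

no_cutoff = filter_by_min_sum(counts)
strict = filter_by_min_sum(counts, min_sample_count=10)
```

Under these assumptions, `no_cutoff` still loses `taxon_c` and `s2` because they sum to zero, while `strict` keeps only `s1`, the one sample with more than 10 counts in total.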
I hope this helps.
Hey @cameronmartino
Thank you so much for replying! I followed your advice: I am using the lowest taxonomic level and no cutoff for min-sample-count and min-feature-count.
But the final distance matrix file still contains only 106 samples, not the 141 samples that were provided.
My original file format, with bacterial taxonomy per row (similar to a QIIME biom table):

bacterial_taxa    sample1    sample2    sample3    ..    sample141
bacteria1_p;bacteria1_c;bacteria1_o;bacteria1_g;bacteria1_s
bacteria2_p;bacteria2_c;bacteria2_o;bacteria2_g;bacteria2_s
..
..
(200 rows)
My code:

$ deicode --in-biom cellulases_genus_141samples.txt_json.biom --output-dir decodier_transformation_cellulases --min-sample-count 0 --min-feature-count 0 --max_iterations 5 --n_components 2
Hey @Jigyasa3
That should only happen if those samples have zero counts when summed across all features. Could you double-check that this is not the case? Thanks!
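One quick way to run that check is to total each sample's counts and list the samples that come out as zero. This is only a sketch on a toy table: `zero_sum_samples` and the sample names are hypothetical, and loading the real BIOM file is not shown.

```python
def zero_sum_samples(counts):
    """Return the IDs of samples whose counts sum to zero.

    counts: {sample_id: [count for each feature]} (hypothetical layout).
    """
    return sorted(s for s, values in counts.items() if sum(values) == 0)


# Toy table with three features per sample; sample2 has no counts at all.
counts = {
    "sample1": [4, 0, 7],
    "sample2": [0, 0, 0],
    "sample3": [0, 1, 0],
}

empty = zero_sum_samples(counts)
print(empty)  # ['sample2']
```

If the number of IDs returned matches the number of missing samples (here that would be 141 - 106 = 35), the dropped samples are most likely explained by empty columns in the input table.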
Thanks, @cameronmartino. There was a problem with my data frame.
Glad you got it! Closing this issue - please reopen it if you have more questions.
Hey!
I am running the following parameters on my shotgun-sequenced bacterial taxa biom file. The original file is in this format:

                    sample1    sample2    sample3    ..    ..
bacteria_phyla1
bacteria_phyla2
bacterial_phyla3
..
..
The code I am running:

$ deicode --in-biom traitsandsamples.biom --output-dir decodier_transformation --min-sample-count 10 --min-feature-count 1 --max_iterations 5 --n_components 2
Is there a recommended value for --min-sample-count and --min-feature-count? I have 141 samples in total and 5 features (i.e. bacterial_phyla).