MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

evaluate diversity sh #35

Closed asogg closed 3 years ago

asogg commented 3 years ago

Hi Simone, I started the script to evaluate diversity and ran into some problems (sorry to bother you). See below:

./Evaluate_diversity.sh -w home/as/MetONTIIME/fast5_pass_analysis/analysis -m home/as/MetONTIIME -d 400 -t 8 -c 1
realpath: home/as/MetONTIIME/fast5_pass_analysis/analysis: No such file or directory
Working directory:
realpath: home/as/MetONTIIME: No such file or directory
Sample metadata:
Sampling depth: 400 reads
Number of threads: 8
Clustering threshold: 1
realpath: missing operand
Try 'realpath --help' for more information.
./Evaluate_diversity.sh: line 82: /manifest_400_subsampled.txt: Permission denied
Usage: qiime tools import [OPTIONS]

Import data to create a new QIIME 2 Artifact. See https://docs.qiime2.org/ for usage examples and details on the file types and associated semantic types that can be imported.

Options: --type TEXT The semantic type of the artifact that will be created upon importing. Use --show-importable-types to see what importable semantic types are available in the current deployment. [required] --input-path PATH Path to file or directory that should be imported. [required] --output-path ARTIFACT Path where output artifact should be written. [required] --input-format TEXT The format of the data to be imported. If not provided, data must be in the format expected by the semantic type provided via --type. --show-importable-types Show the semantic types that can be supplied to --type to import data into an artifact. --show-importable-formats Show formats that can be supplied to --input-format to import data into an artifact. --help Show this message and exit.

                There was a problem with the command:                     

(1/1) Invalid value for '--input-path': Path '/manifest_400_subsampled.txt' does not exist.

Usage: qiime vsearch dereplicate-sequences [OPTIONS]

Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.

Inputs: --i-sequences ARTIFACT SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality] The sequences to be dereplicated. [required] Parameters: --p-derep-prefix / --p-no-derep-prefix Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant. [default: False] Outputs: --o-dereplicated-table ARTIFACT FeatureTable[Frequency] The table of dereplicated sequences. [required] --o-dereplicated-sequences ARTIFACT FeatureData[Sequence] The dereplicated sequences. [required] Miscellaneous: --output-dir PATH Output unspecified results to a directory --verbose / --quiet Display verbose output to stdout and/or stderr during execution of this action. Or silence output if execution is successful (silence is golden). --examples Show usage examples and exit. --citations Show citations and exit. --help Show this message and exit.

              There were some problems with the command:                  

(1/3) Invalid value for '--i-sequences': '/sequences_400_subsampled.qza' is not a valid filepath
(2/3) Invalid value for '--o-dereplicated-table': '/' is not a writable directory, cannot write output to it.
(3/3) Invalid value for '--o-dereplicated-sequences': '/' is not a writable directory, cannot write output to it.

Usage: qiime vsearch cluster-features-de-novo [OPTIONS]

Given a feature table and the associated feature sequences, cluster the features based on user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality- filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Inputs: --i-sequences ARTIFACT FeatureData[Sequence] The sequences corresponding to the features in table. [required] --i-table ARTIFACT FeatureTable[Frequency] The feature table to be clustered. [required] Parameters: --p-perc-identity PROPORTION Range(0, 1, inclusive_start=False, inclusive_end=True) The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required] --p-threads INTEGER Range(0, 256, inclusive_end=True) The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1] Outputs: --o-clustered-table ARTIFACT FeatureTable[Frequency] The table following clustering of features. [required] --o-clustered-sequences ARTIFACT FeatureData[Sequence] Sequences representing clustered features. [required] Miscellaneous: --output-dir PATH Output unspecified results to a directory --verbose / --quiet Display verbose output to stdout and/or stderr during execution of this action. Or silence output if execution is successful (silence is golden). --examples Show usage examples and exit. --citations Show citations and exit. --help Show this message and exit.

              There were some problems with the command:                  

(1/4) Invalid value for '--i-sequences': '/rep-seqs_tmp_400_subsampled.qza' is not a valid filepath
(2/4) Invalid value for '--i-table': '/table_tmp_400_subsampled.qza' is not a valid filepath
(3/4) Invalid value for '--o-clustered-table': '/' is not a writable directory, cannot write output to it.
(4/4) Invalid value for '--o-clustered-sequences': '/' is not a writable directory, cannot write output to it.
rm: cannot remove '/table_tmp_400_subsampled.qza': No such file or directory
rm: cannot remove '/rep-seqs_tmp_400_subsampled.qza': No such file or directory

Usage: qiime phylogeny align-to-tree-mafft-fasttree [OPTIONS]

This pipeline will start by creating a sequence alignment using MAFFT, after which any alignment columns that are phylogenetically uninformative or ambiguously aligned will be removed (masked). The resulting masked alignment will be used to infer a phylogenetic tree and then subsequently rooted at its midpoint. Output files from each step of the pipeline will be saved. This includes both the unmasked and masked MAFFT alignment from q2-alignment methods, and both the rooted and unrooted phylogenies from q2-phylogeny methods.

Inputs: --i-sequences ARTIFACT FeatureData[Sequence] The sequences to be used for creating a fasttree based rooted phylogenetic tree. [required] Parameters: --p-n-threads VALUE Int % Range(1, None) | Str % Choices('auto') The number of threads. (Use auto to automatically use all available cores) This value is used when aligning the sequences and creating the tree with fasttree. [default: 1] --p-mask-max-gap-frequency PROPORTION Range(0, 1, inclusive_end=True) The maximum relative frequency of gap characters in a column for the column to be retained. This relative frequency must be a number between 0.0 and 1.0 (inclusive), where 0.0 retains only those columns without gap characters, and 1.0 retains all columns regardless of gap character frequency. This value is used when masking the aligned sequences. [default: 1.0] --p-mask-min-conservation PROPORTION Range(0, 1, inclusive_end=True) The minimum relative frequency of at least one non-gap character in a column for that column to be retained. This relative frequency must be a number between 0.0 and 1.0 (inclusive). For example, if a value of 0.4 is provided, a column will only be retained if it contains at least one character that is present in at least 40% of the sequences. This value is used when masking the aligned sequences. [default: 0.4] --p-parttree / --p-no-parttree This flag is required if the number of sequences being aligned are larger than 1000000. Disabled by default. [default: False] Outputs: --o-alignment ARTIFACT FeatureData[AlignedSequence] The aligned sequences. [required] --o-masked-alignment ARTIFACT FeatureData[AlignedSequence] The masked alignment. [required] --o-tree ARTIFACT The unrooted phylogenetic tree. Phylogeny[Unrooted] [required] --o-rooted-tree ARTIFACT Phylogeny[Rooted] The rooted phylogenetic tree. [required] Miscellaneous: --output-dir PATH Output unspecified results to a directory --verbose / --quiet Display verbose output to stdout and/or stderr during execution of this action. 
Or silence output if execution is successful (silence is golden). --examples Show usage examples and exit. --citations Show citations and exit. --help Show this message and exit.

              There were some problems with the command:                  

(1/5) Invalid value for '--i-sequences': '/rep-seqs_400_subsampled.qza' is not a valid filepath
(2/5) Invalid value for '--o-alignment': '/' is not a writable directory, cannot write output to it.
(3/5) Invalid value for '--o-masked-alignment': '/' is not a writable directory, cannot write output to it.
(4/5) Invalid value for '--o-tree': '/' is not a writable directory, cannot write output to it.
(5/5) Invalid value for '--o-rooted-tree': '/' is not a writable directory, cannot write output to it.

Usage: qiime diversity core-metrics-phylogenetic [OPTIONS]

Applies a collection of diversity metrics (both phylogenetic and non- phylogenetic) to a feature table.

Inputs: --i-table ARTIFACT FeatureTable[Frequency] The feature table containing the samples over which diversity metrics should be computed. [required] --i-phylogeny ARTIFACT Phylogenetic tree containing tip identifiers that Phylogeny[Rooted] correspond to the feature identifiers in the table. This tree can contain tip ids that are not present in the table, but all feature ids in the table must be present in this tree. [required] Parameters: --p-sampling-depth INTEGER Range(1, None) The total frequency that each sample should be rarefied to prior to computing diversity metrics. [required] --m-metadata-file METADATA... (multiple arguments The sample metadata to use in the emperor plots. will be merged) [required] --p-n-jobs-or-threads VALUE Int % Range(1, None) | Str % Choices('auto') [beta/beta-phylogenetic methods only] - The number of concurrent jobs or CPU threads to use in performing this calculation. Individual methods will create jobs/threads as implemented in q2-diversity-lib dependencies. May not exceed the number of available physical cores. If n-jobs-or-threads = 'auto', one thread/job will be created for each identified CPU core on the host. [default: 1] Outputs: --o-rarefied-table ARTIFACT FeatureTable[Frequency] The resulting rarefied feature table. [required] --o-faith-pd-vector ARTIFACT SampleData[AlphaDiversity] Vector of Faith PD values by sample. [required] --o-observed-features-vector ARTIFACT SampleData[AlphaDiversity] Vector of Observed Features values by sample. [required] --o-shannon-vector ARTIFACT SampleData[AlphaDiversity] Vector of Shannon diversity values by sample. [required] --o-evenness-vector ARTIFACT SampleData[AlphaDiversity] Vector of Pielou's evenness values by sample. [required] --o-unweighted-unifrac-distance-matrix ARTIFACT DistanceMatrix Matrix of unweighted UniFrac distances between pairs of samples. 
[required] --o-weighted-unifrac-distance-matrix ARTIFACT DistanceMatrix Matrix of weighted UniFrac distances between pairs of samples. [required] --o-jaccard-distance-matrix ARTIFACT DistanceMatrix Matrix of Jaccard distances between pairs of samples. [required] --o-bray-curtis-distance-matrix ARTIFACT DistanceMatrix Matrix of Bray-Curtis distances between pairs of samples. [required] --o-unweighted-unifrac-pcoa-results ARTIFACT PCoAResults PCoA matrix computed from unweighted UniFrac distances between samples. [required] --o-weighted-unifrac-pcoa-results ARTIFACT PCoAResults PCoA matrix computed from weighted UniFrac distances between samples. [required] --o-jaccard-pcoa-results ARTIFACT PCoAResults PCoA matrix computed from Jaccard distances between samples. [required] --o-bray-curtis-pcoa-results ARTIFACT PCoAResults PCoA matrix computed from Bray-Curtis distances between samples. [required] --o-unweighted-unifrac-emperor VISUALIZATION Emperor plot of the PCoA matrix computed from unweighted UniFrac. [required] --o-weighted-unifrac-emperor VISUALIZATION Emperor plot of the PCoA matrix computed from weighted UniFrac. [required] --o-jaccard-emperor VISUALIZATION Emperor plot of the PCoA matrix computed from Jaccard. [required] --o-bray-curtis-emperor VISUALIZATION Emperor plot of the PCoA matrix computed from Bray-Curtis. [required] Miscellaneous: --output-dir PATH Output unspecified results to a directory --verbose / --quiet Display verbose output to stdout and/or stderr during execution of this action. Or silence output if execution is successful (silence is golden). --examples Show usage examples and exit. --citations Show citations and exit. --help Show this message and exit.

              There were some problems with the command:                  

(1/21) Invalid value for '--i-phylogeny': '/rooted-tree_400_subsampled.qza' is not a valid filepath
(2/21) Invalid value for '--i-table': '/table_400_subsampled.qza' is not a valid filepath
(3/21) Missing option '--m-metadata-file'.
(4/21) Invalid value for '--output-dir': '/core-metrics-results_400_subsampled/' is not a writable directory, cannot write output to it.
(5/21) Missing option '--o-rarefied-table'. ("--output-dir" may also be used)
(6/21) Missing option '--o-faith-pd-vector'. ("--output-dir" may also be used)
(7/21) Missing option '--o-observed-features-vector'. ("--output-dir" may also be used)
(8/21) Missing option '--o-shannon-vector'. ("--output-dir" may also be used)
(9/21) Missing option '--o-evenness-vector'. ("--output-dir" may also be used)
(10/21) Missing option '--o-unweighted-unifrac-distance-matrix'. ("--output-dir" may also be used)
(11/21) Missing option '--o-weighted-unifrac-distance-matrix'. ("--output-dir" may also be used)
(12/21) Missing option '--o-jaccard-distance-matrix'. ("--output-dir" may also be used)
(13/21) Missing option '--o-bray-curtis-distance-matrix'. ("--output-dir" may also be used)
(14/21) Missing option '--o-unweighted-unifrac-pcoa-results'. ("--output-dir" may also be used)
(15/21) Missing option '--o-weighted-unifrac-pcoa-results'. ("--output-dir" may also be used)
(16/21) Missing option '--o-jaccard-pcoa-results'. ("--output-dir" may also be used)
(17/21) Missing option '--o-bray-curtis-pcoa-results'. ("--output-dir" may also be used)
(18/21) Missing option '--o-unweighted-unifrac-emperor'. ("--output-dir" may also be used)
(19/21) Missing option '--o-weighted-unifrac-emperor'. ("--output-dir" may also be used)
(20/21) Missing option '--o-jaccard-emperor'. ("--output-dir" may also be used)
(21/21) Missing option '--o-bray-curtis-emperor'. ("--output-dir" may also be used)

Usage: qiime diversity alpha-rarefaction [OPTIONS]

Generate interactive alpha rarefaction curves by computing rarefactions between min_depth and max_depth. The number of intermediate depths to compute is controlled by the steps parameter, with n iterations being computed at each rarefaction depth. If sample metadata is provided, samples may be grouped based on distinct values within a metadata column.

Inputs: --i-table ARTIFACT FeatureTable[Frequency] Feature table to compute rarefaction curves from. [required] --i-phylogeny ARTIFACT Optional phylogeny for phylogenetic metrics. Phylogeny[Rooted] [optional] Parameters: --p-max-depth INTEGER The maximum rarefaction depth. Must be greater than Range(1, None) min-depth. [required] --p-metrics TEXT... Choices('observed_features', 'shannon', 'robbins', 'goods_coverage', 'margalef', 'dominance', 'menhinick', 'berger_parker_d', 'chao1', 'pielou_e', 'singles', 'brillouin_d', 'lladser_pe', 'heip_e', 'simpson_e', 'fisher_alpha', 'enspie', 'michaelis_menten_fit', 'ace', 'mcintosh_e', 'gini_index', 'doubles', 'faith_pd', 'simpson', 'mcintosh_d') The metrics to be measured. By default computes observed_features, shannon, and if phylogeny is provided, faith_pd. [optional] --m-metadata-file METADATA... (multiple arguments The sample metadata. will be merged) [optional] --p-min-depth INTEGER The minimum rarefaction depth. Range(1, None) [default: 1] --p-steps INTEGER The number of rarefaction depths to include between Range(2, None) min-depth and max-depth. [default: 10] --p-iterations INTEGER The number of rarefied feature tables to compute at Range(1, None) each step. [default: 10] Outputs: --o-visualization VISUALIZATION [required] Miscellaneous: --output-dir PATH Output unspecified results to a directory --verbose / --quiet Display verbose output to stdout and/or stderr during execution of this action. Or silence output if execution is successful (silence is golden). --examples Show usage examples and exit. --citations Show citations and exit. --help Show this message and exit.

              There were some problems with the command:                  

(1/3) Invalid value for '--i-table': '/table_400_subsampled.qza' is not a valid filepath
(2/3) Invalid value for '--i-phylogeny': '/rooted-tree_400_subsampled.qza' is not a valid filepath
(3/3) Invalid value for '--o-visualization': '/' is not a writable directory, cannot write output to it.

Your help would be precious. Thanks, Alessio

MaestSi commented 3 years ago

Hi, in your command:

./Evaluate_diversity.sh -w home/as/MetONTIIME/fast5_pass_analysis/analysis -m home/as/MetONTIIME -d 400 -t 8 -c 1

I see two issues:

1- You forgot the leading '/' before home/as/MetONTIIME/fast5_pass_analysis/analysis in the argument to the -w parameter.
2- You did not specify a sample-metadata.tsv file as the argument to the -m parameter. Try -m /home/as/MetONTIIME/sample-metadata.tsv instead.

As a minor point, I see you used -c 1: this tells the software to cluster only identical sequences but, due to base-calling errors, sequences from the same species would also end up in different clusters. Try lower values or (better, in my opinion) use the Evaluate_diversity_non_phylogenetic.sh script, which lets you exploit the previously obtained taxonomic classification (e.g. at genus level) without clustering sequences and building a phylogenetic tree of representative sequences from each cluster.

Simone
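For readers hitting the same cascade of errors, here is a minimal shell sketch of why a relative path ends up producing names like /manifest_400_subsampled.txt. The helper names `is_absolute_path` and `join_path` are hypothetical and not part of MetONTIIME, and the way the script joins the working directory and file name is an assumption inferred from the error messages above, not taken from its source.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of MetONTIIME.

# Succeeds only when the argument starts with '/'.
is_absolute_path() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Assumption: output names are built as "<working dir>/<file>".
join_path() {
  printf '%s/%s\n' "$1" "$2"
}

# When realpath fails, the working directory is left empty,
# so every file is looked up directly under "/".
join_path "" "manifest_400_subsampled.txt"   # prints /manifest_400_subsampled.txt

# Classify the -m arguments from the thread.
for p in "home/as/MetONTIIME" "/home/as/MetONTIIME/sample-metadata.tsv"; do
  if is_absolute_path "$p"; then
    echo "absolute: $p"
  else
    echo "relative, resolved against the current directory: $p"
  fi
done

# Command with both path fixes applied (shown as a comment, not run here):
# ./Evaluate_diversity.sh -w /home/as/MetONTIIME/fast5_pass_analysis/analysis \
#   -m /home/as/MetONTIIME/sample-metadata.tsv -d 400 -t 8 -c 1
```

A relative path is not wrong in itself; it failed here because home/as/... was resolved against ~/MetONTIIME, where no such subdirectory exists.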

asogg commented 3 years ago

Hello, after your suggestions I tried the other script. See below:

as@as-HP-Pavilion-dv6-Notebook-PC:~/MetONTIIME$ ./Evaluate_diversity_non_phylogenetic.sh -f /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/table_collapsed_absfreq_level6.qza -m /home/as/MetONTIIME/sample-metadata.tsv -d 400
Collapsed feature table: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/table_collapsed_absfreq_level6.qza
Sample metadata: /home/as/MetONTIIME/sample-metadata.tsv
Sampling depth: 400 reads
Plugin error from diversity:

Ordinations with less than two dimensions are not supported.

Debug info has been saved to /tmp/qiime2-q2cli-err-nf8i8jxk.log
find: ‘/home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_400_subsampled_non_phylogenetic’: No such file or directory
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/alpha-rarefaction_table_collapsed_absfreq_level6_400_subsampled_non_phylogenetic.qzv

MaestSi commented 3 years ago

Hi, I think the error "Ordinations with less than two dimensions are not supported." occurs because, when at least 400 reads are required for each sample (-d 400), only sample BC01 (which seems to have 440 reads, based on the previous issue) survives the filtering, and the tool can't perform beta-diversity analysis on a single sample. Try 100 reads if you also want to include sample BC04 (103 reads), or use a bigger dataset. Simone
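The effect of the sampling depth can be checked before running the script. The sketch below is only an illustration: it hard-codes the two read counts quoted in this thread (BC01 with 440 reads, BC04 with 103 reads) and counts how many samples meet a given depth. The function name `surviving_samples` is hypothetical, and any other samples in the run are not listed here.

```shell
#!/usr/bin/env bash
# Read counts quoted in the thread; other samples are not listed there.
counts='BC01 440
BC04 103'

# $1 = sampling depth; prints how many samples have at least that many reads.
surviving_samples() {
  printf '%s\n' "$counts" | awk -v d="$1" '$2 >= d { n++ } END { print n + 0 }'
}

for depth in 400 100; do
  n=$(surviving_samples "$depth")
  if [ "$n" -lt 2 ]; then
    echo "depth $depth: $n sample(s) left, too few for beta diversity"
  else
    echo "depth $depth: $n samples left"
  fi
done
```

With these two counts, a depth of 400 keeps one sample and a depth of 100 keeps both, which matches the behaviour described above.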

asogg commented 3 years ago

I suspected that after seeing the barcodes, so I changed the depth to 100. Thanks for your support, Alessio

asogg commented 3 years ago

100 was perfect!

as@as-HP-Pavilion-dv6-Notebook-PC:~/MetONTIIME$ ./Evaluate_diversity_non_phylogenetic.sh -f /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/table_collapsed_absfreq_level6.qza -m /home/as/MetONTIIME/sample-metadata.tsv -d 100
Collapsed feature table: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/table_collapsed_absfreq_level6.qza
Sample metadata: /home/as/MetONTIIME/sample-metadata.tsv
Sampling depth: 100 reads
Saved FeatureTable[Frequency] to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/evenness_vector.qza
Saved DistanceMatrix to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/jaccard_distance_matrix.qza
Saved DistanceMatrix to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/bray_curtis_distance_matrix.qza
Saved PCoAResults to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/jaccard_pcoa_results.qza
Saved PCoAResults to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/bray_curtis_pcoa_results.qza
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/jaccard_emperor.qzv
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/bray_curtis_emperor.qzv
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/observed_features_vector.qzv
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/shannon_vector.qzv
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/jaccard_pcoa_results.qzv
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/rarefied_table.qzv
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/bray_curtis_pcoa_results.qzv
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/core-metrics-results_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic/evenness_vector.qzv
Saved Visualization to: /home/as/MetONTIIME/fast5_pass_analysis/analysis/collapsed_feature_tables/alpha-rarefaction_table_collapsed_absfreq_level6_100_subsampled_non_phylogenetic.qzv
as@as-HP-Pavilion-dv6-Notebook-PC:~/MetONTIIME$

This pipeline is wonderful, congrats. Thank you very much! Alessio

MaestSi commented 3 years ago

Thanks and ciao! :)