r_import_data.py on custom data set

ahorvath commented 4 years ago

I'd like to run the whole test.sh script on my own data set. Everything works up to r_import_data.py, but that step fails with the error message below:

r_import_data.py --biomfile results/06-affiliation.biom --samplefile MB_sample_table.tsv --treefile $out_dir/13-tree-mafft.nwk --rdata $out_dir/14-phylo_import.Rdata --html $out_dir/14-phylo_import.html --log-file $out_dir/14-phylo_import.log

In the original test.sh, the BIOM file is $frogs_dir/test/data/chaillou.biom

Can you help me with that? Many thanks, Attila

Attachment: input.zip

Traceback (most recent call last):
  File "/home/admin/Programs/frogs/FROGS-3.1.0/app/r_import_data.py", line 163, in <module>
    Rscript(biomfile, samplefile, treefile, html, str(args.normalization).upper(), data, ranks, rmd_stderr).submit(args.log_file)
  File "/home/admin/Programs/frogs/FROGS-3.1.0/lib/frogsUtils.py", line 141, in submit
    subprocess.check_output( self.get_cmd(), shell=True )
  File "/home/admin/anaconda3/envs/python2.7/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'Rscript -e "rmarkdown::render('/home/admin/Programs/frogs/FROGS-3.1.0/app/r_import_data.Rmd',output_file='/data/Projects/Metagenome/results/14-phylo_import.html', params=list(biomfile='/data/Projects/Metagenome/results/1574494427.73_89065_06-affiliation.stdBiom', samplefile='/data/Projects/Metagenome/MB_sample_table.tsv', treefile='/data/Projects/Metagenome/results/13-tree-mafft.nwk', normalization=FALSE, outputRdata='/data/Projects/Metagenome/results/14-phylo_import.Rdata', ranks='Kingdom Phylum Class Order Family Genus Species', libdir ='/home/admin/Programs/frogs/FROGS-3.1.0/lib/external-lib'), intermediates_dir='/data/Projects/Metagenome/results')" 2> /data/Projects/Metagenome/results/1574494427.73_89065_rmarkdown.stderr' returned non-zero exit status 1

mariabernard commented 4 years ago

Hello,

The aim of test.sh is not to analyse a dataset but simply to check that all dependencies are installed. Of course, you can look at it for command-line examples.

Errors from the R tools are not well tracked. What you can do is rerun the r_import_data.py command line with the --debug option to keep the temporary files, and then run only the Rscript command without redirecting stderr (something like the following, but with the temporary input biomfile replaced by the one from your new run).

Rscript -e "rmarkdown::render('/home/admin/Programs/frogs/FROGS-3.1.0/app/r_import_data.Rmd',output_file='/data/Projects/Metagenome/results/14-phylo_import.html', params=list(biomfile='/data/Projects/Metagenome/results/1574494427.73_89065_06-affiliation.stdBiom', samplefile='/data/Projects/Metagenome/MB_sample_table.tsv', treefile='/data/Projects/Metagenome/results/13-tree-mafft.nwk', normalization=FALSE, outputRdata='/data/Projects/Metagenome/results/14-phylo_import.Rdata', ranks='Kingdom Phylum Class Order Family Genus Species', libdir ='/home/admin/Programs/frogs/FROGS-3.1.0/lib/external-lib'), intermediates_dir='/data/Projects/Metagenome/results')"
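If you prefer to drive this from Python, here is a minimal sketch of the same idea: run the command yourself and print its stderr instead of redirecting it to a temporary file. It assumes Python 3.7 or later (the FROGS environment above is Python 2.7), and `run_and_show_stderr` is a hypothetical helper name, not part of FROGS. The command in the example is only a placeholder that fails on purpose.

```python
import subprocess
import sys


def run_and_show_stderr(cmd):
    """Run a command given as an argument list; return (exit_code, stderr_text)."""
    # capture_output=True collects stdout and stderr so nothing is lost in a temp file.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Show the tool's own error message rather than only the exit status.
        print(result.stderr, file=sys.stderr)
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Placeholder command (hypothetical): replace it with the Rscript call quoted above,
    # e.g. ["Rscript", "-e", "<the rmarkdown::render(...) expression>"].
    code, err = run_and_show_stderr([sys.executable, "-c", "import sys; sys.exit('boom')"])
    print("exit status:", code)
```

Passing the command as a list avoids the nested shell quoting that the one-line Rscript call needs.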

/data/Projects/Metagenome/MB_sample_table.tsv must contain the sample names in its first column, and those sample names must be exactly the same as in your biomfile.

Let me know if you need more help.
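A quick way to check this requirement is to compare the two sets of names directly. The sketch below is not part of FROGS. It assumes the BIOM file is JSON text (BIOM 1.0, where each entry of "columns" is a sample with an "id") rather than HDF5, and that the sample table is tab-separated with one header line. The function names are hypothetical.

```python
import csv
import json


def biom_sample_names(biom_path):
    """Return the set of sample IDs found in a JSON-format BIOM file."""
    with open(biom_path) as handle:
        biom = json.load(handle)
    # In BIOM 1.0 JSON, each entry of "columns" describes one sample.
    return {column["id"] for column in biom["columns"]}


def table_sample_names(tsv_path, has_header=True):
    """Return the set of values in the first column of a tab-separated file."""
    with open(tsv_path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if has_header:
        rows = rows[1:]  # assumption: the first line holds column titles
    return {row[0].strip() for row in rows}


def compare_sample_names(biom_path, tsv_path, has_header=True):
    """Return (names only in the BIOM file, names only in the sample table)."""
    in_biom = biom_sample_names(biom_path)
    in_table = table_sample_names(tsv_path, has_header)
    return sorted(in_biom - in_table), sorted(in_table - in_biom)


if __name__ == "__main__":
    # Paths taken from the commands in this thread; adjust them to your own run.
    only_biom, only_table = compare_sample_names(
        "results/06-affiliation.biom", "MB_sample_table.tsv"
    )
    print("Only in BIOM file:", only_biom)
    print("Only in sample table:", only_table)
```

Both lists should be empty. Any name printed here differs between the two files, including differences in case or stray characters.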

Maria

ahorvath commented 4 years ago

Hi Maria,

Many thanks for your help. I figured it out. I am working with 16S human gut samples, so I replaced the database with the Silva database and omitted the rdp flag:

affiliation_OTU.py --reference Silva_db/silva_132_16S/silva_132_16S.fasta --input-fasta $out_dir/04-filters.fasta --input-biom $out_dir/04-filters.biom --output-biom $out_dir/06-affiliation.biom --summary $out_dir/06-affiliation.html --log-file $out_dir/06-affiliation.log --nb-cpus $nb_cpu --java-mem $java_mem

Now the whole pipeline works well including the manova step which is great. There are two steps I skipped, though:

affiliation_postprocess.py and itsx.py. They seem to be fungi-related steps, and I don't know what to put there for 16S bacteria samples.

Again, the tool is great. I'd like to validate my results with an RDP classifier. Are you aware of any pre-existing SSU Bacteria RDP database that I could use with FROGS?

Many thanks, Attila

mariabernard commented 4 years ago

Glad to read that you solved your problem.

You can use Silva with the RDP classifier using the --rdp option. We distribute some databanks formatted for RDP Classifier and Blast: https://github.com/geraldinepascal/FROGS#download-databanks

Concerning affiliation_postprocess.py and itsx.py: yes, itsx.py is only for fungi, more precisely for ITS1 or ITS2 amplicon sequences. affiliation_postprocess.py was designed for ITS but can also be used for other genes. The default behavior is to aggregate OTUs that share the same "blast_taxonomy" with at least X% identity and Y% coverage. For ITS you may also provide a reference sequence FASTA file. If one OTU is affiliated to two reference sequences (with the same alignment score), it will select the shortest reference. The aim is to reduce affiliation ambiguities for short ITS sequences that are perfectly included in other, longer ITS sequences.
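The tie-breaking rule can be illustrated with a small Python sketch. This is not the FROGS implementation, only the rule as described above: among the hits that share the best alignment score, keep the one with the shortest reference. The dictionary keys (`ref_id`, `score`, `ref_length`) and the example values are hypothetical.

```python
def pick_reference(hits):
    """Among the hits with the best alignment score, return the shortest reference."""
    best_score = max(hit["score"] for hit in hits)
    tied = [hit for hit in hits if hit["score"] == best_score]
    # Ties on score are resolved in favour of the shortest reference sequence.
    return min(tied, key=lambda hit: hit["ref_length"])


if __name__ == "__main__":
    # Made-up hits for one OTU: two references tie on score, one scores lower.
    hits = [
        {"ref_id": "ref_long", "score": 500, "ref_length": 300},
        {"ref_id": "ref_short", "score": 500, "ref_length": 180},
        {"ref_id": "ref_other", "score": 450, "ref_length": 120},
    ]
    print(pick_reference(hits)["ref_id"])  # prints "ref_short"
```

Note that `ref_other` is shorter still but is not selected, because length is only compared between references with the same top score.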

Regards

Maria

ahorvath commented 4 years ago

Many thanks for the prompt and professional answers. I'll check them out. Bests, Attila

mariabernard commented 4 years ago

You are welcome.

I am closing this issue; feel free to reopen it or open a new one.

geraldinepascal / FROGS

r_import_data.py on custom data set #43