CAMI-challenge / CAMISIM

CAMISIM: Simulating metagenomes and microbial communities
https://data.cami-challenge.org/participate
Apache License 2.0

Picrust for CAMISIM dataset #123

Closed MonicaSteffi closed 2 years ago

MonicaSteffi commented 2 years ago

Dear all, sorry for the basic question. I am trying to run a PICRUSt2 analysis on a CAMI dataset. I downloaded the FASTQ files, the abundance tables, and the genome sequences (FASTA) from the CAMI website. I tried running PICRUSt2 with genome.fa as input but got an error. I noticed that the authors of https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0633-6#Sec16 performed a PICRUSt2 analysis.

Kindly let me know how to rectify this issue.

AlphaSquad commented 2 years ago

From just this information it is very hard to tell what went wrong. You did not use CAMISIM anywhere in your analysis, right?

I would assume that it is more of a problem with PICRUSt2 than with the FASTA file.

MonicaSteffi commented 2 years ago

Dear @AlphaSquad ,

Thank you for your reply. Yes, I haven't used CAMISIM yet, but I downloaded the simulated human dataset in FASTQ format from https://data.cami-challenge.org/participate.

I need a simulated dataset that I can use to create functional profiles with both a shotgun and a 16S pipeline.

I ran the shotgun pipeline (kneaddata and humann3) on those FASTQ files to get a functional profile. How can I reconstruct 16S gene sequences from the metagenome contigs so that they can be used for PICRUSt2 analysis?

AlphaSquad commented 2 years ago

I see, I think I understand now. The functional profile of the metagenome was not created from the simulated data but from the "real" data, in this case the BIOM profile we simulated the metagenome from. But this is probably not what you want to do. Instead, since the RefSeq IDs and reference genomes are provided alongside the data set, I would use these to look for 16S sequences. There are two possibilities:

  1. If you use the genome_to_id.tsv file, you will find the full RefSeq name of every genome in the last column. You can then check the annotation of that genome for 16S rRNA gene sequences.
  2. Alternatively, you can run a 16S prediction tool, e.g. barrnap, directly on the FASTA files.
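
For the second option, a minimal sketch of how the predicted 16S regions could be cut out of a genome once a GFF file of rRNA hits exists. It assumes the GFF was produced with something like `barrnap --kingdom bac genome.fa > rrna.gff` and that 16S hits carry the string "16S" in the attribute column (as in `Name=16S_rRNA`); both are assumptions to check against your barrnap version. The function names `read_fasta` and `extract_16s` are hypothetical helpers, not part of CAMISIM, barrnap, or PICRUSt2.

```python
from typing import Dict, List, Tuple

# Translation table for complementing DNA (unknown bases stay N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}.

    The id is the first whitespace-separated word of each header line.
    """
    records: Dict[str, List[str]] = {}
    seq_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            records[seq_id] = []
        elif seq_id is not None:
            records[seq_id].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def extract_16s(fasta_text: str, gff_text: str) -> List[Tuple[str, str]]:
    """Return (label, sequence) pairs for every GFF hit annotated as 16S.

    Assumption: the ninth GFF column contains "16S" for 16S rRNA hits.
    GFF coordinates are 1-based and inclusive; minus-strand hits are
    reverse-complemented.
    """
    genome = read_fasta(fasta_text)
    hits: List[Tuple[str, str]] = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments/pragmas
        cols = line.split("\t")
        if len(cols) < 9 or "16S" not in cols[8]:
            continue  # not a well-formed 16S record
        contig, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        if contig not in genome:
            continue  # GFF refers to a contig missing from the FASTA
        seq = genome[contig][start - 1:end]
        if strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]
        hits.append((f"{contig}:{start}-{end}({strand})", seq))
    return hits
```

To use it, read the genome FASTA and the GFF file into strings, call `extract_16s`, and write the returned pairs out as a FASTA file of candidate 16S sequences. Whether those sequences are suitable PICRUSt2 input still needs to be checked against the PICRUSt2 documentation.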

AlphaSquad commented 2 years ago

Did this answer your question?

AlphaSquad commented 2 years ago

Closing this for now. Please feel free to reopen if anything is still unclear.