How to use paired-end data?

bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics

GNU Affero General Public License v3.0


How to use paired-end data? #80

Closed YiweiNiu closed 4 years ago

loic-couderc commented 5 years ago

Hi @YiweiNiu, to analyze paired-end reads you can put all your reads together in a single file and give that file to MATAM. However, MATAM will treat these reads as single reads and will not fully benefit from the pairing information. We plan to handle paired-end reads in the future, but no due date has been set: https://github.com/bonsai-team/matam/issues/31
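For anyone who wants to script the "same file" step, here is a minimal sketch in Python. It assumes plain, uncompressed 4-line FASTQ files; the function name `concatenate_reads` and the file names in the usage note are hypothetical and are not part of MATAM.

```python
def concatenate_reads(input_paths, output_path):
    """Append each FASTQ file to output_path, in the order given.

    Assumes uncompressed FASTQ with exactly 4 lines per record.
    Returns the number of records written.
    """
    n_lines = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as src:
                for line in src:
                    # Guard against a missing final newline, which would
                    # otherwise glue two records together.
                    out.write(line if line.endswith("\n") else line + "\n")
                    n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; input is not 4-line FASTQ")
    return n_lines // 4
```

A call such as `concatenate_reads(["sample_R1.fastq", "sample_R2.fastq"], "sample_all.fastq")` would then produce the single file to pass to MATAM with `-i`.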

YiweiNiu commented 5 years ago

Thank you for your reply.

So I could just interleave the forward and reverse FASTQ files and feed the result to MATAM? Would I then have to write a script to recover the paired-end information from the output?
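A minimal sketch of what such an interleaving script might look like, assuming uncompressed 4-line FASTQ files in which the forward and reverse reads appear in the same order. The `/1` and `/2` suffixes are a convention added here so the mates can still be told apart afterwards; nothing in this thread says MATAM requires them. All function and file names are hypothetical.

```python
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines, e.g. at the end of the file
        rest = [handle.readline() for _ in range(3)]
        if not rest[2]:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in [header] + rest)


def tag(header, mate):
    """Append /1 or /2 to the read ID unless it already carries one."""
    name, _, comment = header.partition(" ")
    if not name.endswith(("/1", "/2")):
        name = f"{name}/{mate}"
    return f"{name} {comment}" if comment else name


def interleave(forward_path, reverse_path, output_path):
    """Write forward and reverse reads alternately; return the number of pairs."""
    pairs = 0
    with open(forward_path) as fwd, open(reverse_path) as rev, \
            open(output_path, "w") as out:
        for r1, r2 in zip_longest(read_fastq(fwd), read_fastq(rev)):
            if r1 is None or r2 is None:
                raise ValueError("forward and reverse files differ in read count")
            for mate, (header, seq, plus, qual) in ((1, r1), (2, r2)):
                out.write(f"{tag(header, mate)}\n{seq}\n{plus}\n{qual}\n")
            pairs += 1
    return pairs
```

For example, `interleave("sample_R1.fastq", "sample_R2.fastq", "sample_interleaved.fastq")` would write one file in which each forward read is immediately followed by its mate.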

ppericard commented 5 years ago

Hi @YiweiNiu,

You can either interleave your paired-end reads or simply concatenate the forward and reverse files into a single file. For now, MATAM will treat them as single reads. As for the output, MATAM produces a collection of assembled sequences (scaffolds) and, optionally, their taxonomic assignment (using the RDP classifier and Krona) if you're working with rRNA. If you want to know which reads correspond to which assembled sequence, the best way is to re-align the reads onto the scaffolds. MATAM already does this with SortMeRNA for abundance estimation, and you can retrieve the resulting SAM file if you use the '--keep_tmp' switch.
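As a rough illustration of reading such a SAM file, the sketch below groups read names by the scaffold they were aligned to. It relies only on the standard SAM columns (QNAME, FLAG, RNAME) and the 0x4 "unmapped" flag bit. The name and location of the SAM file kept by '--keep_tmp' are not given in this thread, so the path is left to the caller. The `complete_pairs` helper additionally assumes that the reads were tagged with `/1` and `/2` before being given to MATAM and that the aligner preserved those suffixes; both are assumptions, not documented MATAM behavior.

```python
from collections import defaultdict


def reads_per_scaffold(sam_path):
    """Map each scaffold name (RNAME) to the set of read names aligned to it."""
    scaffolds = defaultdict(set)
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):
                continue  # SAM header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:
                continue  # not a complete alignment record
            qname, flag, rname = fields[0], int(fields[1]), fields[2]
            if flag & 0x4 or rname == "*":
                continue  # read is unmapped
            scaffolds[rname].add(qname)
    return scaffolds


def complete_pairs(read_names):
    """Return base names for which both the /1 and the /2 mate are present."""
    mates = defaultdict(set)
    for name in read_names:
        if name.endswith(("/1", "/2")):
            mates[name[:-2]].add(name[-1])
    return sorted(base for base, seen in mates.items() if seen == {"1", "2"})
```

Calling `reads_per_scaffold` on the retained SAM file and then `complete_pairs` on the reads of one scaffold gives the pairs whose two mates both landed on that scaffold.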

Cheers, Pierre

Nafson194 commented 3 years ago

Hi Pierre,

I have an issue with the output of MATAM. I ran this command as described here: matam_assembly.py -d $DBDIR/SILVA_128_SSURef_NR95 -i 16sp.art_HS25_pe_100bp_50x.fq --cpu 4 --max_memory 10000 -v --perform_taxonomic_assignment. When it finished, I found many files in the output directories. I am not sure which file to use for further analysis, as there are many FASTA files with taxonomy.

I tried to run the sample comparison as described here (matam_compare_samples.py -s samples_to_compare.tsv -t contingency_table.tsv -c comparaison_table.tsv), but the required inputs were not present in the output directories for the forward and reverse files.

Could you give me a helping hand?

Best, Nafiu