arumugamlab / msamtools

Metagenomics-related extended functionalities to samtools
MIT License
19 stars 2 forks source link

For Profiling filtering and profiling purposes, we should use bam or sorted bam ? #11

Open arpit20328 opened 2 months ago

arpit20328 commented 2 months ago

So I generated sample.bam file

I also generated its sorted_sample.bam file as well

I then use msamtools

                                                                  FILTERING

msamtools filter -b -l 80 -p 95 -z 80 --besthit bacterial_alignment_normal.bam > sample.bam msamtools filter -b -l 80 -p 95 -z 80 --besthit bacterial_alignment_sorted.bam > sorted_sample.bam

                                                                  PROFILING

msamtools profile --multi=proportional --label=sample1 --unit=rel -o sample1.IGC.profile_normal.txt.gz bacterial_alignment_normal.bam

msamtools profile --multi=proportional --label=sample1 --unit=rel -o sample1.IGC.profile_sorted.txt.gz bacterial_alignment_sorted.bam

I am seeing different results in both sample1.IGC.profile_normal.txt.gz and in sample1.IGC.profile_normal.txt.gz

Please let me know what is the correct way ? using normal generated bam file or sorted bam file ?

microbiomix commented 2 months ago

Hello,

Thank you for reaching out.

You should NOT sort the bam files by coordinates. This will lead to incorrect results.

Since you are filtering using --besthit option, the input should be sorted by read name (QNAME) and not coordinates. Please see last paragraph of Section 4 in the documentation.

For profiling, again it is expected that bam file is sorted by QNAME. Please see the warning in the 2nd paragraph of Section 5 in the documentation.

Finally, we recommend you to use the filtered bam from msamtools filter as input to msamtools profile, not the unfiltered bam file as you have shown. So I would suggest:

msamtools filter -b -l 80 -p 95 -z 80 --besthit bacterial_alignment_normal.bam > sample.bam
msamtools profile --multi=proportional --label=sample1 --unit=rel -o sample1.IGC.profile_normal.txt.gz sample.bam

or even simpler with piping:

msamtools filter -b -l 80 -p 95 -z 80 --besthit bacterial_alignment_normal.bam \
    | msamtools profile --multi=proportional --label=sample1 --unit=rel -o sample1.IGC.profile_normal.txt.gz -

Hope this helps. Please let us know if you have any other questions.

arpit20328 commented 2 months ago

@microbiomix I will get back to you once I apply it. thanks for initiating a discussion