deeptools / deepTools

Tools to process and analyze deep sequencing data.

bamCoverage bin size is variable within a file and between files #1144

Open mheskett opened 2 years ago

mheskett commented 2 years ago

I'm calling bamCoverage on several BAM files and specifying a bin size of 50,000. I expect every file to have identical coordinates (the first three columns of the bedGraph file); otherwise I won't be able to compare them against each other. However, after just a few lines I get variable bin sizes (between 50 and 300 kb), and they differ between the two files that I paste together here:


```
chr1    50000   150000  1       chr1    50000   100000  6
chr1    150000  200000  0       chr1    100000  200000  0
chr1    200000  250000  1       chr1    200000  300000  1
chr1    250000  500000  0       chr1    300000  600000  0
chr1    500000  550000  1       chr1    600000  650000  3
```

**Welcome to deepTools GitHub repository! Before opening the issue please check
that the following requirements are met :**

 - [x] Search whether this issue (or a similar issue) has been solved before
 using the search tab above. Link the previous issue if appropriate below.

 - [x] Paste your deepTools version (`deeptools --version`) and your python
 version (`python --version`) below.
deeptools 3.5.1

 - [x] Paste the full deepTools command that produces the issue below
 (ignore if you simply spotted the issue in the code/documentation).
`bamCoverage --numberOfProcessors 4 -b $input  -o $outdir$filename.coverage.bw -of bedgraph --binSize 50000`

 - [x] Paste the output printed on screen from the command that produces the issue
 below (ignore if you simply spotted the issue in the code/documentation).
```
binLength: 50000
numberOfSamples: None
blackListFileName: None
skipZeroOverZero: False
bed_and_bin: False
genomeChunkSize: None
defaultFragmentLength: read length
numberOfProcessors: 4
verbose: False
region: None
bedFile: None
minMappingQuality: None
ignoreDuplicates: False
chrsToSkip: []
stepSize: 50000
center_read: False
samFlag_include: None
samFlag_exclude: None
minFragmentLength: 0
maxFragmentLength: 0
zerosToNans: False
smoothLength: None
save_data: False
out_file_for_raw_data: None
maxPairedFragmentLength: 1000
```

LeilyR commented 2 years ago

If I am not wrong, the default behaviour is to merge bins with no coverage, which is why you got bins of different lengths. To skip those regions you could use `--skipNAs`.
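
The pasted output also contains a merged interval with a non-zero value (`chr1 50000 150000 1` spans two 50 kb bins), so the merging may apply to any run of adjacent bins that share a value. That is an inference from the data above, not something confirmed in this thread. Under that assumption, one workaround is to split the merged intervals back into fixed-size bins after the fact. Below is a minimal Python sketch; `expand_bedgraph` is a hypothetical helper, not part of deepTools, and it assumes interval starts fall on multiples of the bin size (the last bin of a chromosome may be shorter).

```python
def expand_bedgraph(lines, bin_size=50000):
    """Split merged bedGraph intervals into pieces of at most bin_size bases.

    Hypothetical helper (not a deepTools function). Each piece keeps the
    value of the interval it came from.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional track/comment lines
        if not line or line.startswith(("track", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        start, end = int(start), int(end)
        # Walk the interval in bin_size steps
        for bin_start in range(start, end, bin_size):
            bin_end = min(bin_start + bin_size, end)
            yield chrom, bin_start, bin_end, value


# Two intervals taken from the first file pasted above
merged = [
    "chr1\t50000\t150000\t1",
    "chr1\t150000\t200000\t0",
]
for row in expand_bedgraph(merged):
    print(*row, sep="\t")
```

After expansion, every file run with the same `--binSize` should list the same coordinates, provided zero-coverage regions were not skipped.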

mheskett commented 2 years ago

Hey, thanks for the response @LeilyR. `--skipNAs` won't help, since I need all files across samples to have the same set of regions. Is there any way to force the bin size to remain the same, with no merging, and report all regions with 0 coverage as well? If not, I can just use bedtools against a set of genomic windows.

Aannaw commented 2 years ago

Hey, have you solved the problem? I have run into the same problem and would much appreciate any suggestions!

mheskett commented 2 years ago

Yeah, you need to use `bedtools intersect` with sliding windows instead (see `bedtools makewindows`).
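
To illustrate the idea behind the window-based approach, here is a rough Python sketch rather than the bedtools commands themselves. It tiles each chromosome with fixed windows and counts reads per window, keeping windows with zero reads so every sample yields the same rows. The names `make_windows` and `count_per_window` and the toy inputs are hypothetical. As a simplification, each read is assigned to a window by its start position only, whereas an overlap-based count would count a read in every window it touches.

```python
from collections import Counter


def make_windows(chrom_sizes, window=50000):
    """Yield (chrom, start, end) tiles covering each chromosome."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, window):
            # The last window is clipped to the chromosome length
            yield chrom, start, min(start + window, size)


def count_per_window(read_starts, chrom_sizes, window=50000):
    """Count reads per window by start position, keeping empty windows."""
    counts = Counter(
        (chrom, pos // window * window) for chrom, pos in read_starts
    )
    # Counter returns 0 for windows that received no reads
    return [
        (chrom, start, end, counts[(chrom, start)])
        for chrom, start, end in make_windows(chrom_sizes, window)
    ]


# Hypothetical toy input: one 120 kb chromosome and three read start positions
chrom_sizes = {"chr1": 120000}
reads = [("chr1", 10), ("chr1", 49999), ("chr1", 100000)]
for row in count_per_window(reads, chrom_sizes):
    print(*row, sep="\t")
```

Because the window grid depends only on the chromosome sizes and the window width, the first three columns are identical for every sample and the per-sample counts can be pasted side by side.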
