deeptools / deepTools

Tools to process and analyze deep sequencing data.

alignmentSieve returns a truncated bam #1180

Open sebastian-gregoricchio opened 1 year ago

sebastian-gregoricchio commented 1 year ago
Python 3.9.12
deeptools 3.5.1
alignmentSieve 3.5.1

Dear all, when I run alignmentSieve with the --ATACshift option, I get a truncated BAM file as output for some, but not all, of my samples.

For instance this is the flagstat of my original bam:

31138979 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
31138979 + 0 mapped (100.00% : N/A)
31138979 + 0 paired in sequencing
15587662 + 0 read1
15551317 + 0 read2
31138979 + 0 properly paired (100.00% : N/A)
31138979 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Then I run the following shifting:

alignmentSieve \
        --bam input.bam \
        --outFile shifted.bam \
        --minMappingQuality 20 \
        --minFragmentLength 0 \
        --maxFragmentLength 0 \
        --ATACshift \
        -p max

The first thing I notice is that the output file is only 2.4 GB, compared with the 5.4 GB I get without the shifting. alignmentSieve does not print any error message.

Secondly, when I try to sort the file, I get the following error:

user$   samtools sort -@ 30 -o shifted_sorted.bam shifted.bam

[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
samtools sort: truncated file. Aborting

Furthermore, running flagstat on the resulting file confirms that it is truncated and contains far fewer reads than the original (1707936 instead of 31138979):

[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[bam_flagstat_core] Truncated file? Continue anyway.
1707936 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
1707936 + 0 mapped (100.00% : N/A)
1707936 + 0 paired in sequencing
854824 + 0 read1
853112 + 0 read2
1707936 + 0 properly paired (100.00% : N/A)
1707936 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
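
Because alignmentSieve exits without an error here, a quick check of the output file before sorting can catch the problem early. A complete BAM file ends with a fixed 28-byte empty BGZF block (the EOF marker from the SAM/BAM specification), and a file that was cut off mid-write usually lacks it. The following is a minimal sketch using only standard shell tools; the function name `has_bgzf_eof` is hypothetical and not part of deepTools or samtools. `samtools quickcheck` performs a similar test.

```shell
#!/usr/bin/env bash
# Final 28 bytes of a complete BGZF/BAM file (the empty EOF block),
# written as lowercase hex in 4-byte chunks for readability.
BGZF_EOF_HEX="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# has_bgzf_eof FILE
# Hypothetical helper: exit status 0 if FILE ends with the EOF block, 1 otherwise.
has_bgzf_eof() {
    local tail_hex
    # Dump the last 28 bytes as hex and strip spaces and newlines.
    tail_hex=$(tail -c 28 -- "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}
```

For example, `has_bgzf_eof shifted.bam || echo "shifted.bam looks truncated"`. Note that this only inspects the end of the file: a file with a corrupt block in the middle can still pass, so a passing result is not proof that the BAM is intact.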

Thank you in advance for your help!

buzhizhang121 commented 1 year ago

Hello, I have recently encountered exactly the same problem. Have you solved it? Does anyone know how to solve this problem? Thank you for your help!

(Python 3.9.13, deeptools 3.5.1, alignmentSieve 3.5.1)

sjm0240111 commented 1 year ago

I also ran into a similar problem. After running alignmentSieve, I tried to sort the shifted BAM file with samtools, and it reported:

[E::bgzf_read_block] Invalid BGZF header at offset 716786409
[E::bgzf_read] Read block operation failed with error 6 after 0 of 4 bytes

Thank you for your help!

(deeptools 3.5.1 samtools 1.16.1 pysam 0.19.1)

fgualdr commented 1 year ago

Same problem here. Is there a way to solve this?

baishengjun commented 1 year ago

Same problem here. Is there a way to solve this?

WardDeb commented 1 year ago

Hi,

I was able to reproduce this, though I'm a bit puzzled about the actual cause. The default chunksize for alignmentSieve has been increased, and this seems to have fixed the problem. Could you try to reproduce the issue with the develop branch and see if the problem persists?

Kind regards,

wardDeb

WardDeb commented 1 year ago

This no longer occurs with the increased chunksize (release 3.5.2). I'll close this for now, but feel free to re-open if it pops back up.

Leo-ccc commented 7 months ago

Hi, I have the same problem even after updating deeptools to 3.5.5.

alignmentSieve --numberOfProcessors 40 --ATACshift --paired --bam POOL-2.marked.rmDup.rmMulti.bam -o POOL-2.tmp.bam

Size of 'POOL-2.marked.rmDup.rmMulti.bam' is 2.6G and 'tmp.bam' is 512M.

After the 'samtools sort' command, I got these errors:

[E::bgzf_read_block] Invalid BGZF header at offset 380755101
[E::bgzf_read] Read block operation failed with error 6 after 0 of 4 bytes
samtools sort: truncated file. Aborting

Do you have any suggestions?

WardDeb commented 7 months ago

Could this be an issue with your tmp directory? 512MB is an oddly specific size. Otherwise, would you mind sharing (somehow) the BAM file that causes this behavior?

Leo-ccc commented 7 months ago

I found that 'tmp.bam' file had fewer reads.

The tail of 'tmp.bam' seems strange. Is that a problem?

samtools view POOL-2.tmp.bam | tail -n 20
[E::bgzf_read_block] Invalid BGZF header at offset 380755098
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
[main_samview] truncated file.
samtools view: error closing "POOL-2.tmp.bam": -1
E200017923L1C035R0140471589 163 chr12 106999536 40 57M = 106999532 -52
E200017923L1C035R0140471589 83 chr12 106999532 40 56M = 106999536 -52
E200017923L1C024R0224092311 99 chr12 106999546 40 38M = 106999542 -33
E200017923L1C024R0224092311 147 chr12 106999542 40 37M = 106999546 -33
E200017923L1C023R0131256471 163 chr12 106999554 39 55M = 106999550 -50
E200017923L1C023R0131256471 83 chr12 106999550 39 54M = 106999554 -50
E200017923L1C032R0413648952 99 chr12 106999604 44 62M = 106999600 -57
E200017923L1C032R0413648952 147 chr12 106999600 44 61M = 106999604 -57
E200017923L1C024R0032428308 99 chr12 106999661 44 70M = 106999657 -65
E200017923L1C024R0032428308 147 chr12 106999657 44 69M = 106999661 -65
E200017923L1C037R0260351190 163 chr12 106999720 44 35M = 106999716 -30
E200017923L1C037R0260351190 83 chr12 106999716 44 34M = 106999720 -30
E200017923L1C013R0123664916 163 chr12 106999827 44 146M = 106999856 174
E200017923L1C013R0123664916 83 chr12 106999856 44 145M = 106999827 -174
E200017923L1C003R0151229789 163 chr12 106999888 44 72M = 106999884 -67
E200017923L1C003R0151229789 83 chr12 106999884 44 71M = 106999888 -67
E200017923L1C012R0261817120 163 chr12 107000003 44 50M = 106999999 -45
E200017923L1C039R0054857891 163 chr12 107000003 44 70M = 106999999 -108
E200017923L1C012R0261817120 83 chr12 106999999 44 49M = 107000003 -45
E200017923L1C039R0054857891 83 chr12 106999999 44 112M = 107000003 -108

ultimatex5 commented 7 months ago

Hi,

Same problem here. I've tried both the 3.5.2 and 3.5.5 versions of deeptools. If I remove --ATACshift, all is good; otherwise I get truncated BAMs (all BAMs have a size of 30.251Kb and are empty).

I also just tried 3.5.3, with the same problem.

I also tried 3.5.1: same problem (empty BAM files), but this time the size is different, 213.746Kb.

Leo-ccc commented 7 months ago

Hi. One more piece of information: 5 of my 6 sequencing BAM files ran successfully, and only 1 file failed. I think this is not a general problem but one specific to that file. I'd be happy to provide my BAM if you need one for testing. Thank you!

WardDeb commented 7 months ago

Thanks for the additional info. It'd be great if you could share the problematic BAM file and a working one somehow. Do you have a way of making these available?

WardDeb commented 7 months ago

Just as an update: I've received the files, and this is now a work in progress.

Zeyu618 commented 5 months ago

With the latest alignmentSieve (3.5.5), I got a non-truncated BAM on my first try for a sample that always came out truncated with the old version, no matter how many times I re-ran it. Update your alignmentSieve to 3.5.5 now!

WardDeb commented 5 months ago

It seems the chance of getting a truncated BAM is lower in 3.5.5, but not completely eliminated. I'm aiming to have a true fix for this in the upcoming weeks.

li1311139481 commented 5 months ago

Same problem here: as Leo-ccc described above, only some of my files fail. I also reprocessed the failed files individually and they still failed, so I think this is probably not random but a problem specific to those files.

li1311139481 commented 5 months ago

I installed deeptools with conda (conda install deeptools=3.5.5), and it still didn't solve the problem. However, I did solve it by creating a new conda environment and then installing deeptools with pip. You can try installing with pip for now.

sunta3iouxos commented 1 month ago

Hi there, I just wanted to add some information on the issue. I split the file by chromosome with:

while read -r p; do samtools view -o "/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_${p}.bam" /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.filtered.bam "chr${p}"; done < chr.txt

Then I indexed those files:

for bam in /scratch/Theo/AP06/bam/filtered_bam/*.test_*; do samtools index -@ 16 "$bam"; done

Then the ATAC shift:

for bam in /scratch/Theo/AP06/bam/filtered_bam/*.test_*bam; do alignmentSieve --bam "$bam" --outFile "${bam%.bam}_Stmp.bam" --ATACshift --numberOfProcessors 16; done

And here is the sorting:

for bam in /scratch/Theo/AP06/bam/filtered_bam/*_Stmp.*bam; do samtools sort -@ 16 -O Bam -o "${bam%_Stmp.bam}_shift.bam" "$bam"; done
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
samtools sort: truncated file. Aborting
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
samtools sort: truncated file. Aborting

The affected BAMs are chr11 and chrX, which are missing from the listing of sorted files below. I do not know whether this is a random event:

ls /scratch/Theo/AP06/bam/filtered_bam/*_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_10_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_2_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_12_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_3_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_13_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_4_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_14_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_5_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_15_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_6_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_16_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_7_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_17_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_8_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_18_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_9_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_19_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_M_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_1_shift.bam   /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_Y_shift.bam
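
Spotting the missing chromosomes by eye in a listing like the one above is error-prone. As a rough sketch, assuming the same `_Stmp.bam` / `_shift.bam` naming used in the loops above, the shifted files whose sort did not produce an output can be listed automatically. The function name `list_unsorted` is hypothetical.

```shell
#!/usr/bin/env bash
# list_unsorted DIR
# Hypothetical helper: print every *_Stmp.bam in DIR that has no matching,
# non-empty *_shift.bam next to it (i.e. the sort step failed or never ran).
list_unsorted() {
    local dir=$1 tmp_bam
    for tmp_bam in "$dir"/*_Stmp.bam; do
        [ -e "$tmp_bam" ] || continue   # the glob matched nothing
        # -s is true only if the sorted file exists and is larger than 0 bytes.
        [ -s "${tmp_bam%_Stmp.bam}_shift.bam" ] || printf '%s\n' "$tmp_bam"
    done
}
```

Calling `list_unsorted /scratch/Theo/AP06/bam/filtered_bam` would then print one path per shifted file that still lacks a sorted counterpart, which makes it easier to re-run only those and see whether the same chromosomes fail again.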