Closed weishwu closed 2 years ago
Hi @weishwu,
Can you please provide a small reproducible example I can use for debugging? I've not tried running methtuple myself with Python 3.8, but it builds successfully on GitHub Actions with Python 3.8 (https://github.com/PeteHaitch/methtuple/runs/5136002839?check_suite_focus=true), so the unit tests are at least working as expected.
Cheers, Pete
Hi @PeteHaitch
test.bam.gz
My command line is:
methtuple --aligner Bismark -o methtuple_5554 --min-mapq 10 --min-base-qual 10 --sc --gzip -m 2 test.bam
Thanks!
Thanks! I've been able to reproduce the error running methtuple v1.7.0 on my Mac (albeit under Python 3.9.10).
# Download test file
% wget https://github.com/PeteHaitch/methtuple/files/8329810/test.bam.gz
# Download is gzipped
% gunzip test.bam.gz
# Sort by read name
% samtools sort -n -o sorted_test.bam test.bam
# Run the command that reportedly triggers the error
% methtuple --aligner Bismark -o methtuple_5554 --min-mapq 10 --min-base-qual 10 --sc --gzip -m 2 sorted_test.bam
[E::idx_find_and_load] Could not retrieve index file for 'sorted_test.bam'
methtuple (v1.7.0)
Input SAM/BAM file = sorted_test.bam
Output file of CG 2-tuples = methtuple_5554.CG.2.tsv.gz
The counts will be collapsed across Watson and Crick strands and the strand of each 2-tuple will be recorded as '*'
Reads that fail to pass QC filters will be written to = methtuple_5554.reads_that_failed_QC.txt.gz
Assuming quality scores are Phred33
Assuming SAM/BAM file was created with Bismark version >= 0.8.3.
Ignoring improper read-pairs
Using all positions of each read (if data are single-end) or of each read_1 (if data are paired-end).
Using all positions of each read_2 (if data are paired-end).
Ignoring methylation calls with base-quality less than 10
Ignoring reads with mapQ less than 10
Ignoring any overlapping positions of paired-end reads where the XM-tags disagree and counting once the remaining overlapping positions.
Creating CG 2-tuples
WARNING: 2-tuples may still have intervening methylation loci (i.e. NIC > 0). Such 2-tuples generally occur in paired-end reads with non-overlapping mates but can also be caused by filtering methylation calls by base quality, read-position, etc. You may wish to post-hoc filter 2-tuples with NIC > 0.
Verified that the XR-, XG- and XM-tags are set for the first mapped read.
Traceback (most recent call last):
File "/usr/local/bin/methtuple", line 449, in <module>
methylation_m_tuples, n_methylation_loci_in_fragment, n_fragment_skipped_due_to_bad_overlap = extract_and_update_methylation_index_from_paired_end_reads(read_1, read_2, AlignmentFile, methylation_m_tuples, m, all_combinations, methylation_type, methylation_pattern, ignore_read_1_pos, ignore_read_2_pos, min_qual, phred_offset, ob_strand_offset, overlap_filter, n_fragment_skipped_due_to_bad_overlap, FAILED_QC)
File "/usr/local/lib/python3.9/site-packages/methtuple/funcs.py", line 258, in extract_and_update_methylation_index_from_paired_end_reads
methylation_index_1, methylation_index_2, fragment_skipped = process_overlap(read_1, read_2, methylation_index_1, methylation_index_2, overlap_filter, FAILED_QC)
File "/usr/local/lib/python3.9/site-packages/methtuple/funcs.py", line 444, in process_overlap
start_ol_1 = [idx for idx, value in enumerate(positions_1) if value >= start_ol and value is not None][0]
File "/usr/local/lib/python3.9/site-packages/methtuple/funcs.py", line 444, in <listcomp>
start_ol_1 = [idx for idx, value in enumerate(positions_1) if value >= start_ol and value is not None][0]
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
Off the top of my head I'm not sure what is causing this, so it'll require some digging. Unfortunately I don't have time to do much more this week, but I'll endeavour to look into it more closely next week.
For completeness, would you please report the output of methtuple --version.
methtuple (v1.7.0)
Thanks!
Hello,
Unfortunately I ran into almost the same error:
methtuple /misc/data/tmp/TUBO_S1_L001_R1_001_val_1_bismark_bt2_pe.bam
/misc/data/tmp/TUBO_S1_L001_R1_001_val_1_bismark_bt2_pe.bam
[E::idx_find_and_load] Could not retrieve index file for '/misc/data/tmp/TUBO_S1_L001_R1_001_val_1_bismark_bt2_pe.bam'
methtuple (v1.7.0)
Input SAM/BAM file = /misc/data/tmp/TUBO_S1_L001_R1_001_val_1_bismark_bt2_pe.bam
Output file of CG 1-tuples = /misc/data/tmp/TUBO_S1_L001_R1_001_val_1_bismark_bt2_pe.CG.1.tsv
Reads that fail to pass QC filters will be written to = /misc/data/tmp/TUBO_S1_L001_R1_001_val_1_bismark_bt2_pe.reads_that_failed_QC.txt
Assuming quality scores are Phred33
Assuming SAM/BAM file was created with Bismark version >= 0.8.3.
Ignoring improper read-pairs
Using all positions of each read (if data are single-end) or of each read_1 (if data are paired-end).
Using all positions of each read_2 (if data are paired-end).
Ignoring methylation calls with base-quality less than 0
Ignoring reads with mapQ less than 0
Ignoring any overlapping positions of paired-end reads where the XM-tags disagree and counting once the remaining overlapping positions.
Creating CG 1-tuples
Verified that the XR-, XG- and XM-tags are set for the first mapped read.
Traceback (most recent call last):
File "/misc/software/ngs/methtuple/v1.7.0/pyEnv/bin/./methtuple", line 449, in <module>
methylation_m_tuples, n_methylation_loci_in_fragment, n_fragment_skipped_due_to_bad_overlap = extract_and_update_methylation_index_from_paired_end_reads(read_1, read_2, AlignmentFile, methylation_m_tuples, m, all_combinations, methylation_type, methylation_pattern, ignore_read_1_pos, ignore_read_2_pos, min_qual, phred_offset, ob_strand_offset, overlap_filter, n_fragment_skipped_due_to_bad_overlap, FAILED_QC)
File "/misc/software/ngs/methtuple/v1.7.0/pyEnv/lib/python3.9/site-packages/methtuple/funcs.py", line 258, in extract_and_update_methylation_index_from_paired_end_reads
methylation_index_1, methylation_index_2, fragment_skipped = process_overlap(read_1, read_2, methylation_index_1, methylation_index_2, overlap_filter, FAILED_QC)
File "/misc/software/ngs/methtuple/v1.7.0/pyEnv/lib/python3.9/site-packages/methtuple/funcs.py", line 444, in process_overlap
start_ol_1 = [idx for idx, value in enumerate(positions_1) if value >= start_ol and value is not None][0]
File "/misc/software/ngs/methtuple/v1.7.0/pyEnv/lib/python3.9/site-packages/methtuple/funcs.py", line 444, in <listcomp>
start_ol_1 = [idx for idx, value in enumerate(positions_1) if value >= start_ol and value is not None][0]
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
--------------------------------
My guess is that the cause is the missing index file for the BAM file generated by Bismark. methtuple requires a BAM file sorted by queryname rather than by location, but samtools index cannot index name-sorted BAM files, and I don't know how to solve this problem.
version = methtuple (v1.7.0)
Thank you and all the best,
Jan
I think I've found the problem and I'm preparing a new version that fixes it (along with some other miscellaneous fixes). The issue is caused by reads with an insertion. In the example file supplied above by @weishwu, this specific read is causing the error:
samtools view sorted_test.bam | grep "A00437:475:H7FK3DSX3:1:1101:1850:9987_1:N:0:GAATTCGT+AGGCTATA"
What happens is that when methtuple parses this read in process_overlap() we run positions_1 = get_positions(read_1), which returns a None for the insertion. We then try to do the value >= start_ol comparison in:
https://github.com/PeteHaitch/methtuple/blob/7c800b3708843d297bd1b919ade9b163f8750fc9/methtuple/funcs.py#L444
and hit '>=' not supported between instances of 'NoneType' and 'int' because value is None.
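To illustrate how an insertion can end up as None in a list of per-base reference positions, here is a minimal sketch. The function reference_positions() and the (operation, length) CIGAR representation are hypothetical stand-ins written for this example; this is not methtuple's get_positions() and the start coordinate and CIGAR are made up.

```python
# Hypothetical stand-in for illustration only; NOT methtuple's get_positions().
# The CIGAR is given as (operation, length) pairs using SAM operation letters.

def reference_positions(start, cigar):
    """Return one entry per read base: its reference position, or None
    when the base has no reference position (insertion or soft clip)."""
    positions = []
    ref_pos = start
    for op, length in cigar:
        if op in ("M", "=", "X"):
            # Aligned bases consume both the read and the reference.
            positions.extend(range(ref_pos, ref_pos + length))
            ref_pos += length
        elif op in ("I", "S"):
            # Inserted or soft-clipped bases consume the read only.
            positions.extend([None] * length)
        elif op in ("D", "N"):
            # Deletions and skipped regions consume the reference only.
            ref_pos += length
        # "H" and "P" consume neither, so nothing to do.
    return positions


# Made-up read: 3 aligned bases, a 2-base insertion, then 2 aligned bases.
positions_1 = reference_positions(100, [("M", 3), ("I", 2), ("M", 2)])
print(positions_1)  # [100, 101, 102, None, None, 103, 104]
```

Any code that later compares these entries with an integer has to skip the None values first.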
Simply reversing the order of the comparisons seems to fix the issue:
# Old
value >= start_ol and value is not None
# New
value is not None and value >= start_ol
(although I've no idea why this only just now popped up as a problem)
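The effect of the reordering can be checked in isolation. The sketch below copies the two forms of the list comprehension into small helper functions; the helper names and the positions_1 and start_ol values are made up for the example. It relies on two Python 3 behaviours: ordering comparisons between None and an int raise TypeError, and `and` short-circuits, so putting the None check first means the >= comparison is never evaluated for None.

```python
# Made-up values for illustration: None marks read bases without a
# reference position (e.g. an insertion).
positions_1 = [100, 101, None, None, 102, 103]
start_ol = 102


def first_overlap_index_old(positions, start):
    # Old order: the comparison runs before the None check.
    return [idx for idx, value in enumerate(positions)
            if value >= start and value is not None][0]


def first_overlap_index_new(positions, start):
    # New order: `and` short-circuits, so None never reaches `>=`.
    return [idx for idx, value in enumerate(positions)
            if value is not None and value >= start][0]


try:
    first_overlap_index_old(positions_1, start_ol)
except TypeError as err:
    print("old order:", err)

print("new order:", first_overlap_index_new(positions_1, start_ol))  # new order: 4
```

With the old order the error only appears when a None is actually present in the list, which fits the report that the failure is tied to reads containing an insertion.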
@JaBa90 The [E::idx_find_and_load] Could not retrieve index file for ... error (really a warning in the pysam context) is a red herring and can safely be ignored.
Nonetheless, I have implemented the proposed workaround in the main function and in the test suite, so you should no longer see it after upgrading to v1.8.0 (to be made available shortly).
methtuple v1.8.0, which fixes this issue, is now available through PyPI: https://pypi.org/project/methtuple/
methtuple --aligner Bismark -o methtuple_5554 --min-mapq 10 --min-base-qual 10 --sc --gzip -m 2
@PeteHaitch I installed this version and tried it. It worked. Thanks a lot for your help!
Great to hear!
I installed methtuple with Python 3.8.8 and got the following error when running it with a BAM file from Bismark:
TypeError: '>=' not supported between instances of 'NoneType' and 'int'