Open koriege opened 8 months ago
Hi @koriege,
I appreciate the enthusiasm for the increased efficiency of Summit! To get back to your suggestions...
OMP_NUM_THREADS
really utilize your cores? Thanks for the quick reply.
- Since Summit was developed without parallel processing, does setting
OMP_NUM_THREADS
really utilize your cores?
No, it does not. But one of your libraries, that uses nthreads infers number of cores, which is 128 in my case. Thus setting OMP_NUM_THREADS
to anything <=64 avoids conflicts.
- This error means that a read was not designated as forward or reverse. Perhaps consider removing unmapped or reads without a mate from the BAM file before using Summit?
I did. I always work with unique alignments. The only flag which was removed from the BAM in some cases, due to region blacklisting, is the "proper_pair" flag, although both mates exist (one reverse, the other mate_reverse flagged). I will re-add the proper_pair flag, try again and report back. Thanks for pointing me into this direction.
- This error means that a read was not designated as forward or reverse. Perhaps consider removing unmapped or reads without a mate from the BAM file before using Summit?
After having extensively tested, it seems that your implementation does not work. The pysam object simply does not have an attribute is_forward. According to the pysam docs: is_forward - true if read is mapped to forward strand (implemented in terms of is_reverse)
see this tiny example
NB501242:295:HV7G2BGXV:4:11610:20759:10780 99 chr1 10074 9 4S27=1X4=1X3= = 10392 345 CCCTAACCCTAACCCTAACCCTAACCCTAACACTAAGCCA AAAAA/EEAEEAEAEEAEE<EEE/EEEE//E/EEAE//<< MD:Z:27C4C3 NH:i:1H
I:i:0 NM:i:2 UQ:i:28 YZ:Z:0 RG:Z:A1 MQ:i:6 MC:Z:4=1D22=5S
NB501242:295:HV7G2BGXV:4:11610:20759:10780 147 chr1 10392 6 4=1D22=5S = 10074 -345 CCTACCCCTAACCCTAACCCTAACCCTAACC E//E/EAEEAEEEEAEEEEE6EEEEEAAAAA MD:Z:4^A22 NH:i:1 HI:i:0 NM:i:1 UQ:i:0 YZ:Z:0R
G:Z:A1 MQ:i:9 MC:Z:4S27=1X4=1X3=
here r1 attributes are
['__class__', '__copy__', '__deepcopy__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__pyx_vta
ble__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', 'aend', 'alen', 'aligned_pairs', 'bin', 'blocks', 'cigar', 'cigarstring', 'cigartuples', 'compare', 'flag', 'from_dict', 'fromstring', 'get_aligned_pairs', 'get_blocks', 'get_cigar_stats', 'get_forward_qualities', 'get_forward_sequence', 'get_overlap', 'get_reference_positions', 'get_reference_sequence', 'get_tag', 'get_tags', 'has_tag', 'header', '
infer_query_length', 'infer_read_length', 'inferred_length', 'is_duplicate', 'is_paired', 'is_proper_pair', 'is_qcfail', 'is_read1', 'is_read2', 'is_reverse', 'is_secondary', 'is_supplementary', 'is_unmapped', 'isize', 'mapping_quality', '
mapq', 'mate_is_reverse', 'mate_is_unmapped', 'mpos', 'mrnm', 'next_reference_id', 'next_reference_name', 'next_reference_start', 'opt', 'overlap', 'pnext', 'pos', 'positions', 'qend', 'qlen', 'qname', 'qqual', 'qstart', 'qual', 'query', '
query_alignment_end', 'query_alignment_length', 'query_alignment_qualities', 'query_alignment_sequence', 'query_alignment_start', 'query_length', 'query_name', 'query_qualities', 'query_sequence', 'reference_end', 'reference_id', 'reference_length', 'reference_name', 'reference_start', 'rlen', 'rname', 'rnext', 'seq', 'setTag', 'set_tag', 'set_tags', 'tags', 'template_length', 'tid', 'tlen', 'to_dict', 'to_string', 'tostring']
So the solution is for line 178 and following
if not r1.is_reverse and r2.is_reverse:
r_start, r_end = r1.reference_start, r2.reference_end
elif r1.is_reverse and not r2.is_reverse:
r_start, r_end = r2.reference_start, r1.reference_end
else:
exit(f"Read pair R1 {r1} and R2 {r2} are not complementary! Either both are forward or reverse reads!")
Btw. is_reverse attribute is of type bool, whereas the &
operator is made for bitwise comparision. You may change it to and
Plase also consider to add a proper shebang line to your script.
#!/usr/bin/env python3
Hi,
I really appreciate your re-implementation, since the original version caused up to ~200Gb memory usage in my case. I know, that summit is really new, but I want to report some issues I stumbled across when trying it out. Regards.
1)
summit.py
->Error. nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)
Solution via
OMP_NUM_THREADS
variable2)
OMP_NUM_THREADS=28 summit.py -p 0.05
->summit.py error: argument -p/--pval: invalid int value: '0.05'
Solution: argparse type set to float
3)
OMP_NUM_THREADS=28 summit.py -c <bam> -b <bam> -o <out> -w 100 -m 500 -t 100 -l 50
Do I need to sort the alignments by name?