AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.

[AmpliconArchitect] MD tag not present #12

Closed zhang919 closed 3 years ago

zhang919 commented 3 years ago

Hello, I am glad to hear that you have upgraded AmpliconArchitect to version 1.1, which brings considerable performance improvements. Thanks for your work. After trying it, however, I find that its behavior is sometimes inconsistent with the old version. For example, I get an 'MD tag not present' error when processing my BAM with version 1.1, whereas the old version works well. Best, Zhang

jluebeck commented 3 years ago

Hi Zhang,

Thanks for describing this issue. Could you please post the output or logfile you received from version 1.1 containing the message you are referring to? Is this a crash or just a warning message?

Best regards, Jens

zhang919 commented 3 years ago

Hi jluebeck, thanks for your reply. Because I re-ran the old version of AmpliconArchitect, the log file was overwritten. I will reproduce the problem and share the log in a few days. In my view it is a crash rather than a warning, because the program stops after this message, although no abnormal exit status code is returned. Perhaps the error is caught by an internal exception handler and not re-raised. Best, Zhang

zhang919 commented 3 years ago

Hello, I can now share the following error messages from the new AA. Please review them and give me more information.

Traceback (most recent call last):
  File "/work/fu/bioinfo/AmpliconArchitect/src/AmpliconArchitect.py", line 199, in <module>
    ilist = bamFileb2b.interval_hops(ird, rdlist=all_ilist)
  File "/work/fu/bioinfo/AmpliconArchitect/src/bam_to_breakpoint.py", line 1690, in interval_hops
    icn = self.interval_neighbors(ic, clist, rdlist=rdlist, gcc=gcc)
  File "/work/fu/bioinfo/AmpliconArchitect/src/bam_to_breakpoint.py", line 1619, in interval_neighbors
    edges = self.interval_discordant_edges(i2, ms=msrlist)
  File "/work/fu/bioinfo/AmpliconArchitect/src/bam_to_breakpoint.py", line 1341, in interval_discordant_edges
    if self.edge_passes_filters(vl, bre):
  File "/work/fu/bioinfo/AmpliconArchitect/src/bam_to_breakpoint.py", line 1012, in edge_passes_filters
    if self.edge_has_high_mapq(read_list) and self.edge_has_high_entropy(read_list):
  File "/work/fu/bioinfo/AmpliconArchitect/src/bam_to_breakpoint.py", line 1002, in edge_has_high_entropy
    bp2_entropy = max([stats.entropy(np.unique(list(rr[1].get_reference_sequence()), return_counts=True)[1]) for rr in read_list])
  File "pysam/libcalignedsegment.pyx", line 1833, in pysam.libcalignedsegment.AlignedSegment.get_reference_sequence
  File "pysam/libcalignedsegment.pyx", line 864, in pysam.libcalignedsegment.build_reference_sequence
ValueError: MD tag not present
Completed

tail *.log

DEBUG:root:checking foldback2: chr1:151246211+chr1:151246211+ 1 1 356 0 19
DEBUG:root:#TIME 5471.383 edge_breakpoint_filter: chr1:151246209+->chr1:151245644+
DEBUG:root:#TIME 5471.383 breakpoint_mapq: 60 60
DEBUG:root:#TIME 5471.401 breakpoint_entropy: 1.658 1.662
DEBUG:root:#TIME 5471.566 edge_breakpoint_filter: chr1:151245644+->chr1:151246209+
DEBUG:root:#TIME 5471.566 breakpoint_mapq: 60 60
DEBUG:root:#TIME 5471.583 breakpoint_entropy: 1.658 1.662
DEBUG:root:#TIME 5472.515 refine discordant edge found chr1:151419180+->chr1:151244471- 10 1 1
DEBUG:root:#TIME 5472.515 edge_breakpoint_filter: chr1:151419180+->chr1:151244471-
DEBUG:root:#TIME 5472.515 breakpoint_mapq: 60 60
DEBUG:root:#TIME 5472.526 breakpoint_entropy: 1.536 1.549
DEBUG:root:#TIME 5474.511 edge_breakpoint_filter: chr1:151290318+->chr1:151287718-
DEBUG:root:#TIME 5474.511 breakpoint_mapq: 60 60
DEBUG:root:#TIME 5474.513 breakpoint_entropy: 1.462 1.604
DEBUG:root:#TIME 5479.704 refine discordant edge found chr1:150719293-->chr1:150718974+ 16 54 55
DEBUG:root:#TIME 5480.971 edge_breakpoint_filter: chr1:150435738-->chr1:150435814+
DEBUG:root:#TIME 5480.971 breakpoint_mapq: 60 60
DEBUG:root:#TIME 5480.972 breakpoint_entropy: 1.380 1.398
DEBUG:root:#TIME 5482.273 edge_breakpoint_filter: chr1:151287718-->chr1:151290318+
DEBUG:root:#TIME 5482.273 breakpoint_mapq: 60 60
DEBUG:root:#TIME 5482.277 breakpoint_entropy: 1.604 1.462
DEBUG:root:checking foldback2: chr1:150601894-chr1:150601894- -1 -1 102 0 20
DEBUG:root:#TIME 5483.085 refine discordant edge found chr1:150602008-->chr1:150601916- 21 1 2
DEBUG:root:#TIME 5483.087 edge_breakpoint_filter: chr1:150602008-->chr1:150601916-
DEBUG:root:#TIME 5483.087 breakpoint_mapq: 60 70

Please give me your email address if you need the full log file. Best, Zhang
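For readers following the traceback: the failing frame computes an entropy over the bases returned by pysam's get_reference_sequence(), and that call raises ValueError when a read carries no MD tag. The sketch below reproduces the same kind of calculation with only the standard library and shows one way such a call could be guarded. The function names are hypothetical, and the fallback to the read's own bases is an assumption made for illustration; it is not necessarily how AA handles the case.

```python
import math
from collections import Counter


def sequence_entropy(seq):
    """Shannon entropy (natural log) of the character frequencies in seq."""
    counts = Counter(seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def guarded_entropy(get_reference_sequence, query_sequence):
    """Entropy of the reference sequence, falling back to the read bases.

    get_reference_sequence is any zero-argument callable that may raise
    ValueError (as pysam does when the MD tag is absent). The fallback to
    query_sequence is an illustrative assumption, not AA's actual patch.
    """
    try:
        seq = get_reference_sequence()
    except ValueError:
        seq = query_sequence
    return sequence_entropy(seq)
```

With a guard of this kind, a read without an MD tag would contribute an entropy value instead of stopping the run.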

jluebeck commented 3 years ago

Hi Zhang,

Thanks for reporting this issue. Another user described the same problem when using BAM files aligned with Isaac. I will be patching this in the next few days; until then, the most immediate fix is to use only BAM files created with BWA MEM.

Best regards, Jens

zhang919 commented 3 years ago

Hi jluebeck, thanks for your attention. It is worth noting, though, that my files were indeed produced by BWA MEM and have gone through the standard GATK correction process. Interestingly, the error does not always occur for files from the same process. Best, Zhang

jluebeck commented 3 years ago

Hi Zhang,

Thank you for clarifying that. That is very interesting. It is possible that the GATK correction process alters the BAM file in some way and removes the MD tag. The other possibility is that, if these are CRAM files, the MD tag was lost during CRAM conversion. I do not have enough experience with the GATK toolset to say exactly, sorry. Perhaps try without the GATK pipeline? Regardless, patching this issue is high on my to-do list.

Best, Jens
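To test whether a given file has lost its MD tags, one option is to scan its alignment records for the optional MD:Z: field. The sketch below is a minimal example that assumes the records are supplied as SAM text lines (for instance from `samtools view sample.bam`, where sample.bam is a placeholder); the function names are hypothetical.

```python
def has_md_tag(sam_line):
    """Return True if a SAM alignment line carries an MD:Z: optional field."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    return any(field.startswith("MD:Z:") for field in fields[11:])


def count_missing_md(sam_lines):
    """Count mapped records lacking an MD tag.

    Returns (missing, mapped_total). Header lines (starting with '@'),
    blank lines, and unmapped reads (FLAG bit 0x4) are skipped.
    """
    missing = 0
    mapped_total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a well-formed alignment record
        if int(fields[1]) & 0x4:
            continue  # unmapped reads are not expected to have MD
        mapped_total += 1
        if not has_md_tag(line):
            missing += 1
    return missing, mapped_total
```

Passing sys.stdin to count_missing_md while piping in the output of `samtools view` gives the counts for a whole file. If tags turn out to be missing, `samtools calmd` can regenerate them from the reference FASTA.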

zhang919 commented 3 years ago

Hi jluebeck, thanks again for your help and outstanding work. I look forward to more good news from your work. Best, Zhang

jluebeck commented 3 years ago

Hi Zhang, I have updated my fork of AA to address this issue. My docker image is also updated. Please re-open and let me know if you run into any issues.

Best, Jens