cancerit / BRASS

Breakpoints via assembly - Identifies breaks and attempts to assemble rearrangements in whole genome sequencing data.
GNU Affero General Public License v3.0
57 stars 20 forks source link

getting POS of 0 #110

Open anoronh4 opened 2 years ago

anoronh4 commented 2 years ago

I ran brass on several samples and in one sample i got the following record:

11  0   4897_2  0   0]11:109403548] 4   .   SVTYPE=BND;MATEID=4897_1;TSRDS=A00227:491:HGLM2DSX3:4:1619:32814:17581,A00333:460:HFNH3DSX3:3:1246:9498:24643,A00333:460:HFNH3DSX3:3:1357:18231:27962,A00333:460:HFNH3DSX3:4:1557:7997:8672;BKDIST=109403549;SVCLASS=inversion  RC:PS   0:0 0:4

position 0 looks pretty unusual to me and i was not able to validate the call with any of 3 other callers (delly, manta, svaba). this is causing issues for me because certain downstream tools will not handle a position of 0 (pysam for example).

anoronh4 commented 2 years ago

Hope all is well. Just wondering if you need any further information/files from me to answer this question. By the way I'm using the quay.io/wtsicgp/brass:v6.3.4 container. As per the specification, VCF files should contain 1-based coordinates: https://samtools.github.io/hts-specs/VCFv4.3.pdf

anoronh4 commented 2 years ago

just wondering if there was any update on this.

anoronh4 commented 1 year ago

i have run into this issue again with a different sample:

$ zgrep -v "^#" /path/to/brass/sample.annot.vcf.gz | awk -F"\t" '$2 == 0'
3   0   5631_2  0   [3:80865736[0   5   .   SVTYPE=BND;MATEID=5631_1;TSRDS=A00227:541:HM73TDSX3:2:1321:30101:27132,A00227:541:HM73TDSX3:3:1137:14832:13025,A00227:541:HM73TDSX3:3:1163:23972:14669,A00227:541:HM73TDSX3:4:2455:30120:12540,A00333:514:HLYMTDSX3:2:1409:8630:9455;BKDIST=80865737;SVCLASS=inversion  RC:PS   0:0:5
3   0   5695_2  0   [3:80870254[0   8   .   SVTYPE=BND;MATEID=5695_1;TSRDS=A00227:541:HM73TDSX3:1:1328:22155:15562,A00227:541:HM73TDSX3:1:2469:27588:14857,A00227:541:HM73TDSX3:2:1608:9751:25708,A00227:541:HM73TDSX3:2:1647:23728:25739,A00227:541:HM73TDSX3:2:2305:20654:23077,A00227:541:HM73TDSX3:3:2203:7726:14904,A00333:514:HLYMTDSX3:2:1433:9932:16313,A00333:514:HLYMTDSX3:2:2636:32958:14575;BKDIST=80870255;SVCLASS=inversion   RC:PS   0:0 0:8
3   0   6174_2  0   0]3:80911681]   4   .   SVTYPE=BND;MATEID=6174_1;TSRDS=A00227:541:HM73TDSX3:1:1477:31159:22138,A00227:541:HM73TDSX3:1:2540:17987:9596,A00227:541:HM73TDSX3:4:1518:21640:27101,A00227:541:HM73TDSX3:4:2214:22019:20870;BKDIST=80911682;SVCLASS=inversion RC:PS   0:0 0:4

This time, the issue appears 3 times in the same output vcf. Does your team plan to look into this issue at all?

interestingly, when i searched the brm.bam file for the first read mentioned in those vcf lines, it does not reconcile with the vcf:

$ samtools view /path/to/brass/tmpBrass/sample.brm.bam | grep A00227:541:HM73TDSX3:2:1321:30101:27132
A00227:541:HM73TDSX3:2:1321:30101:27132 145 3   80865728    60  39S112M =   80866798    960 AGAGCGATCAGGCAAGAGAAGAAATGAAGGGCATTCAAGCAGGAATACCTCAACCCCTCCAACACAGCAGGCACAGCAGCTTCCCAAATTCGAGGGGACGGAGAACAAAGCTGGGGGCCTGTTACCAGTCCCCAAGAGTTAGAGTCCACAG 1<:=5<+<=;<==;;=;=;;=8;77=;;==:=:=<=;;==;==;:(:=><=98=88>9=8,;>;>1:>1>>>;><>>=?@?>???==<?:7?=??8?<7>?=;<;,<<<>?=>>98;>?==:<;>=;<<====;2=<<<::<::::<9;98 SA:Z:3,80866832,-,45M106S,60,0; MC:Z:79M72S MD:Z:112    PG:Z:MarkDuplicateRG:Z:TA00227@541@HM73TDSX3@2  NM:i:0  AS:i:112    XS:i:33
A00227:541:HM73TDSX3:2:1321:30101:27132 97  3   80866798    60  79M72S  =   80865728    -960    ACTCCTATTCAACACAGTACTAGAAGTCCTGGCCAGAGCGATCAGGCAAGAGAAGAAATGAAGGGCATTCAAGCAGGAATACCTCAACCCCTCCAACACAGCAGGCACAGCAGCTTCCCAAATTCGAGGGGACGGAGAACAAAGCTGGGGG ;<<??688:=;=<<<<?;<=<<?=>?;>><>>>>=?=?>6=;>=?>?>:@>@>?@>:?<?>?@???><=?>?@?>@?>?<==?=?=?>???=??=?=>=>@?>@??>>>@?>@?:=???>?8;;>6>?>>>==6>=?=8==*??><??>=; SA:Z:3,80865728,+,73S78M,60,0;  MC:Z:39S112M    MD:Z:79 PG:Z:MarkDuplicates RG:Z:TA00227@541@HM73TDSX3@2    NM:i:0  AS:i:79 XS:i:31
tischfis commented 1 year ago

I am having similar issue, any update BRASS team? thanks!

rulixxx commented 1 year ago

Would need some small test input to try to reproduce.

anoronh4 commented 1 year ago

thanks for your response. i will need some time to come up with a small test input and make sure it can reproduce the issue. is there a way to run brass on a bam that has been filtered by region?

in the meantime, i have done a little more work to isolate the problem, finding the source of the invalid coordinate in the .r4 intermediate file:

$ cat brass/tumor_vs_normal.r4 | awk -F"\t" '$12 == 0' 
3   80865824    80865850    3   80867060    80867071    5631    5   -   +   80865736    0   80865736 (0)    80866775 (0)    0 (0)   0 (0)   A00227:541:HM73TDSX3:2:2333:25129:13479,A00227:541:HM73TDSX3:3:2636:13286:8218,A00227:541:HM73TDSX3:4:2252:20066:33207,A00333:514:HLYMTDSX3:2:1106:31195:31657,A00333:514:HLYMTDSX3:2:1116:29740:36933,A00333:514:HLYMTDSX3:2:1409:8630:9455,A00333:514:HLYMTDSX3:2:1657:3875:9392,A00333:514:HLYMTDSX3:2:2232:2311:15452   8   1       0   0
3   80869882    80870017    3   80870790    80870983    5695    8   -   +   80870254    0   80870254 (2)    80870111 (2)    0 (0)   0 (0)   A00227:541:HM73TDSX3:2:2464:15980:14700,A00227:541:HM73TDSX3:3:1411:26874:24455,A00227:541:HM73TDSX3:3:2366:11333:22811,A00227:541:HM73TDSX3:3:2640:26503:3552,A00227:541:HM73TDSX3:4:1336:12292:19366,A00227:541:HM73TDSX3:4:1650:16188:34068,A00227:541:HM73TDSX3:4:1659:11234:11303,A00227:541:HM73TDSX3:4:2158:5367:6715,A00227:541:HM73TDSX3:4:2515:3821:20572,A00227:541:HM73TDSX3:4:2515:6451:20556,A00227:541:HM73TDSX3:4:2643:20383:3443,A00333:514:HLYMTDSX3:2:1506:30427:16047,A00333:514:HLYMTDSX3:2:1616:18059:12790,A00333:514:HLYMTDSX3:2:1646:27154:33364   14  2       0   0
3   80911970    80911978    3   80913732    80913783    6174    4   +   -   80911681    0   80911624 (2)    80911681 (2)    0 (0)   0 (0)   A00227:541:HM73TDSX3:1:2404:15302:31093,A00227:541:HM73TDSX3:3:1428:22417:22091 2   0

The 12th column of this file represents the "high_end_bkpt" and is calculated using the following lines: https://github.com/cancerit/BRASS/blob/7883d8cd3db614be2b06e32c288e7db344d1f2b0/perl/bin/get_abs_bkpts_from_clipped_reads.pl#L509-L514
seems to me that this should not evaluate to 0, or should get lead to the variant being filtered out?

This column is passed on to .r5 and .r6, and then gets added to .groups.clean.bedpe with the following lines: https://github.com/cancerit/BRASS/blob/7883d8cd3db614be2b06e32c288e7db344d1f2b0/perl/bin/collate_rg_regions.pl#L72-L73

Does this help identify the source of the problem?

rulixxx commented 1 year ago

Hi,

To tell you the truth I haven't previously worked with BRASS. However since I didn't see anyone else picking up the issue I decided to give it a try. That is why asked for the test otherwise it might take me too long to figure out.


From: anoronh4 @.> Sent: Tuesday, October 4, 2022 9:10 PM To: cancerit/BRASS @.> Cc: Raul Alcantara @.>; Comment @.> Subject: Re: [cancerit/BRASS] getting POS of 0 (Issue #110)

thanks for your response. i will need some time to come up with a small test input and make sure it can reproduce the issue. is there a way to run brass on a bam that has been filtered by region?

in the meantime, i have done a little more work to isolate the problem, finding an issue in the .r4 intermediate file:

$ cat brass/tumor_vs_normal.r4 | awk -F"\t" '$12 == 0' 3 80865824 80865850 3 80867060 80867071 5631 5 - + 80865736 0 80865736 (0) 80866775 (0) 0 (0) 0 (0) A00227:541:HM73TDSX3:2:2333:25129:13479,A00227:541:HM73TDSX3:3:2636:13286:8218,A00227:541:HM73TDSX3:4:2252:20066:33207,A00333:514:HLYMTDSX3:2:1106:31195:31657,A00333:514:HLYMTDSX3:2:1116:29740:36933,A00333:514:HLYMTDSX3:2:1409:8630:9455,A00333:514:HLYMTDSX3:2:1657:3875:9392,A00333:514:HLYMTDSX3:2:2232:2311:15452 8 1 0 0 3 80869882 80870017 3 80870790 80870983 5695 8 - + 80870254 0 80870254 (2) 80870111 (2) 0 (0) 0 (0) A00227:541:HM73TDSX3:2:2464:15980:14700,A00227:541:HM73TDSX3:3:1411:26874:24455,A00227:541:HM73TDSX3:3:2366:11333:22811,A00227:541:HM73TDSX3:3:2640:26503:3552,A00227:541:HM73TDSX3:4:1336:12292:19366,A00227:541:HM73TDSX3:4:1650:16188:34068,A00227:541:HM73TDSX3:4:1659:11234:11303,A00227:541:HM73TDSX3:4:2158:5367:6715,A00227:541:HM73TDSX3:4:2515:3821:20572,A00227:541:HM73TDSX3:4:2515:6451:20556,A00227:541:HM73TDSX3:4:2643:20383:3443,A00333:514:HLYMTDSX3:2:1506:30427:16047,A00333:514:HLYMTDSX3:2:1616:18059:12790,A00333:514:HLYMTDSX3:2:1646:27154:33364 14 2 0 0 3 80911970 80911978 3 80913732 80913783 6174 4 + - 80911681 0 80911624 (2) 80911681 (2) 0 (0) 0 (0) A00227:541:HM73TDSX3:1:2404:15302:31093,A00227:541:HM73TDSX3:3:1428:22417:22091 2 0

The 12th column of this file represents the "high_end_bkpt" and is calculated using the following lines: https://github.com/cancerit/BRASS/blob/7883d8cd3db614be2b06e32c288e7db344d1f2b0/perl/bin/get_abs_bkpts_from_clipped_reads.pl#L509-L514 . seems to me that this should not evaluate to 0, or should get lead to the variant being filtered out?

This value is passed to .r5 and .r6, and then gets added to .groups.clean.bedpe with the following lines: https://github.com/cancerit/BRASS/blob/7883d8cd3db614be2b06e32c288e7db344d1f2b0/perl/bin/collate_rg_regions.pl#L72-L73

Does this help identify the source of the problem?

— Reply to this email directly, view it on GitHubhttps://github.com/cancerit/BRASS/issues/110#issuecomment-1267522008, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEAAUCERBILLI7XLJTUGBKDWBSFJXANCNFSM5WC27X2A. You are receiving this because you commented.Message ID: @.***>