cancerit / BRASS

Breakpoints via assembly - Identifies breaks and attempts to assemble rearrangements in whole genome sequencing data.
GNU Affero General Public License v3.0

less *.ngscn.segments.abs_cn.bg | cut -f1 | sort -u 1 2 3 4 5 6 7 8 9 X Y #106

Open jsmedmar opened 2 years ago

jsmedmar commented 2 years ago

On v6.3.4 the following file is missing data for double-digit chromosomes:

less *.ngscn.segments.abs_cn.bg | cut -f1 | sort -u
1
2
3
4
5
6
7
8
9
X
Y

I'm wondering if someone could check their output and see whether it's a problem unique to my setup. I'm working from quay.io/wtsicgp/brass:v6.3.4.

Thank you so much in advance.

jsmedmar commented 2 years ago

By the way, this file ends up in the intermediates gzip after completion.

keiranmraine commented 2 years ago

I can confirm that I see this on GRCh37; it's fine on GRCh38 (with chr prefix):

$ zcat intermediates_GRCh37/*.ngscn.segments.abs_cn.bg.gz | cut -f 1 | sort -u
1
2
3
4
5
6
7
8
9
X
Y
$ zcat intermediates_GRCh38/*.ngscn.segments.abs_cn.bg.gz | cut -f 1 | sort -u
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr21
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrX
chrY

keiranmraine commented 2 years ago

I assessed all bed/bedpe/bg files; the problem only affects this one file (*.ngscn.segments.abs_cn.bg.gz).

$ ls -1 *.{bed,bg}* | xargs -I {} bash -c 'echo -n "{}: "; zgrep -v "^#" {} | cut -f 1 | sort -u | wc -l'
WT.ngscn.bed.gz: 24
WT.ngscn.fb_reads.bed.gz: 24
MT.ngscn.bed.gz: 24
MT.ngscn.fb_reads.bed.gz: 24
MT_vs_WT.groups.clean.bedpe: 13 # expected, cleaned data will not have events on every chr
MT_vs_WT.groups.filtered.bedpe: 23 # expected, filtered data will not have events on every chr
MT_vs_WT.ngscn.abs_cn.bg.gz: 24
MT_vs_WT.ngscn.abs_cn.bg.rg_cns.gz: 24
MT_vs_WT.ngscn.segments.abs_cn.bg.gz: 11 # NOT expected
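
The counts above show how many chromosomes each file has, but not which ones are missing. A minimal sketch for listing them by name, assuming tab-separated files with the chromosome in column 1 and `#` comment lines (the function names `chroms` and `missing_chroms` are hypothetical helpers, not part of BRASS):

```shell
# Print the sorted, unique chromosome names (column 1) of a plain or
# gzip-compressed tab-separated file, skipping '#' comment lines.
chroms() {
  gzip -cdf "$1" | grep -v '^#' | cut -f 1 | LC_ALL=C sort -u
}

# Print chromosomes present in the reference file ($1) but absent
# from the query file ($2).
missing_chroms() {
  tmp_ref=$(mktemp) && tmp_query=$(mktemp) || return 1
  chroms "$1" > "$tmp_ref"
  chroms "$2" > "$tmp_query"
  # comm -23 keeps only lines unique to the first (reference) list.
  LC_ALL=C comm -23 "$tmp_ref" "$tmp_query"
  rm -f "$tmp_ref" "$tmp_query"
}
```

For example, `missing_chroms MT_vs_WT.ngscn.abs_cn.bg.gz MT_vs_WT.ngscn.segments.abs_cn.bg.gz` compares the unaffected file against the affected one. On the GRCh37 data described here it would presumably list the double-digit chromosomes.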

*.segments.abs_cn.bg and *.abs_cn.bg are both in the same R script, but only one of the files is affected. This needs someone with R skills to investigate:

https://github.com/cancerit/BRASS/blob/dd0e1c1324459c4090c598dc6a12b7b71ef34586/perl/share/Rscripts/normalise_cn_by_gc_and_fb_reads.R