bicseq_preprocess.sh - Githubissues

hayeon-pak commented 2 years ago

안녕하세요.

InfoGenomeR 이용함에 있어, 어려움이 있어 질문 사항 남깁니다. preprocessing 과정을 실행하는데 있어서

This is mgcv 1.8-38. For overview type 'help("mgcv-package")'. No such file or directory: /home/to/fasta/folder/hg19.fa.1

위와 같은 오류가 발생하는데요. 이미 다운 받아 놓았던 hg19.fa 를 이용하면 될거라고 생각했는데. reference에 chromosome이 각각 나뉜 fasta file을 해당 이름에 맞추어 변경시켜 이용해야 하는 건가요?

답변 주시면 감사하겠습니다 :)

dmcblab commented 2 years ago

안녕하세요,

네 맞습니다. hg19.fa를 사용한다면, chromosome 별로 hg19.fa.1, hg19.fa.2, ... hg19.fa.22, hg19.fa.X 파일이 있어야 됩니다. chr prefix가 있는 hg19.fa를 각 chormosome 별로 나누는 코드는 다음 perl script를 참고해 주세요.

use Bio::DB::Fasta;
use warnings;

my $db = Bio::DB::Fasta->new("./hg19.fa");

for(my $chrs=0; $chrs<=22; $chrs++){
        my $chr_name;
        if($chrs != 22){
                $chr_name=$chrs+1;
        }else{
                $chr_name="X";
        }

        my $new_fa = $db->seq("chr${chr_name}");
        open(my $new_f, '>', "hg19.fa.${chr_name}");

        print $new_f ">${chr_name}\n";
        while( my $chunk = substr($new_fa, 0, 80, "")){
                  print $new_f "$chunk\n";
        }
}

추가로, 현재 preprocessing code가 bam 파일과 reference fasta에서 chr prefix가 없는 기준으로 작성되어 있어서 bam 파일과 fasta에 chr prefix가 있다면 에러가 발생할 수 있을 것 같습니다. 곧 업데이트 할 예정입니다.

감사합니다.

hayeon-pak commented 2 years ago

답변 감사합니다!

그런데, 한 가지 더 문의 드릴 사항이 있습니다. 저의 경우에 germline 분석을 진행하고 있습니다.

InfoGenomeR가 tumor 기반으로 guide가 작성되어 있기 때문에, 저의 상황에 맞게 code를 바꿔 이용하고 있는데요. novoBreak 이용에 문제가 발생하고 있습니다.

해당 tool에서 germline 분석 시에 normal bam file의 경우, mocked normal bam file by simulation 을 만들어서 이용해야 하는 것으로 알고 있습니다. 여기서, mocked normal bam file by simulation 이 무엇을 의미하는지, 혹시 germline 의 상황에서는 어떻게 이용하셨는지 알 수 있을까요?

감사합니다 :)

dmcblab commented 2 years ago

novobreak를 germline 분석 시 사용하는 경우, normal bam file을 임의로 reference fasta로부터 생성시켜서 사용할 수 있습니다. 아래 예시로 art tool을 사용하여 simulated reads를 생성 한 후 normal bam을 만들고 germline bam을 test로 사용하면 됩니다.

art_illumina -ss HS20 -i ./hg19.fa -p -l 100  -f 1  -m 350 -s 20 -o hg19
bwa-0.7.15 mem  -t 24 hg19.fa hg191.fq hg192.fq | samtools view -bS >  hg19.bam
samtools sort hg19.bam > hg19_sorted.bam
samtools index hg19_sorted.bam

다만, novobreak가 germline variant를 찾는 데 별로 좋지 않습니다. delly2 와 manta만 이용하셔도 충분할 것 같습니다.

hayeon-pak commented 2 years ago

빠른 답변 감사드립니다.

혹시 het_SNP_detection_somatic.sh 에서

samtools mpileup -t DP,AD,ADF,ADR,SP -q 10 -d 5000 -I -uf $reference $normal_bam $tumor_bam -r $l | bcftools call -c -v > vcf_somatic/$l.vcf

해당 code에는 $normal_bam 을 제외하고 분석을 진행하였습니다. 그 다음 process로

split($10,f,":"); split($11,g,":"); if(g[1] == "0/1"){ split($10,f1,":"); split(f1[7], tumor, ","); split($11,f2,":"); split(f2[7], normal, ","); print "'$i'""\t"$2"\t"$3"\t"$4"\t"$5"\t"normal[1]"\t"normal[2]"\t"tumor[1]"\t"tumor[2]

해당 과정을 진행해야 하는데, normal과 tumor 두 개의 data를 넣지 않아서 인지 $10은 존재하나 $11의 column 은 존재하지 않습니다. 이 부분과 관련하여, normal[1] 과 normal[2] 를 제외시키고 진행을 해야할 지, 아님 0으로 대체해야 할 지 질문 드립니다. 또한, [1]과 [2]는 각각 ADR과 AD 값을 이용하려는 것이 맞는지도 알고 싶습니다.

감사합니다.

dmcblab commented 2 years ago

네, 맞습니다. normal 없이 진행할 경우, $normal_bam을 제외하고 진행하면 됩니다. 그 다음 process에서 11번째 column이 없기 때문에 normal[1]과 normal[2]를 NA로 대채해 주시면 됩니다. 해당 스크립트를 참고하여 주세요.

#!/bin/bash

for i in `seq 1 23`; do
        if [ $i -eq 23 ];then
                i="X";
        fi
        cat $i.vcf | grep -v '#' | awk '{

                if(length($4)==1 && length($5) == 1 && $6>50){
                        split($10,f,":");
                        if(f[1] == "0/1"){
                                split($10,f1,":");
                                split(f1[7], tumor, ",");

                                print "'$i'""\t"$2"\t"$3"\t"$4"\t"$5"\t""NA""\t""NA""\t"tumor[1]"\t"tumor[2]
                        }
                }
        }' > $i.vcf.format.het

        cat $i.vcf | grep -v '#' | awk '{

                if(length($4)==1 && length($5) == 1 && $6>50){
                        split($10,f,":");
                        if(f[1] == "1/1"){
                                split($10,f1,":");
                                split(f1[7], tumor, ",");

                                print "'$i'""\t"$2"\t"$3"\t"$4"\t"$5"\t""NA""\t""NA""\t"tumor[1]"\t"tumor[2]
                        }
                }
        }' > $i.vcf.format.hom

        cat *.vcf.format.het > het_snps.format
        cat *.vcf.format.hom > hom_snps.format

done

[1]과 [2]는 allelic depth (AD)에서 reference allele depth와 alternative allele depth를 가르킵니다.

hayeon-pak commented 2 years ago

답변 감사합니다.

저는 Germline 분석으로 Karyotyping을 진행하고자 합니다.

그 과정에서 몇 가지 명확하지 않은 부분이 생겨 한 번 더 질문 드립니다.

samtools view -h -F 2 -b simulated_sorted.bam >simulated_sorted_NPE.bam samtools sort -n simulated_sorted_NPE.bam > simulated_sorted_NPE_sorted.bam bamToFastq -i simulated_sorted_NPE_sorted.bam -fq NPE.fq1 -fq2 NPE.fq2

혹시 해당 line은 simulated_sorted.bam file이 지정하는 data가 무엇인가요? 이후의 분석에서 optional 로 지정되어 있어 우선 진행시키지 않았습니다.

또한, Karyotyping 까지 진행시킨 tutorial 이 tutorial 3 이여서 해당 자료를 참고하여 진행하고 있습니다.

breakpoint graph construction (It takes a day during about 30 iterations)

breakpoint_graph -m total SVs cn_norm/ -i 16 -f 2000 -o total_job -t BRCA breakpoint_graph -m simplification -o total_job -i 16 -f 2000 total_job/SVs total_job/cn_norm/ -t BRCA

해당 과정에서 total과 simplification 두 번을 나누어 진행하셨는데, 이유를 알 수 있을까요?
그리고, 저의 경우에는 germline 으로 한 번만 진행시키면 되는 걸까요?
-i 와 -f 의 값은 무엇을 기준에 두고 진행시켜야 할까요?

번거로우심에도 매번 친절한 답변 주셔서 감사합니다 :)

dmcblab commented 2 years ago

안녕하세요,

simulated_sorted.bam 은 분석하고자 하는 샘플의 bam file을 지정해주시면 됩니다. 해당 경우에는 germline sample을 지정해 주면 되는데, 경우에 따라 false positive SV call이 잡히기 때문에 우선 건너 뛰는 것을 추천합니다.
total 과정은 존재하는 모든 SV을 포함하여 분석하고, 이후 simplification 과정을 통해 100kb 이내의 simple SVs를 제거하고 allele-specific 및 haplotype 분석을 진행합니다.
germline sample의 SV 분석이 목적이면 germline sample을 total 모드 (-m total)로 분석하면 됩니다. somatic sample 분석시에만 germline 모드 분석 후 somatic 모드를 진행합니다. complex SV, translocation 및 karyotyping 분석이 주 목적이면 total 모드 후에 simplification까지 진행하는 것이 좋습니다.
-i 와 -f 기준은 데이터 read depth 노이즈에 따라 선택해야 되는데, oversegmentation이 일어나지 않는 선에서 정하는 게 좋습니다. copy number breakpoint의 갯수가 SV 갯수의 2~3배 정도 되는 선이 좋지 않을까 생각합니다. 제가 total 분석 시에는 대개 -i 16 -f 2000이 잘 작동하는 것 같습니다.

hayeon-pak commented 2 years ago

정보 감사드립니다

현재 breakpoint_graph 과정을 진행 중에 있습니다. 이 과정에 오류가 생기고 있는데요.

제가 이용한 code는 breakpoint_graph -m total SVs dir/to/cn_norm/ -i 16 -f 2000 -o dir/to/total_job 입니다.

out_dir exists... overlapping Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 4383 did not have 14 elements Calls: read.table -> scan Execution halted Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 4383 did not have 14 elements Calls: read.table -> scan Execution halted Local segmentation... Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 4229 did not have 15 elements Calls: read.table -> scan Execution halted cat: copy_numbers: No such file or directory Ploidy and purity estimation... Loading required package: numDeriv Error in file(file, "rt") : cannot open the connection Calls: read.table -> file In addition: Warning message: In file(file, "rt") : cannot open file 'copy_numbers': No such file or directory Execution halted Loading required package: numDeriv Error in file(file, "rt") : cannot open the connection Calls: read.table -> file In addition: Warning message: In file(file, "rt") : cannot open file 'copy_numbers': No such file or directory Execution halted

현재 이와 같은 error가 발생하고 있습니다.

해당 line 4383을 확인해 보니 manta의 경우, SR 값이 존재하지 않는 call이 더러 있어서 해당 값이 존재하지 않아 빈 값으로 처리 되었습니다. 이 부분과 관련하여 SR 값이 존재하지 않는 경우 0 또는 NA로 처리해야 할까요?

또한, line 4229 did not have 15 elements 의 경우 15개의 elements가 존재하지 않는다고 나오는데, 15개의 element를 가지고 있어야 하는게 맞나요???

dmcblab commented 2 years ago

안녕하세요,

현재 preprocessing 코드가 somatic 기준이라 몇가지 에로사항이 생긴 것 같습니다. 우선 manta preprocess 코드를 첨부드립니다. 참고하여 주세요.

#!/bin/bash
PE_thres=3
zcat tumorSV.vcf.gz | awk -F "\t" '{
n=split($8,f,";");
split(f[2],DUPDEL,"=");
split(f[1],TRA, "=");
rn=split($10,read, ":");
if(rn==1){
        split(read[1],pr,",");
        PR=pr[2];
        SR=0;
}else{
        split(read[1],pr,",");
        PR=pr[2];
        split(read[2],sr,",");
        SR=sr[2];
}
if(DUPDEL[2]=="DUP"){
        split(f[1], pos2, "=");
        print "<DUP>\t"$1"\t"$2"\t"$1"\t"pos2[2]"\t5to3\t"PR"\t"SR
}else if(DUPDEL[2]=="DEL"){
        split(f[1], pos2, "=");
        print "<DEL>\t"$1"\t"$2"\t"$1"\t"pos2[2]"\t3to5\t"PR"\t"SR
}else if(DUPDEL[2]=="INV"){
        split(f[1], pos2, "=");
        split($8, INV, ";INV");
        split(INV[2], inv_ori, ";");
        print "<INV>\t"$1"\t"$2"\t"$1"\t"pos2[2]"\t"inv_ori[1]"to"inv_ori[1]"\t"PR"\t"SR;
}else if(TRA[2]=="BND"){
        n1=split($5, tra_ori, "[");
        if(n1>2){
                if(length(tra_ori[1])!=0)
                        ori="3to5"
                else if(length(tra_ori[3])!=0)
                        ori="5to5"
        }else{
        n2=split($5, tra_ori, "]");
                if(length(tra_ori[1])!=0)
                        ori="3to3"
                else if(length(tra_ori[3])!=0)
                        ori="5to3"
        }
        split(tra_ori[2], chr2_pos2, ":");
        print "<TRA>\t"$1"\t"$2"\t"chr2_pos2[1]"\t"chr2_pos2[2]"\t"ori"\t"PR"\t"SR;

}
}' | awk '{ if($7 > '$PE_thres' ){
        print $0"\t0\t0\t0\t0\t0\t0"}}' | awk '{if(($2~/^[1-2]?[0-9]$/ || $2=="X") && ($4~/^[1-2]?[0-9]$/ || $4=="X")) print $0}'    > tumorSV.vcf.format

SV 파일은 빈칸 없이 14개 element를 채우면 되는데 7번째 element 까지만 사용하기에 나머지는 NA로 채우셔도 됩니다.

hayeon-pak commented 2 years ago

답변 주신 사항 감사합니다.

Manta의 경우에도 저는 germline 분석을 진행하였는데요. Manta에서 germline인 분석을 진행하면,

GT:FT:GQ:PL:PR:SR 1/1:PASS:47:678,50,0:0,7:0,18

위와 같은 6개의 FORMAT 정보를 얻을 수 있었습니다. 이에 맞추어 해당 부분에 맞게 code를 수정하여 이용하는데, rn==5 로 수정해야 한다는 것을 놓치고 지나간 것 같습니다. PE_thres=3 부분도 추가하여 재 진행하였습니다.

code 제공 감사합니다 :)

현재, breakpoint_graph -m total SVs dir/to/cn_norm/ -i 16 -f 2000 -o dir/to/total_job 해당 과정을 진행중에 있습니다.

과정 중 error가 발생했는데,

There were 46 warnings (use warnings() to see them) Ploidy and purity estimation... Loading required package: numDeriv [1] "Capping 25 segs at tCR = 5.0" [1] "Removed 18 non-finite segments" [1] "Expected copy-ratio = 1.00034" [1] "MAF file: not found." ..................... ..................... ..................... ..................... ..................... ..................... ..................... ..................... .....................

[1] "1d mode opt: " [,1] [,2] [,3] [1,] 0 -2.201214297 -4053.648 [2,] 0 -2.401551261 -3445.425 [3,] 0 -1.613372679 -6085.837 [4,] 0 -2.401551261 -3445.425 [5,] 0 -1.390188964 -6067.732 [6,] 0 -2.635073812 -4038.653 [7,] 0 -1.102677549 -7509.578 [8,] 0 -2.401552427 -3445.425 [9,] 0 -0.697021830 -6903.150 [10,] 0 -0.697021328 -6903.150 [11,] 0 -2.083311682 -4372.206 [12,] 0 -2.401552426 -3445.425 [13,] 0 -2.567245481 -3135.374 [14,] 0 -0.003804297 -8805.454 [15,] 0 -0.003804799 -8805.454 [16,] 0 -0.003804297 -8805.454 [1] "54 unique modes found" [1] "41 modes in b / delta range." [1] "WARNING: NON-FINITE log_evidence"

우선, 로그 기록을 살펴보면 위와 같이 나오며,

MAF file: not found 에서 MAF file이 없어도 괜찮은건가요?
WARNING: NON-FINITE log_evidence -> warning 이여서 계속 진행하고 있지만, 문제가 없는지요?

Ploidy and purity estimation is done Integer programming... Error in if (abs(new$modal_cn[i] - new$raw_expected_cn[i]) > 1) { : missing value where TRUE/FALSE needed Execution halted cat: SVs.CN_opt.filtered: No such file or directory cp: cannot stat 'SVs.CN_opt.filtered': No such file or directory Local segmentation...

위의 WARNING과 함께 그 다음으로 error 가 발생하였습니다.

해당 error 부분 확인 해주시면 감사하겠습니다.
위의 이유로 SVs.CN_opt.filtered file이 생성되지 않은 듯 한데, 분석은 계속적으로 진행되고 있습니다. 이대로 진행시켜도 괜찮은 것인가요?

감사합니다!

dmcblab commented 2 years ago

안녕하세요,

네, 1~2 사항은 문제 없습니다.

3의 error는 문제가 있는데, 주로 SV input에서 SV가 중복되어 있는 경우 발생합니다. dir/to/total_job 위치에서 다음 코드가 잘 실행되었는지 확인해 보시기 바랍니다.

Rscript InfoGenomeR/breakpoint_graph/SV_break_adjustment.R
InfoGenomeR/breakpoint_graph/SV_truncated_wgs SVs cn_norm/

혹시 dir/to/total_job 을 압축해서 메일 (qlalf1457@gmail.com)로 보내주실 수 있을지요? 한번 확인해 보겠습니다.

dmcblab commented 2 years ago

안녕하세요,

ABSOLUTE 결과에서 chromosome X (23)에 대한 결과 값이 없어서 생긴 문제 같네요. 혹시 R version과 ABSOLUTE version이 어떻게 될지요?

hayeon-pak commented 2 years ago

안녕하세요 .

R version 4.1.2 ABSOLUTE 1.0.6 으로 확인됩니다.

감사합니다.

dmcblab commented 2 years ago

R version을 3.4.3으로 낮춰서 진행해 보시길 바랍니다. 보내주신 데이터로 R 3.4.3 버전에서 돌아가는 것을 확인했습니다. R 4.0 이상에서는 아직 테스트해보지 않았습니다.

hayeon-pak commented 2 years ago

안녕하세요.

R version을 낮추어 진행한 결과, breakpoint_garph -m total과 breakpoint_graph -m simplification이 수행되었습니다.

하지만, allele_graph -m total 진행시에

Can't locate Bio/DB/Fasta.pm in @INC (you may need to install the Bio::DB::Fasta module) (@INC contains: dir/to/miniconda3/envs/r343/lib/site_perl/5.26.2 dir/to/lib/site_perl/5.26.2/x86_64-linux-thread-multi) at dir/to/dmcblab/InfoGenomeR/allele_graph/fasta.pl line 6. BEGIN failed--compilation aborted at dir/to/dmcblab/InfoGenomeR/allele_graph/fasta.pl line 6.

위와 같은 error가 발생하였습니다.

확인해 주시면 감사하겠습니다.

dmcblab commented 2 years ago

네, 안녕하세요.

피드백 감사합니다. Requirement에서 빠져있네요.

perl 에서 해당 모듈을 설치해주시기 바랍니다.

sudo perl -MCPAN -e 'install Bio::DB::Fasta'

또한, haploytpe 및 karyotyping 진행하기 전에, 해당 파일들을 컴파일 해주시기 바랍니다.

cd dir/to/dmcblab/InfoGenomeR
make

추가로, karyotype 분석 시 tutorial 3와 같이 -m simplification 과정도 한번 진행해보시기 바랍니다.

후에 graph_plot.R을 실행할 때에는 SV 목록에서 BND로 되어 있는 부분을 TRA로 바꿀 필요가 있습니다. BND도 호환되도록 수정을 해놓아야 겠네요.

언제든지 문의주시길 바랍니다

hayeon-pak commented 2 years ago

답변 감사합니다.

Haplotype 및 karyotyping 진행 전, 해당 file들을 컴파일 하는 것을 제안해 주셔서 진행하는 중에 에러가 발생하였습니다.

아래는 전체 error 관련 log 입니다.

g++ --std=c++11 -c -o haplotype_graph/phasing/dag_N.o haplotype_graph/phasing/dag_N.cpp haplotype_graph/phasing/dag_N.cpp: In function ‘int main(int, char*)’: haplotype_graph/phasing/dag_N.cpp:248:70: error: ‘log’ was not declared in this scope double val = prev_viterbi_m [(int)(lit)[0]][(int)(ljt)[0]] + log((lit)[2])+log((ljt)[2]); ^~~ haplotype_graph/phasing/dag_N.cpp:248:70: note: suggested alternative: ‘long’ double val = prev_viterbi_m [(int)(lit)[0]][(int)(ljt)[0]] + log((lit)[2])+log((ljt)[2]); ^~~ long haplotype_graph/phasing/dag_N.cpp:304:78: error: ‘log’ was not declared in this scope double val = prev_viterbi_m [(int)(lit)[0]][(int)(ljt)[0]] + log((lit)[2])+log((ljt)[2]); ^~~ haplotype_graph/phasing/dag_N.cpp:304:78: note: suggested alternative: ‘long’ double val = prev_viterbi_m [(int)(lit)[0]][(int)(ljt)[0]] + log((lit)[2])+log((ljt)[2]); ^~~ long haplotype_graph/phasing/dag_N.cpp:358:78: error: ‘log’ was not declared in this scope double val = prev_viterbi_m [(int)(ljt)[0]][(int)(lit)[0]] + log((lit)[2])+log((ljt)[2]); ^~~ haplotype_graph/phasing/dag_N.cpp:358:78: note: suggested alternative: ‘long’ double val = prev_viterbi_m [(int)(ljt)[0]][(int)(lit)[0]] + log((lit)[2])+log((ljt)[2]); ^~~ long haplotype_graph/phasing/dag_N.cpp:415:125: error: ‘log’ was not declared in this scope double val = prev_viterbi_m [(int)(lit)[0]][(int)(ljt)[0]] + log((lit)[2])+log((ljt)[2]); ^~~ haplotype_graph/phasing/dag_N.cpp:415:125: note: suggested alternative: ‘long’ double val = prev_viterbi_m [(int)(lit)[0]][(int)(ljt)[0]] + log((lit)[2])+log((ljt)[2]); ^~~ long In file included from /usr/include/x86_64-linux-gnu/c++/7/bits/c++allocator.h:33:0, from /usr/include/c++/7/bits/allocator.h:46, from /usr/include/c++/7/string:41, from /usr/include/c++/7/bits/locale_classes.h:40, from /usr/include/c++/7/bits/ios_base.h:41, from /usr/include/c++/7/ios:42, from /usr/include/c++/7/ostream:38, from /usr/include/c++/7/iostream:39, from haplotype_graph/phasing/dag_N.cpp:1: /usr/include/c++/7/ext/new_allocator.h: In instantiation of ‘void __gnu_cxx::new_allocator<_Tp>::destroy(_Up) [with _Up = std::cxx11::list<std::vector > [4]; _Tp = std::_List_node<std::cxx11::list<std::vector > [4]>]’: /usr/include/c++/7/bits/alloc_traits.h:487:4: required from ‘static void std::allocator_traits<std::allocator<_CharT> >::destroy(std::allocator_traits<std::allocator<_CharT> >::allocator_type&, _Up*) [with _Up = std::cxx11::list<std::vector > [4]; _Tp = std::_List_node<std::cxx11::list<std::vector > [4]>; std::allocator_traits<std::allocator<_CharT> >::allocator_type = std::allocator<std::_List_node<std::cxx11::list<std::vector > [4]> >]’ /usr/include/c++/7/bits/list.tcc:76:31: required from ‘void std::cxx11::_List_base<_Tp, _Alloc>::_M_clear() [with _Tp = std::cxx11::list<std::vector > [4]; _Alloc = std::allocator<std::__cxx11::list<std::vector > [4]>]’ /usr/include/c++/7/bits/stl_list.h:442:17: required from ‘std::cxx11::_List_base<_Tp, _Alloc>::~_List_base() [with _Tp = std::cxx11::list<std::vector > [4]; _Alloc = std::allocator<std::cxx11::list<std::vector > [4]>]’ /usr/include/c++/7/bits/stl_list.h:733:7: required from here /usr/include/c++/7/ext/new_allocator.h:140:28: error: request for member ‘~std::cxx11::list<std::vector > [4]’ in ‘ p’, which is of non-class type ‘std::cxx11::list<std::vector > [4]’ destroy(_Up p) { __p->~_Up(); }


/usr/include/c++/7/ext/new_allocator.h: In instantiation of ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = std::__cxx11::list<std::vector<double> > [4]; _Args = {const std::__cxx11::list<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > (&)[4]}; _Tp = std::_List_node<std::__cxx11::list<std::vector<double> > [4]>]’:
/usr/include/c++/7/bits/alloc_traits.h:475:4:   required from ‘static void std::allocator_traits<std::allocator<_CharT> >::construct(std::allocator_traits<std::allocator<_CharT> >::allocator_type&, _Up*, _Args&& ...) [with _Up = std::__cxx11::list<std::vector<double> > [4]; _Args = {const std::__cxx11::list<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > (&)[4]}; _Tp = std::_List_node<std::__cxx11::list<std::vector<double> > [4]>; std::allocator_traits<std::allocator<_CharT> >::allocator_type = std::allocator<std::_List_node<std::__cxx11::list<std::vector<double> > [4]> >]’
/usr/include/c++/7/bits/stl_list.h:575:33:   required from ‘std::__cxx11::list<_Tp, _Alloc>::_Node* std::__cxx11::list<_Tp, _Alloc>::_M_create_node(_Args&& ...) [with _Args = {const std::__cxx11::list<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > (&)[4]}; _Tp = std::__cxx11::list<std::vector<double> > [4]; _Alloc = std::allocator<std::__cxx11::list<std::vector<double> > [4]>; std::__cxx11::list<_Tp, _Alloc>::_Node = std::_List_node<std::__cxx11::list<std::vector<double> > [4]>]’
/usr/include/c++/7/bits/stl_list.h:1801:32:   required from ‘void std::__cxx11::list<_Tp, _Alloc>::_M_insert(std::__cxx11::list<_Tp, _Alloc>::iterator, _Args&& ...) [with _Args = {const std::__cxx11::list<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > (&)[4]}; _Tp = std::__cxx11::list<std::vector<double> > [4]; _Alloc = std::allocator<std::__cxx11::list<std::vector<double> > [4]>; std::__cxx11::list<_Tp, _Alloc>::iterator = std::_List_iterator<std::__cxx11::list<std::vector<double> > [4]>]’
/usr/include/c++/7/bits/stl_list.h:1118:9:   required from ‘void std::__cxx11::list<_Tp, _Alloc>::push_back(const value_type&) [with _Tp = std::__cxx11::list<std::vector<double> > [4]; _Alloc = std::allocator<std::__cxx11::list<std::vector<double> > [4]>; std::__cxx11::list<_Tp, _Alloc>::value_type = std::__cxx11::list<std::vector<double> > [4]]’
haplotype_graph/phasing/dag_N.cpp:75:24:   required from here
/usr/include/c++/7/ext/new_allocator.h:136:4: error: parenthesized initializer in array new [-fpermissive]
{ ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/7/ext/new_allocator.h:136:4: error: no matching function for call to ‘std::__cxx11::list<std::vector<double> >::list(const std::__cxx11::list<std::vector<double> > [4])’
In file included from /usr/include/c++/7/list:63:0,
from haplotype_graph/phasing/dag_N.cpp:4:
/usr/include/c++/7/bits/stl_list.h:709:2: note: candidate: template<class _InputIterator, class> std::__cxx11::list<_Tp, _Alloc>::list(_InputIterator, _InputIterator, const allocator_type&)
list(_InputIterator __first, _InputIterator __last,
^~~~
/usr/include/c++/7/bits/stl_list.h:709:2: note:   template argument deduction/substitution failed:
In file included from /usr/include/x86_64-linux-gnu/c++/7/bits/c++allocator.h:33:0,
from /usr/include/c++/7/bits/allocator.h:46,
from /usr/include/c++/7/string:41,
from /usr/include/c++/7/bits/locale_classes.h:40,
from /usr/include/c++/7/bits/ios_base.h:41,
from /usr/include/c++/7/ios:42,
from /usr/include/c++/7/ostream:38,
from /usr/include/c++/7/iostream:39,
from haplotype_graph/phasing/dag_N.cpp:1:
/usr/include/c++/7/ext/new_allocator.h:136:4: note:   candidate expects 3 arguments, 1 provided
{ ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/7/list:63:0,
from haplotype_graph/phasing/dag_N.cpp:4:
/usr/include/c++/7/bits/stl_list.h:685:7: note: candidate: std::__cxx11::list<_Tp, _Alloc>::list(std::__cxx11::list<_Tp, _Alloc>&&, const allocator_type&) [with _Tp = std::vector<double>; _Alloc = std::allocator<std::vector<double> >; std::__cxx11::list<_Tp, _Alloc>::allocator_type = std::allocator<std::vector<double> >]
list(list&& __x, const allocator_type& __a)
^~~~
/usr/include/c++/7/bits/stl_list.h:685:7: note:   candidate expects 2 arguments, 1 provided
/usr/include/c++/7/bits/stl_list.h:681:7: note: candidate: std::__cxx11::list<_Tp, _Alloc>::list(const std::__cxx11::list<_Tp, _Alloc>&, const allocator_type&) [with _Tp = std::vector<double>; _Alloc = std::allocator<std::vector<double> >; std::__cxx11::list<_Tp, _Alloc>::allocator_type = std::allocator<std::vector<double> >]
list(const list& __x, const allocator_type& __a)
^~~~
/usr/include/c++/7/bits/stl_list.h:681:7: note:   candidate expects 2 arguments, 1 provided
/usr/include/c++/7/bits/stl_list.h:676:7: note: candidate: std::__cxx11::list<_Tp, _Alloc>::list(std::initializer_list<_Tp>, const allocator_type&) [with _Tp = std::vector<double>; _Alloc = std::allocator<std::vector<double> >; std::__cxx11::list<_Tp, _Alloc>::allocator_type = std::allocator<std::vector<double> >]
list(initializer_list<value_type> __l,
^~~~
/usr/include/c++/7/bits/stl_list.h:676:7: note:   no known conversion for argument 1 from ‘const std::__cxx11::list<std::vector<double> > [4]’ to ‘std::initializer_list<std::vector<double> >’
/usr/include/c++/7/bits/stl_list.h:665:7: note: candidate: std::__cxx11::list<_Tp, _Alloc>::list(std::__cxx11::list<_Tp, _Alloc>&&) [with _Tp = std::vector<double>; _Alloc = std::allocator<std::vector<double> >]
list(list&& __x) noexcept
^~~~
/usr/include/c++/7/bits/stl_list.h:665:7: note:   no known conversion for argument 1 from ‘const std::__cxx11::list<std::vector<double> > [4]’ to ‘std::__cxx11::list<std::vector<double> >&&’
/usr/include/c++/7/bits/stl_list.h:652:7: note: candidate: std::__cxx11::list<_Tp, _Alloc>::list(const std::__cxx11::list<_Tp, _Alloc>&) [with _Tp = std::vector<double>; _Alloc = std::allocator<std::vector<double> >]
list(const list& __x)
^~~~
/usr/include/c++/7/bits/stl_list.h:652:7: note:   no known conversion for argument 1 from ‘const std::__cxx11::list<std::vector<double> > [4]’ to ‘const std::__cxx11::list<std::vector<double> >&’
/usr/include/c++/7/bits/stl_list.h:625:7: note: candidate: std::__cxx11::list<_Tp, _Alloc>::list(std::__cxx11::list<_Tp, _Alloc>::size_type, const value_type&, const allocator_type&) [with _Tp = std::vector<double>; _Alloc = std::allocator<std::vector<double> >; std::__cxx11::list<_Tp, _Alloc>::size_type = long unsigned int; std::__cxx11::list<_Tp, _Alloc>::value_type = std::vector<double>; std::__cxx11::list<_Tp, _Alloc>::allocator_type = std::allocator<std::vector<double> >]
list(size_type __n, const value_type& __value,
^~~~
/usr/include/c++/7/bits/stl_list.h:625:7: note:   candidate expects 3 arguments, 1 provided
/usr/include/c++/7/bits/stl_list.h:613:7: note: candidate: std::__cxx11::list<_Tp, _Alloc>::list(std::__cxx11::list<_Tp, _Alloc>::size_type, const allocator_type&) [with _Tp = std::vector<double>; _Alloc = std::allocator<std::vector<double> >; std::__cxx11::list<_Tp, _Alloc>::size_type = long unsigned int; std::__cxx11::list<_Tp, _Alloc>::allocator_type = std::allocator<std::vector<double> >] <near match>
list(size_type __n, const allocator_type& __a = allocator_type())
^~~~
/usr/include/c++/7/bits/stl_list.h:613:7: note:   conversion of argument 1 would be ill-formed:
In file included from /usr/include/x86_64-linux-gnu/c++/7/bits/c++allocator.h:33:0,
from /usr/include/c++/7/bits/allocator.h:46,
from /usr/include/c++/7/string:41,
from /usr/include/c++/7/bits/locale_classes.h:40,
from /usr/include/c++/7/bits/ios_base.h:41,
from /usr/include/c++/7/ios:42,
from /usr/include/c++/7/ostream:38,
from /usr/include/c++/7/iostream:39,
from haplotype_graph/phasing/dag_N.cpp:1:
/usr/include/c++/7/ext/new_allocator.h:136:4: error: invalid conversion from ‘const std::__cxx11::list<std::vector<double> >*’ to ‘std::__cxx11::list<std::vector<double> >::size_type {aka long unsigned int}’ [-fpermissive]
{ ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/7/list:63:0,
from haplotype_graph/phasing/dag_N.cpp:4:
/usr/include/c++/7/bits/stl_list.h:600:7: note: candidate: std::__cxx11::list<_Tp, _Alloc>::list(const allocator_type&) [with _Tp = std::vector<double>; _Alloc = std::allocator<std::vector<double> >; std::__cxx11::list<_Tp, _Alloc>::allocator_type = std::allocator<std::vector<double> >]
list(const allocator_type& __a) _GLIBCXX_NOEXCEPT
^~~~
/usr/include/c++/7/bits/stl_list.h:600:7: note:   no known conversion for argument 1 from ‘const std::__cxx11::list<std::vector<double> > [4]’ to ‘const allocator_type& {aka const std::allocator<std::vector<double> >&}’
/usr/include/c++/7/bits/stl_list.h:589:7: note: candidate: std::__cxx11::list<_Tp, _Alloc>::list() [with _Tp = std::vector<double>; _Alloc = std::allocator<std::vector<double> >]
list()
^~~~
/usr/include/c++/7/bits/stl_list.h:589:7: note:   candidate expects 0 arguments, 1 provided
<builtin>: recipe for target 'haplotype_graph/phasing/dag_N.o' failed
make: *** [haplotype_graph/phasing/dag_N.o] Error 1

이와 같은 error가 발생하였는데요.

현재 먼저 haplotype_graph 과정은 이미 돌리는 중이였던 상태라 계속 진행 중에 있으며, 컴파일 문제가 해결되면 중단 후 재 진행할 예정입니다.

Error 관련하여 확인해 주시면 감사하겠습니다!

dmcblab commented 2 years ago

안녕하세요,

아마 gcc과 g++ 버전 문제 인 것 같습니다. gcc --version과 g++ --version 을 체크해 보시고, 4.8로 낮춰보시길 바랍니다.

hayeon-pak commented 2 years ago

안녕하세요 :)

컴파일 후, graph_plot 까지 모두 결과를 확인하였습니다. 도움 감사합니다.

저희 결과를 확인하였을 때, 생각보다 false positive call에 의한 chromosome의 변형이 많은 것 같아서요. 혹시 input으로 넣는 raw data에 filtering 을 더 적용시킬 사항이 있을까요??

감사합니다.

dmcblab commented 2 years ago

안녕하세요.

네, 맞습니다. control sample이 없이 total mode를 진행했을 때 좀 어려운 부분이 있습니다.

초기 input 상에서 paired-end / split reads 나 mapping quality에 따라 좀 더 필터링 할 수 있는데, 이 기준에 따라서는 필터링 되는 true positive도 많아서 조심해야 됩니다. 각 툴들 마다 다른 기준에 따른 초기 SV calling을 조절해 볼 여지는 있는 것 같습니다.

false positive contol을 위해서 copy number evidence 가 없는 SV들을 strict 하게 제거하는 breakpoint_graph --stringent 옵션을 -mode simplification 와 같이 한번 사용해 볼 수 있습니다.

hayeon-pak commented 2 years ago

의견 감사드립니다.

혹시 한 가지 더 여쭤보고 싶은 것은 SV 분석에 lumpy tool도 많이 이용이 되고 있는 것으로 보이는데, 분석 format에 맞추어 lumpy 결과를 추가한다면, 결과에 있어 도움이 될까요?

dmcblab commented 2 years ago

네, lumpy나 최근 SvABA도 성능이 좋은 걸로 압니다. 여러 툴들을 참고하여 초기 input을 조정하면 좋습니다. 후에 해당 툴들도 테스트 해보고 지원하고자 하는데, 포맷만 잘 맞춘다면 도움이 될 것 같습니다. SV 방향 (3to3, 5to5, 3to5, 5to3)만 좀 조심하면 됩니다.

hayeon-pak commented 2 years ago

답변 감사드립니다.

의견 주셨던 대로 breakpoint_graph --stringent 옵션을 mode -simplification 과 함께 분석을 진행해 보았는데요.

allele_graph -m total -g /dir/to/GRCh37/GRCh37 -o total_job -t 12 hom_snps.format het_snps.format allele_graph 과정에서

$message [1] "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"

위의 Error가 발생하였습니다. 이전 분석에서는 해당 오류가 없었는데, 확인해 주시면 감사하겠습니다.

dmcblab commented 2 years ago

안녕하세요.

해당 폴더를 압축해서 메일로 보내주시길 바랍니다. 한번 확인해보고 답변 드리겠습니다.

dmcblab commented 2 years ago

안녕하세요, 해당 ERROR는 ASCN 측정 시 특정 ASCN에서 negative binomial model fitting이 안되는 문제인데, 우선 무시하셔도 됩니다. Ploidy가 잘못 잡혔거나 Purity가 1인 경우 생길 수 있습니다.

제가 해당 work를 보니까, normal 분석 시에는 ploidy를 1.5~2,5 사이로 제한하는게 좋겠습니다.

또한 cancer cell 이 아닌 일종의 normal cell로 complex SVs가 거의 없는 것 같아 100kb 이하의 simple SV 위주의 분석이 될 것 같습니다. 만약 normal cell line 분석 시 100kb simplification (default) 과정을 적용하면 대부분 변이가 사라지기 때문에, simplification 과정에서 사이즈를 1kb 혹은 10kb로 낮춰야겠습니다. 해당 옵션을 업데이트 해놨습니다. (-e, --simple_length).

새로 update된 코드를 받아보시고, 컴파일 및 humandb 파일 셋팅 완료 후 다음 command로 실행해보시기 바랍니다.

breakpoint_graph -m total SVs cn_norm/ -i 16 -f 2000 -n 1.5 -x 2.5 -o total_job_simple --stringent T
breakpoint_graph -m simplification -o total_job_simple -i 16 -f 2000 -n 1.5 -x 2.5 total_job_simple/SVs total_job_simple/cn_norm/ --stringent T -e 10000
allele_graph -m total -g /dir/to/GRCh37/GRCh37 -o total_job_simple -t 23 hom_snps.format het_snps.format
haplotype_graph -o total_job_simple -t 6
karyotyping -o total_job_simple/

분석 후에 karyotyping 진행 할 수 있습니다.

cd total_job_simple/InfoGenomeR_output/karyotypes/euler.3/
Rscript  dir/to/InfoGenomeR/etc/chromosomes_graph.R tag

hayeon-pak commented 2 years ago

안녕하세요 :)

수정해주신 code 정말 감사드립니다. update하여 결과 확인하였습니다. 말씀 주신대로 일종의 normal cell을 이용하였고, update된 code로 분석한 결과 이전보다 좋은 결과를 얻을 수 있었습니다.

결과와 관련하여 몇 가지 확인하고 싶은 사항이 있어 문의 드립니다.

몇몇 chromosome의 경우 이와 같이 뒤에 점이 있는 것을 확인할 수 있었습니다. 이 부분과 관련하여 illustration으로 해당 부분을 제거해 보니 pter 부분이 뒤로 가 있는 것을 확인할 수 있었습니다. 몇개의 chromosome에서 이와 같이 상단에 pter가 존재하지 않고 그림 뒤에 존재해 있는 것을 확인하였는데요. 문제가 있는 것일까요?
preprocessing의 delly_somatic과 관련한 사항입니다. 결과 확인 도중 원래의 delly vcf file에서 해당 call의 경우에 아래와 같은 결과를 얻었습니다.

chr22 45991519 BND00005396 A [chr21:11022886[A 415 PASS IMPRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.7;END=45991520;CHR2=chr21;POS2=11022886;PE=13;MAPQ=31;CT=5to5;CIPOS=-703,703;CIEND=-703,703 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-29.6095,0,-355.939:10000:PASS:15542:46406:30864:2:65:23:0:0

delly_somatic code를 통해 해당 부분을 parsing하였을 때,

22 45991519 21 45991520 5to5 13 0 31 13 0 0 1 1

이와 같이 수정되었는데요. manta code의 경우에는 BND(혹은 TRA)에서 pos2에 [chr21:11022886[A 이 부분을 이용하였는데, delly에서는 해당 부분이 아니라 pos2에 INFO의 END를 이용하였습니다. 이 부분과 관련하여 manta와 같이 수정이 되어야 할까요?

확인해주시면 감사하겠습니다 :D

dmcblab commented 2 years ago

안녕하세요,

pter을 표시하는데 문제가 좀 있는 것 같네요. karyotyping graph를 업데이트 해봐야겠습니다. 해당 부분 짚어주셔서 감사합니다.
중요한 포인트네요. DELLY v0.8.7 버전에서는 TRA 표기 방식이 이전 버전과 달라서, 확인이 필요한 것 같습니다.
아마 manta와 같이 [chr21:11022886[A 부분을 사용하는 게 맞는 것 같습니다.

현재는 DELLY v.0.7.6 버전을 사용하는게 가장 안전합니다.

delly vcf와 manta vcf 파일을 메일로 보내주시면 검토 후 답변 드리겠습니다.

dmcblab commented 2 years ago

안녕하세요,

pter 표시하는 코드가 업데이트 되었습니다. 현재 코드로 업데이트를 한번 해주세요. 참고로, chromosome 13,14,15,21,22의 경우 p-arm 부분의 reference sequence가 대부분 'N' 이기 때문에, mapping 되는 read가 없습니다. copy number bin과 SV breakpoint가 해당 염색체 p-arm에서는 비어 있는 게 맞습니다. 다만, visualization 시에 이 부분을 정상이라고 가정하고 우선 표시해 두는게 좋은 것 같습니다. 해당 코드도 업데이트 되어 있습니다.
제가 DELLY v0.8.7을 사용해보니 BND 표기 방식에서는 "END=" 대신에 "POS2="를 사용하는 게 맞습니다. 나머지 SV는 "END="를 사용하시면 됩니다. 그리고 BND를 TRA로 바꿔주시기 바랍니다.
BND SV를 맞게 처리 한 후, --stringent F 옵션도 한번 고려해보시기 바랍니다.

hayeon-pak commented 2 years ago

안녕하세요.

넵, 위에서 언급 주셔서 BND는 TRA로 수정하여 진행하고 있습니다. 다시 한번 상기 시켜 주신 점 감사합니다. 제안 주신대로 --stringent F 옵션으로도 한 번 진행 시켜 본 뒤, 결과 확인하여 판단해 보겠습니다 :)

또한, 수정해 주신 code 이용하여 진행한 결과, 13, 14, 15, 21, 22의 p-arm 부분이 정상적으로 그려지는 것을 확인하였습니다. 그런데, 저의 data의 경우에 chromosome14와 15 사이에 몇 개의 SV가 존재하고 있는데요.

DUP 14 106471389 14 106774088 5to3 5 0 30 5 0 0 1 1 DEL 15 24672249 15 24722822 3to5 6 0 42 6 0 0 1 1 TRA 14 106771955 15 22491420 5to3 4 19 0 0 0 0 0 0 SEG 14 106473442 15 22486810 3to5 NA NA NA NA NA NA NA NA

위 4개의 SV가 존재하고 있습니다. SEG의 origin call은 아래와 같습니다.

TRA 15 22486810 14 106484226 5to3 11 0 60 11 0 0 1 1
DEL 14 106473442 14 106483618 3to5 9 0 35 9 0 0 1 1

TRA와 SEG 때문인지 chromosome 15번에 karyotype을 visualization 할 때 아래와 같은 오류가 생기는 듯 합니다.

확인해 주시면 감사하겠습니다.

dmcblab commented 2 years ago

아 네, 감사합니다.

해당 TRA와 SEG를 염색체 14번으로의 insertion으로 해석하는 것 같네요.

다만 염색체 15 centromere쪽에 400kb 정도의 노이즈가 있는데, 원래 chromosome_graph 코드에서는 이 노이즈를 plot하지 않습니다. 업데이트 된 코드에서 15p를 이어서 그리다 보니 노이즈까지 plot이 된 것 같네요. 해당 코드는 수정되었습니다.

hayeon-pak commented 2 years ago

안녕하세요 :)

수정해주신 code를 이용하여 분석을 잘 진행하였습니다. 정말 감사드립니다.

결과와 관련하여 질문 사항이 있습니다. 첫 input으로 넣은 SVs file의 원본 vcf에 annotation하여 blacklist등의 추가적으로 몇 가지 filteration을 진행하여 추가적인 분석을 진행해 보았는데요. 이 두번째 input의 경우, 처음의 input에서 filtering만 진행된 것이기 때문에, 아예 다른 input이 아니라 첫 번째와 같은 data에 개수만 줄어든 input입니다.

그래서 예상으로는, 다른 결과가 아니라 첫 번째보다 SV의 수가 줄어들기만 할 것이라고 예상했는데요. DEL, DUP, INV의 경우에는 개수만 줄어들었으나, TRA에서 두 분석의 결과가 전혀 다르게 나온 것을 확인하였습니다.

첫 번째 input 결과 중, translocation SV

TRA 14 106771955 15 22491420 5to3 4 19 0 0 0 0 0 0 SEG 14 106473442 15 22486810 3to5 NA NA NA NA NA NA NA NA

두 번째 input 결과 중, translocation SV

TRA 6 382462 16 33428531 3to3 20 0 33 20 0 0 1 1 TRA 16 32121077 21 9477556 5to3 5 0 37 5 0 0 1 1

제가 code를 보고 생각했을 때, 해당 SV의 position에 해당하는 cn data 만을 이용하기 때문에 주변의 다른 call(variant)의 영향을 받지 않을 것이라 생각했기 때문에, 첫 번째의 결과에서 수만 줄어들 것이라 생각했는데, 전혀 다른 결과가 나온 이유가 있을까요? 어떤 부분에서 영향을 받게 되는 건가요..?

답변 주시면 감사하겠습니다!

dmcblab commented 2 years ago

안녕하세요,

SV related set 내에서 integer programming을 진행하기 때문에, 다른 SV call 영향을 받습니다.
그리고 SV related set이 아니더라도, SV breakpoint에 따라 SV position에 해당하는 cn segment의 길이가 달라져서 영향을 받습니다.
purity & ploidy 측정도 조금 달라질 수도 있습니다.
simplification 모드도 주변 SV의 영향을 받습니다.

centromere 및 telomere 주변 등 mappability가 낮은 영역에서 주변 false SV call도 많고 cn data 확실성도 부족하기 때문에 (특히 control이 없는 경우), 분석 상황에 따라 이렇게 call이 달라질 수는 있습니다.

hayeon-pak commented 2 years ago

안녕하세요 :)

말씀주신 방법들을 이용하여 다양한 조건으로 여러 분석을 진행해 보았습니다. 그런데, 결과 관련하여 의문점이 생겨 다시 한 번 문의 드립니다.

총 3개의 sample을 InfoGenomeR를 통해 분석을 진행하였는데요.

이는 모두 마지막으로 남겨진 SV 이고, 3개 모두 혹은 2개의 sample에서 거의 유사하거나 완전히 일치하는 structural variant를 탐지하였습니다. 해당 variant는 common sv나 혹은 false positive로 봐야 할 까요?

이 결과를 어떻게 해석해야 할까요...

답변 주시면 감사하겠습니다!

dmcblab commented 2 years ago

안녕하세요,

Reference genome도 오류가 있기 때문에, 실제 human genome과 reference genome과의 차이를 반영했거나 recurrent한 germline SV라고 볼 수도 있을 것 같습니다 mapping 과정에서 생긴 false positive 가능성도 있는 것 같습니다.

SV 데이터베이스를 통해서 blacklist를 만들어서 제거하거나 germline SV를 annotation 하는 방법이 있습니다.

예시로 DGV 데이터베이스에서 17:15043902-15058719 (deletion) 을 찾아보면 (http://dgv.tcag.ca/gb2/gbrowse/dgv2_hg19/?name=17%3A15043902-15058719;search=Search) 실제 동일 영역에서 loss가 일어난 걸 볼 수 있는데, germline SV라고 해석할 수 있을 것 같습니다.

추가로 해당 링크에서 보면 Asian-specific SV인 것 같네요. http://dgv.tcag.ca/gb2/gbrowse_details/dgv2_hg19?ref=chr17;start=15043381;end=15059150;name=;class=Sequence;feature_id=3940;db_id=chrm_imb_map_str%3Adatabase

최근 human genome structural variation consortium 에서도 germline SV가 정리가 잘 된 걸로 아는데 참고해보시면 좋겠습니다.

hayeon-pak commented 2 years ago

안녕하세요!!

좋은 정보 주셔서 감사합니다. 해당 자료 참고하여 재진행해 보도록 하겠습니다.

새로운 sample을 이용하여 분석을 진행해 보고 있는데,

Error in if (abs(new$modal_cn[i] - new$raw_expected_cn[i]) > 1) { : missing value where TRUE/FALSE needed Execution halted cat: SVs.CN_opt.filtered: No such file or directory cp: cannot stat 'SVs.CN_opt.filtered': No such file or directory Local segmentation...

위와 같은 Error가 발생하며 계속 진행은 되는데 iter가 70이상까지 넘어가고 있어 중단시킨상태입니다.

확인해 주시면 감사하겠습니다....!

dmcblab commented 2 years ago

안녕하세요,

혹시 R 버전을 낮춘 상태에서도 해당 오류가 발생하는지요? 해당 폴더를 압축해서 메일로 보내주시면 한번 확인해 보겠습니다.

hayeon-pak commented 2 years ago

확인해보니 제가 설정 변경을 하다 R version 또한 변경이 되었던 것 같습니다.

번거롭게 같은 문제로 문의를 드렸네요 ㅠㅠ.. 다시 변경하였고 문제 없음 확인하였습니다.

감사합니다!!

dmcblab / InfoGenomeR

bicseq_preprocess.sh #5

breakpoint graph construction (It takes a day during about 30 iterations)