PapenfussLab / StructuralVariantAnnotation

R package designed to simplify structural variant analysis
GNU General Public License v3.0

annotate the output of GRIDSS with SVLEN and SVTYPE #19

Closed. Fred-07 closed this issue 5 years ago.

Fred-07 commented 5 years ago

Hello. I ran GRIDSS on a .bam file using the gridss_separate.sh script, and then tried to annotate the SV type in the output .vcf file using the commands proposed by D. Cameron.

vcf.file <- "C:/Scratch/NA_12878_duplicates_removed.sv.vcf"
vcf <- VariantAnnotation::readVcf(vcf.file, "hg19")
gr <- breakpointRanges(vcf)

Warning message: In .breakpointRanges(x, ...) Removing 13731 unpaired breakend variants gridss0bf_5o, gridss0ff_14o, gridss0bb_15o, gridss0bb_17o, gridss0fb_141o, gridss0bb_20o, gridss0ff_23o, gridss0bb_27o, gridss0fb_397o, gridss0bf_145h, gridss0ff_228o, gridss0bf_248o, gridss0bf_314o, gridss0fb_1097o, gridss0bf_382o, gridss0bb_334o, gridss0bf_504o, gridss0ff_497o, gridss0fb_1763o, gridss0ff_608o, gridss0bb_785o, gridss0bf_937o, gridss0bb_797o, gridss0fb_2629o, gridss0bb_820o, gridss0fb_2665o, gridss0bb_842o, gridss0bf_1021o, gridss0bf_1094o, gridss0bb_986o, gridss0bb_1070o, gridss0fb_3431o, gridss0bb_1105o, gridss0bf_1398o, gridss0bf_1422o, gridss0bb_1294o, gridss0fb_4020o, gridss0fb_4068o, gridss0bb_1344o, gridss0bf_1634o, gridss0bf_1638o, gridss0fb_4154o, gridss0fb_4157o, gridss0bf_1641o, gridss0fb_4159o, gridss0fb_4160o, gridss0bf_1643o, gridss0fb_4162o, gridss0bf_1644o, gridss0fb_4167o, gridss0bf_1645o, gridss0fb_4175o, gridss0bf_1648o, gridss0bf_1650o, gridss0fb_4177o, gridss0fb_4178o, gridss0bf_1652o, gridss0fb_4180o, grid [... truncated]

svtype <- simpleEventType(gr)
info(vcf)$SIMPLE_TYPE <- NA_character_

Warning message: info fields with no header: SIMPLE_TYPE

info(vcf[gr$vcfId])$SIMPLE_TYPE <- svtype

Error in `[[<-`(`*tmp*`, name, value = c("DUP", "INS", "INS", "INS", "INS",  :
  185200 elements in value to replace 0 elements

info(vcf)$SIMPLE_TYPE <- svtype

Error in `[[<-`(`*tmp*`, name, value = c("DUP", "INS", "INS", "INS", "INS",  :
  185200 elements in value to replace 425839 elements

It is unclear to me where gr$vcfId comes from. Does this column only appear when multiple .bam files are processed in the same analysis? In any case, removing the reference to gr$vcfId just leads to another problem, the length-mismatch error above.
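Both errors can be reproduced in plain R. A minimal sketch, using made-up record IDs (hypothetical, not from a real GRIDSS run): `simpleEventType()` returns one value per breakend kept by `breakpointRanges()`, not one per VCF record, so the values have to be placed by record ID. If the ID column is missing from `gr`, `gr$vcfId` evaluates to `NULL`, and indexing with `NULL` selects nothing, which is consistent with the "replace 0 elements" error.

```r
# Hypothetical toy IDs: five VCF records, four of which form paired breakends
# (the fifth stands in for an unpaired breakend that breakpointRanges() drops).
vcf_ids <- c("gridss1o", "gridss1h", "gridss2o", "gridss2h", "gridss3o")
gr_ids  <- c("gridss1o", "gridss1h", "gridss2o", "gridss2h")
svtype  <- c("DEL", "DEL", "DUP", "DUP")

# A metadata column that does not exist comes back as NULL, and indexing
# with NULL selects zero elements.
missing_column <- NULL
print(length(vcf_ids[missing_column]))

# Assigning four values to a five-record column fails on length, so start
# from NA for every record and fill in only the positions matched by ID.
simple_type <- rep(NA_character_, length(vcf_ids))
simple_type[match(gr_ids, vcf_ids)] <- svtype
print(simple_type)
```

Records whose breakend was dropped (here `gridss3o`) simply keep `NA`, which is written to the VCF as a missing value.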

I also tried the demo data, as shown below, and the gr$vcfId problem occurs there too.

vcf.file2 <- system.file("extdata", "gridss.vcf", package = "StructuralVariantAnnotation")

vcf2 <- VariantAnnotation::readVcf(vcf.file2, "hg19")

gr2 <- breakpointRanges(vcf2)

Warning message: In .breakpointRanges(x, ...) : Removing 4 unpaired breakend variants gridss8o, gridss14o, gridss27o, gridss31o

svtype2 <- simpleEventType(gr2)
info(vcf2)$SIMPLE_TYPE <- NA_character_

Warning message: info fields with no header: SIMPLE_TYPE

info(vcf2[gr2$vcfId])$SIMPLE_TYPE <- svtype2

Error in `[[<-`(`*tmp*`, name, value = c("CTX", "DUP", "DUP", "CTX")) :
  4 elements in value to replace 0 elements

info(vcf2)$SIMPLE_TYPE <- svtype2

Warning message: info fields with no header: SIMPLE_TYPE

Could you please help me annotate the GRIDSS output correctly, so that I get a .vcf with SVLEN and SVTYPE?
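For context on where a simple SVTYPE comes from, here is an illustrative base-R classifier that works from breakend position and orientation alone. It is a sketch under stated assumptions, not the source of `simpleEventType()`: the column names, the `partner` index and the 0.7 insertion-length threshold are hypothetical. The orientation convention assumed here is that a `+` breakend breaks immediately after its position and a `-` breakend immediately before it.

```r
# Illustrative only: column names, the partner index and the 0.7 insertion
# threshold are hypothetical assumptions, not GRIDSS's actual rule.
classify_breakend <- function(chrom, pos, strand, p_chrom, p_pos, p_strand,
                              ins_len, sv_len) {
  ifelse(chrom != p_chrom, "CTX",                               # inter-chromosomal
  ifelse(!is.na(sv_len) & ins_len >= abs(sv_len) * 0.7, "INS",  # mostly inserted sequence
  ifelse(strand == p_strand, "INV",                             # same orientation on both sides
  ifelse(xor(pos < p_pos, strand == "-"), "DEL", "DUP"))))      # orientation vs. partner position
}

# Hypothetical breakend pairs; `partner` is the row of the other breakend.
bp <- data.frame(
  id      = c("del_a", "del_b", "dup_a", "dup_b", "ctx_a", "ctx_b", "ins_a", "ins_b"),
  chrom   = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr2", "chr1", "chr1"),
  pos     = c(100, 200, 300, 400, 500, 600, 700, 701),
  strand  = c("+", "-", "-", "+", "+", "-", "+", "-"),
  partner = c(2, 1, 4, 3, 6, 5, 8, 7),
  ins_len = c(0, 0, 0, 0, 0, 0, 50, 50),
  sv_len  = c(-99, -99, 99, 99, NA, NA, 50, 50),
  stringsAsFactors = FALSE
)

bp$simple_type <- with(bp, classify_breakend(
  chrom, pos, strand,
  chrom[partner], pos[partner], strand[partner],
  ins_len, sv_len))
print(bp[, c("id", "simple_type")])
```

Both breakends of a pair get the same label, because swapping a breakend with its partner flips both terms of the `xor()`.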

d-cameron commented 5 years ago

Which versions of R and VariantAnnotation are you using?

(Sent by email in reply to Fred-07's post of Thu, Jul 18, 2019 at 9:29 PM.)

Fred-07 commented 5 years ago

R version 3.6.1 (2019-07-05)
VariantAnnotation 1.30.1
StructuralVariantAnnotation 1.0.0

The callr and processx packages no longer compile (versions 3.3.0 and 3.4.0, respectively).

d-cameron commented 5 years ago

Looks like VariantAnnotation 1.30.1 no longer supports writing VCF fields without also writing a matching header. I'll update the script in the next few days.
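A minimal sketch of that header-first approach, assuming VariantAnnotation's `info(header(vcf)) <-` replacement and S4Vectors' `DataFrame()`. The helper name `add_info_header()` is hypothetical, and the demo `gridss.vcf` bundled with StructuralVariantAnnotation is used as input.

```r
library(VariantAnnotation)
library(S4Vectors)  # for DataFrame()

# Hypothetical helper: add an INFO header line for `field` unless the header
# already declares it, then return the modified VCF object.
add_info_header <- function(vcf, field, number, type, description) {
  hdr_info <- info(header(vcf))
  if (!field %in% rownames(hdr_info)) {
    new_line <- DataFrame(Number = number, Type = type,
                          Description = description, row.names = field)
    info(header(vcf)) <- rbind(hdr_info, new_line)
  }
  vcf
}

vcf <- readVcf(system.file("extdata", "gridss.vcf",
                           package = "StructuralVariantAnnotation"), "hg19")
vcf <- add_info_header(vcf, "SIMPLE_TYPE", "1", "String",
                       "Simple event type based on breakend position and orientation")

# The column now has a matching header line, so no "no header" warning.
info(vcf)$SIMPLE_TYPE <- NA_character_
```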

(Sent by email in reply to Fred-07's comment of Fri, Jul 19, 2019 at 6:06 PM.)

Fred-07 commented 5 years ago

Thank you Daniel!

d-cameron commented 5 years ago

Updated: https://github.com/PapenfussLab/gridss/commits/master/example/simple-event-annotation.R now works. Sorry for not posting a follow-up on this issue after fixing it.
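An end-to-end sketch of the same idea, under stated assumptions: `classify_events()` is a hypothetical stand-in modelled loosely on the rule in that script (the 0.7 threshold is assumed), and the record-ID column is assumed to be `sourceId`, which current StructuralVariantAnnotation releases appear to use where the original commands used `vcfId`. Check `names(mcols(gr))` on your own installation.

```r
library(VariantAnnotation)
library(StructuralVariantAnnotation)
library(S4Vectors)

# Hypothetical classifier, loosely modelled on the GRIDSS example script;
# the 0.7 insertion-length threshold is an assumption.
classify_events <- function(gr) {
  p <- partner(gr)  # the other breakend of each breakpoint
  same_chrom <- as.character(seqnames(gr)) == as.character(seqnames(p))
  s  <- as.character(strand(gr))
  ps <- as.character(strand(p))
  ifelse(!same_chrom, "CTX",
  ifelse(gr$insLen >= abs(gr$svLen) * 0.7, "INS",
  ifelse(s == ps, "INV",
  ifelse(xor(start(gr) < start(p), s == "-"), "DEL", "DUP"))))
}

vcf <- readVcf(system.file("extdata", "gridss.vcf",
                           package = "StructuralVariantAnnotation"), "hg19")

# Declare SIMPLE_TYPE in the header before writing values into it.
if (!"SIMPLE_TYPE" %in% rownames(info(header(vcf)))) {
  info(header(vcf)) <- rbind(info(header(vcf)), DataFrame(
    Number = "1", Type = "String",
    Description = "Simple event type based on breakend position and orientation",
    row.names = "SIMPLE_TYPE"))
}

gr <- breakpointRanges(vcf)  # unpaired breakends are dropped with a warning

# Place one value per breakend at its VCF record; unpaired records keep NA.
ids <- gr$sourceId           # assumed column name (vcfId in the original commands)
stopifnot(!is.null(ids))     # fail loudly instead of silently replacing 0 elements
idx <- match(ids, rownames(vcf))
stopifnot(!anyNA(idx))
simple_type <- rep(NA_character_, nrow(vcf))
simple_type[idx] <- classify_events(gr)
info(vcf)$SIMPLE_TYPE <- simple_type

out <- tempfile(fileext = ".vcf")
writeVcf(vcf, out)
```

The two `stopifnot()` guards turn the silent zero-length selection seen in this issue into an explicit error.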