ErasmusMC-CCBC / katdetectr

An R package for detection, characterization and visualization of kataegis.
GNU General Public License v3.0

VRange input error #8

Closed ThomasGro closed 11 months ago

ThomasGro commented 11 months ago

I want to use a VRanges object as input and first tried it with your MAF file as the source, filtered for "SNP". Running 'detectKataegis(genomicVariants = vr, aggregateRecords = TRUE)' returns an error. I believe "sampleNames(vr)" is not used, which leads to duplicated entries.

library(VariantAnnotation)

gr <- makeGRangesFromDataFrame(
    APL_primary.maf,
    keep.extra.columns = TRUE,
    ignore.strand = TRUE,
    seqinfo = NULL,
    seqnames.field = "Chromosome",
    start.field = "Start_Position",
    end.field = "End_Position",
    starts.in.df.are.0based = TRUE
)
values(gr) <- data.frame(
    ID = APL_primary.maf$Tumor_Sample_Barcode,
    ref = APL_primary.maf$Reference_Allele,
    alt = APL_primary.maf$Tumor_Seq_Allele2
)

vr <- makeVRangesFromGRanges(
    gr,
    ref.field = "ref",
    alt.field = "alt",
    sampleNames.field = "ID",
    keep.extra.columns = TRUE
)
genome(seqinfo(vr)) <- "hg19"

detectKataegis(genomicVariants = vr, aggregateRecords = TRUE)
Error in [[<-(*tmp*, name, value = new("CompressedIntegerList", elementType = "integer", : 165 elements in value to replace 171 elements

traceback()
8: stop(paste(lv, "elements in value to replace", nrx, "elements"))
7: [[<-(*tmp*, name, value = new("CompressedIntegerList", elementType = "integer", elementMetadata = NULL, metadata = list(), unlistData = c(28L, 96L, 93L, 95L, 92L, 94L, 7L, 49L, 150L, 149L, 27L, 81L, 76L, 17L, 78L, 140L, 139L, 138L, 134L, 135L, 42L, 70L, 98L, 97L, 58L, 157L, 74L, 85L, 19L, 151L, 152L, 64L, 37L, 87L, 148L, 11L, 16L, 3L, 5L, 6L, 53L, 77L, 114L, 65L, 36L, 156L, 155L, 51L, 23L, 79L, 32L, 26L, 61L, 30L, 12L, 66L, 67L, 39L, 43L, 153L, 9L, 45L, 100L, 118L, 119L, 117L, 50L, 44L, 144L, 145L, 20L, 18L, 22L, 31L, 21L, 35L, 29L, 154L, 62L, 41L, 163L, 161L, 162L, 82L, 169L, 168L, 83L, 166L, 167L, 171L, 170L, 84L, 165L, 164L, 159L, 54L, 130L, 141L, 142L, 59L, 10L, 99L, 120L, 124L, 125L, 121L, 24L, 63L, 128L, 127L, 126L, 52L, 129L, 71L, 13L, 147L, 146L, 69L, 107L, 102L, 34L, 33L, 106L, 104L, 109L, 105L, 108L, 132L, 72L, 101L, 112L, 113L, 56L, 80L, 143L, 111L, 15L, 75L, 14L, 68L, 133L, 131L, 48L, 1L, 25L, 46L, 2L, 90L, 91L, 60L, 89L, 55L, 115L, 116L, 57L, 88L, 40L, 73L, 38L, 158L, 110L, 47L, 4L, 86L, 8L), partitioning = new("PartitioningByEnd", end = 1:165, NAMES = NULL, elementType = "ANY", elementMetadata = NULL, metadata = list())))
6: [[<-(*tmp*, name, value = <same CompressedIntegerList as in frame 7>)
5: $<-(*tmp*, "revmap", value = <same CompressedIntegerList as in frame 7>)
4: $<-(*tmp*, "revmap", value = <same CompressedIntegerList as in frame 7>)
3: .reduceOverlappingVariants(genomicVariantsImported)
2: .processGenomicVariants(genomicVariantsImported)
1: detectKataegis(genomicVariants = vr, aggregateRecords = TRUE)

vr.zip

daanhazelaar commented 11 months ago

Hi @ThomasGro,

Thank you for reaching out! I just checked the vr file you shared. It looks like the start locations of the variants in your data are off by one relative to their respective end locations (see the attached screenshot).

I was not aware that this was allowed in a VRanges object. It also seems strange to me that the start location lies behind the end location of the variant. Or is there a reason for this?

I'll check how to fix this and add some tests for this particular problem in the future. For now, the following works on my machine:

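library(dplyr)  # provides as_tibble() and mutate()

# Shift each start back by one so that start no longer lies past end,
# then rebuild the VRanges object from the corrected coordinates.
vr2 <- vr |>
    as_tibble() |>
    mutate(
        start = start - 1
    ) |>
    GenomicRanges::makeGRangesFromDataFrame(keep.extra.columns = TRUE) |>
    VariantAnnotation::makeVRangesFromGRanges()

kd <- detectKataegis(vr2, aggregateRecords = TRUE)

ThomasGro commented 11 months ago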

Thank you for the investigation.

The point is that, for your MAF test file, I called makeGRangesFromDataFrame() with starts.in.df.are.0based=TRUE.

This led to the shift in the start position. However, I believe MAF files are usually 0-based, whereas VCFs are 1-based.
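To make the shift concrete, here is a minimal base-R sketch. It assumes the MAF positions were already 1-based (so an SNV has Start_Position equal to End_Position) and only mimics the +1 that `starts.in.df.are.0based = TRUE` applies to the start column; the data frame `maf` and its values are made up for illustration.

```r
# Hypothetical SNV records with 1-based, closed coordinates (start == end).
maf <- data.frame(
    Chromosome     = c("chr1", "chr1", "chr2"),
    Start_Position = c(100L, 250L, 4000L),
    End_Position   = c(100L, 250L, 4000L)
)

# Treating the starts as 0-based means adding 1 to convert them to 1-based.
# This mimics the effect of starts.in.df.are.0based = TRUE on the starts.
shifted_start <- maf$Start_Position + 1L

# Width of a 1-based, closed range is end - start + 1.
width <- maf$End_Position - shifted_start + 1L

print(data.frame(start = shifted_start, end = maf$End_Position, width = width))
```

Under that assumption every range ends up with start = end + 1 and a width of 0, which is consistent with the workaround of subtracting 1 from start.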

Best, Thomas

daanhazelaar commented 11 months ago

Hi @ThomasGro,

Aah thank you for the explanation! Do you perhaps have a recommendation for how I should deal with this?

I was thinking about adding tests that let the user know when the start location > end location. But if this is a common way of describing variant locations, perhaps I should incorporate it in a different way? Any suggestions are much appreciated!
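A check of the kind discussed here could look roughly like the base-R sketch below. It is not katdetectr's implementation: `checkStartEnd` is a hypothetical helper, and it assumes the variants are available as a data frame with integer columns `start` and `end`.

```r
# Hypothetical helper: report variants whose start lies past their end.
# `variants` is assumed to be a data frame with columns `start` and `end`.
checkStartEnd <- function(variants) {
    stopifnot(
        is.data.frame(variants),
        all(c("start", "end") %in% names(variants))
    )
    bad <- which(variants$start > variants$end)
    if (length(bad) > 0) {
        warning(
            length(bad), " variant(s) have start > end (rows: ",
            paste(head(bad, 5), collapse = ", "),
            if (length(bad) > 5) ", ..." else "",
            "). Check whether the input coordinates are 0-based or 1-based.",
            call. = FALSE
        )
    }
    # Return the offending row indices invisibly so callers can inspect them.
    invisible(bad)
}

ok  <- data.frame(start = c(10L, 20L), end = c(10L, 21L))
off <- data.frame(start = c(11L, 20L), end = c(10L, 20L))

checkStartEnd(ok)  # silent: no offending rows

# Capture the warning text instead of printing it.
msg <- tryCatch(checkStartEnd(off), warning = function(w) conditionMessage(w))
```

Warning rather than stopping leaves it to the user to decide whether the coordinates are intentional; a stricter variant could call `stop()` instead.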

Kind regards, Daan Hazelaar

ThomasGro commented 11 months ago

Hello Daan, you may want to look here: https://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

In a 0-based system, an SNV has coordinates like start = 0, end = 1, whereas in a 1-based system it has coordinates like start = 1, end = 1.

The length of a range is (end - start) in 0-based systems and (end - (start - 1)) in 1-based systems.
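The two conventions and their length formulas can be sketched in base R as follows. The helper names `zeroToOne` and `oneToZero` are hypothetical, and the sketch assumes the 0-based system is half-open while the 1-based system is closed.

```r
# 0-based, half-open [start, end)  ->  1-based, closed [start, end]
zeroToOne <- function(start, end) list(start = start + 1L, end = end)

# 1-based, closed [start, end]  ->  0-based, half-open [start, end)
oneToZero <- function(start, end) list(start = start - 1L, end = end)

# An SNV at the first base of a chromosome:
snv0 <- list(start = 0L, end = 1L)        # 0-based representation
snv1 <- zeroToOne(snv0$start, snv0$end)   # 1-based representation

# Both representations describe a range of the same length.
len0 <- snv0$end - snv0$start             # end - start
len1 <- snv1$end - (snv1$start - 1L)      # end - (start - 1)
```

Only the start coordinate changes between the two systems; the end coordinate is the same number in both.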

Best, Thomas

daanhazelaar commented 11 months ago

Hi @ThomasGro,

Whether the coordinates are 0-based or 1-based does not matter for katdetectr. As long as start <= end, everything is fine.

In the data file you supplied, start > end, and that is where things break, at least in katdetectr. That's where the problem comes from, right?

Kind regards, Daan Hazelaar