ErasmusMC-CCBC / katdetectr

An R package for detection, characterization and visualization of kataegis.
GNU General Public License v3.0

Only non-synonymous variants included? #2

Closed Joannavonberg closed 1 year ago

Joannavonberg commented 1 year ago

Hi developers, first of all thank you for creating this package! It worked immediately out of the box for me, and I've been able to analyse all my samples so quickly :) When I accidentally tried to load an empty file, I got an error message from maftools saying there were no non-synonymous variants present in the file. I checked this for one of my other samples, and I get the same number of variants as reported in the katdetectr results if I filter on myloadedMAF$IMPACT %in% c("HIGH", "MODERATE") (i.e. filtering on non-synonymous variants).

So I have two questions:

  1. Do you only include non-synonymous variants?
  2. If yes, is it necessary / informative to do this? I would naively have thought that it makes sense to look for kataegis genome-wide, and not only in coding regions.

Looking forward to hearing from you!

Joannavonberg commented 1 year ago

I tried reading the same MAF in with the maftools::read.maf() function and I get the same number of variants as katdetectr (214 versus ~45000 that I get if I just read_table() the MAF). So it looks like read.maf() by default only keeps non-synonymous variants, something that I did not see mentioned in the documentation of their function but might be known to you?
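For anyone who wants to check this on their own data, here is a rough sketch of the comparison in base R. It counts how many rows of a MAF-like table fall into the classes that read.maf() treats as non-synonymous by default. The class list below is my reading of the maftools documentation (an assumption, please verify against your installed version), and the table is made-up toy data.

```r
# Toy MAF-like table (made-up rows; only the columns needed for this check)
toy_maf <- data.frame(
  Hugo_Symbol = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Intron",
                             "IGR", "Nonsense_Mutation"),
  stringsAsFactors = FALSE
)

# Assumption: default value of vc_nonSyn as listed in the maftools docs
default_nonsyn <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
  "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"
)

# Rows that would be kept as non-synonymous versus all rows in the file
is_nonsyn <- toy_maf$Variant_Classification %in% default_nonsyn
n_total   <- nrow(toy_maf)
n_nonsyn  <- sum(is_nonsyn)

# Per-class breakdown, useful for seeing which classes dominate the file
print(table(toy_maf$Variant_Classification))
cat("total:", n_total, " non-synonymous:", n_nonsyn, "\n")
```

On a real file, replacing toy_maf with the table read from disk should show whether the gap between the two counts is explained by the classification filter.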

Joannavonberg commented 1 year ago

Here is the code for the read.maf() function: https://github.com/PoisonAlien/maftools/blob/master/R/read_maf_dt.R They indeed explicitly keep only non-synonymous variants. They do have a parameter, vc_nonSyn, for using different filters. Would you consider making this a parameter that is passed from detectKataegis, so users can change it themselves if they want? (I would be an example of a user who would like to do that ;-)

Joannavonberg commented 1 year ago

Thank you for the superquick edit! I'm going to try the new version :)

daanhazelaar commented 1 year ago

Dear @Joannavonberg,

Thank you for opening this issue and good to hear that katdetectr works well!

Also thank you for clearly explaining your relevant questions.

I have pushed an update of katdetectr that should make sure that now all variants (both synonymous and non-synonymous) are analysed when supplying a .maf file. Can you please try out this updated version and let me know if things are working to your satisfaction?

# download the latest version of katdetectr from github
devtools::install_git(url = "https://github.com/ErasmusMC-CCBC/katdetectr")

# load and attach katdetectr
library("katdetectr")

# detect kataegis
kd <- detectKataegis(genomicVariants = "path/to/file.maf")

# visualize in a rainfall plot
rainfallPlot(kd)

Hereby my extended answer to your questions:

I definitely recommend including both non-synonymous and synonymous variants when analysing a sample for kataegis. However, as you clearly noted, when the previous version of katdetectr was used to analyse a .maf file, only the non-synonymous variants were analysed.

This was indeed due to the maftools::read.maf() function that we use in katdetectr to load .maf files. This function reads a .maf file and constructs a MAF object. I have now realised that the non-synonymous variants are stored in the data slot and the synonymous variants are stored in the maf.silent slot of the MAF object (see code below). I was unaware of this so thanks again for pointing this out!

Previously, katdetectr only analysed the variants in the data slot and therefore only the non-synonymous variants. I have now made sure both slots are included, so all variants present in the .maf file are analysed by katdetectr.

# Construct MAF object
maf <- maftools::read.maf(maf = system.file(package = "katdetectr", "extdata/APL_primary.maf"), verbose = FALSE)

# access non-synonymous variants in MAF object
maf@data

# access synonymous variants in MAF object
maf@maf.silent

# combine all variants in single tibble
maf@data |> 
    dplyr::as_tibble() |> 
    dplyr::bind_rows(maf@maf.silent)

Regarding the vc_nonSyn argument of the maftools::read.maf() function: it specifies which variant classifications are treated as non-synonymous in a MAF object. All other variants are stored with the synonymous ones (see code below).

# Construct MAF object
maf <- maftools::read.maf(maf = system.file(package = "katdetectr", "extdata/APL_primary.maf"), verbose = FALSE, vc_nonSyn = "Frame_Shift_Del")

# This contains only Frame_Shift_Del non-synonymous variants
maf@data

# This contains all other (synonymous) variants
maf@maf.silent

I prefer not to incorporate the vc_nonSyn argument, as we want to minimise the preprocessing options in katdetectr. If you want to analyse preprocessed data, please do the preprocessing yourself, save the processed data as a .maf file (or vcf or VRanges) and then use katdetectr to analyse your data.
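A minimal sketch of that preprocess-then-analyse route, using base R only. The filter (keeping single-nucleotide variants) and the toy table are purely illustrative assumptions, not a recommendation. The column names are the fields that maftools expects in a MAF as far as I know, and the final katdetectr call is left commented out because the toy data is far too small for a meaningful analysis.

```r
# Write a toy MAF-like table to a temporary file (made-up rows, illustration only)
in_path <- tempfile(fileext = ".maf")
toy_maf <- data.frame(
  Hugo_Symbol            = c("GENE1", "GENE2", "GENE3", "GENE4"),
  Chromosome             = c("chr1", "chr1", "chr2", "chr2"),
  Start_Position         = c(100L, 250L, 400L, 900L),
  End_Position           = c(100L, 250L, 402L, 900L),
  Reference_Allele       = c("C", "C", "ATG", "G"),
  Tumor_Seq_Allele2      = c("T", "G", "-", "A"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "In_Frame_Del", "Intron"),
  Variant_Type           = c("SNP", "SNP", "DEL", "SNP"),
  Tumor_Sample_Barcode   = "sample1",
  stringsAsFactors = FALSE
)
write.table(toy_maf, in_path, sep = "\t", quote = FALSE, row.names = FALSE)

# 1. Read the raw MAF as a plain table (lines starting with '#' are skipped)
raw_maf <- read.delim(in_path, stringsAsFactors = FALSE, comment.char = "#")

# 2. Apply your own preprocessing; hypothetical example: keep only SNVs
filtered_maf <- raw_maf[raw_maf$Variant_Type == "SNP", ]

# 3. Save the processed table as a tab-separated .maf file
out_path <- tempfile(fileext = ".maf")
write.table(filtered_maf, out_path, sep = "\t", quote = FALSE, row.names = FALSE)

# 4. Analyse the processed file with katdetectr (requires the package and real data)
# kd <- katdetectr::detectKataegis(genomicVariants = out_path)
```

Keeping the filtering outside katdetectr means the saved .maf file documents exactly which variants went into the analysis.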

I hope this explanation is useful for you. If you have any follow-up or other questions or comments feel free to reach out!

Kind regards, Daan Hazelaar

Joannavonberg commented 1 year ago

Hi @daanhazelaar, thanks for the elaborate response! I fully agree with not using the vc_nonSyn argument; I was just trying to see what could be a quick fix, but I didn't know that the non-coding and synonymous variants are saved separately. I'm trying out the new version at the moment, but obviously it's a bit slower with so many more variants, so I might update you next week.

daanhazelaar commented 1 year ago

Hi @Joannavonberg,

No problem and sounds good! I'm looking forward to your update!

Have a nice weekend!

Kind regards, Daan Hazelaar