ShixiangWang / sigminer

🌲 An easy-to-use and scalable toolkit for genomic alteration signature (a.k.a. mutational signature) analysis and visualization in R https://shixiangwang.github.io/sigminer/reference/index.html
https://shixiangwang.github.io/sigminer/

Feature Request: Take a minimal data.frame as input #423

Closed selkamand closed 1 year ago

selkamand commented 1 year ago

Hi,

Thanks for your work on this amazing package!

Would you consider adding the ability to supply a simple, minimal dataframe as input (e.g. Sample/Chrom/Pos/Ref/Alt)?

This would make it very convenient to run, especially if you're pulling data from a non-MAF-format database.

The idea is that the input would need only the columns that are strictly required, which would further minimise barriers to entry.

ShixiangWang commented 1 year ago

@selkamand Thanks for your suggestion; it is a reasonable request. I will take a look, but it may take one or two weeks, as I have been quite busy recently.

ShixiangWang commented 1 year ago

@selkamand Could you try installing the latest version from GitHub? A function, read_maf_minimal(), has been added to fit your needs.

An example is given in the doc:

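library(sigminer)

# Example MAF file shipped with the maftools package
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools", mustWork = TRUE)
# R.utils is needed to read the gzipped file
if (!require("R.utils")) {
  message("Please install the 'R.utils' package first")
} else {
  laml <- read_maf(maf = laml.maf)
  laml

  # Keep only the six columns that a minimal MAF requires
  laml_mini <- laml@data[, list(
    Tumor_Sample_Barcode, Chromosome,
    Start_Position, End_Position,
    Reference_Allele, Tumor_Seq_Allele2
  )]
  # Build a MAF object from the reduced table
  laml2 <- read_maf_minimal(laml_mini)
  laml2
}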

For a minimal MAF, you need to provide the following columns:

Tumor_Sample_Barcode, Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2

In terms of the columns you suggested: Tumor_Sample_Barcode is Sample, Reference_Allele is Ref, and Tumor_Seq_Allele2 is Alt.
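
For a table that already uses the Sample/Chrom/Pos/Ref/Alt layout from the request, the renaming could look roughly like the sketch below. The helper to_minimal_maf() and the example rows are hypothetical and not part of sigminer. The sketch also assumes every row is a single-base substitution, so that End_Position can simply equal Start_Position; indels would need their own coordinate handling.

```r
# Hypothetical input: one row per variant, using the column names from the request.
variants <- data.frame(
  Sample = c("S1", "S1", "S2"),
  Chrom  = c("chr1", "chr2", "chr1"),
  Pos    = c(12345L, 67890L, 13579L),
  Ref    = c("C", "G", "T"),
  Alt    = c("T", "A", "C"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not a sigminer function): rename the columns to the
# six minimal MAF columns. Assumes single-base substitutions only.
to_minimal_maf <- function(df) {
  required <- c("Sample", "Chrom", "Pos", "Ref", "Alt")
  stopifnot(all(required %in% names(df)))
  # Guard the SNV assumption: both alleles must be exactly one base long.
  stopifnot(all(nchar(df$Ref) == 1L), all(nchar(df$Alt) == 1L))
  data.frame(
    Tumor_Sample_Barcode = df$Sample,
    Chromosome           = df$Chrom,
    Start_Position       = df$Pos,
    End_Position         = df$Pos, # same as start for a single-base change
    Reference_Allele     = df$Ref,
    Tumor_Seq_Allele2    = df$Alt,
    stringsAsFactors     = FALSE
  )
}

mini <- to_minimal_maf(variants)

# With sigminer installed, the result could then be handed over:
# maf <- sigminer::read_maf_minimal(mini)
```

The call to read_maf_minimal() is left commented out so the snippet runs with base R alone.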

selkamand commented 1 year ago

Hi @ShixiangWang, thanks for the prompt response and update!

I've played around with read_maf_minimal() and it works well.