UMCUGenetics / MutationalPatterns

R package for extracting and visualizing mutational patterns in base substitution catalogues
MIT License

Data format of using this package #43

Closed by xiw588 5 years ago

xiw588 commented 5 years ago

Hi, I am wondering whether this package only accepts VCF input, or whether other formats are also supported?

Thank you.

roelj commented 5 years ago

Hi @xiw588

We only have a read_vcfs_as_granges function to load variants. Which formats would you like to be able to load?

xiw588 commented 5 years ago

Thank you for your quick response! I only have the compiled data in Excel/CSV from Oncopanel, and I am wondering whether there is any way I could convert it to a supported format and apply this package to it. I could send you a sample of my data if that helps explain it better.

roelj commented 5 years ago

Hi @xiw588

You could try to load the data and transform it into GRanges. We provide example VCF files that you can use to compare the resulting GRanges.

I don't know Oncopanel, but it seems to be a rather specific format. Could you share a sample file, so we can evaluate whether we can write a load function for it?

xiw588 commented 5 years ago

Hi Roel,

Thank you for your response! Could you give me an email address so that I can attach the data sample I have at hand? It would be very helpful if you could give me some idea of how to convert it to VCF.

roelj commented 5 years ago

You can find it here: https://github.com/UMCUGenetics/MutationalPatterns/blob/master/DESCRIPTION#L9

roelj commented 5 years ago

This format is too specific to add a generic load function for. Your best bet is to convert it to the Variant Call Format (VCF): use "Genomic Mutation Chromosome Cd" as CHROM, "Genomic Mutation Position" as POS, "Genomic Mutation Reference Allele" as REF, "Genomic Mutation Alternate Allele" as ALT, and so on.

VCF is plain-text, so if you follow the document I linked to above, you should be able to make it work. :)

Please note that MutationalPatterns currently only deals with single-nucleotide polymorphisms (SNPs).
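
For illustration, the column mapping described above could be sketched in Python roughly as follows. This is a minimal sketch under assumptions, not a tested converter for real Oncopanel exports: the CSV headers are taken verbatim from the column names quoted in this thread and may differ in an actual file, the function name csv_to_vcf is hypothetical, and the output is a bare sites-only VCF with no sample columns. Whether such a minimal file loads cleanly with read_vcfs_as_granges has not been verified here, so compare the result against the example VCF files shipped with the package.

```python
import csv
import io

# Column names as quoted in this thread; the exact headers in a real
# export are an assumption and may need adjusting.
COLUMN_MAP = {
    "CHROM": "Genomic Mutation Chromosome Cd",
    "POS": "Genomic Mutation Position",
    "REF": "Genomic Mutation Reference Allele",
    "ALT": "Genomic Mutation Alternate Allele",
}
BASES = {"A", "C", "G", "T"}


def csv_to_vcf(csv_text):
    """Convert CSV text to minimal VCF text (hypothetical helper).

    Only single-base substitutions are kept, because the package
    handles single-nucleotide changes only.
    """
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        chrom = row[COLUMN_MAP["CHROM"]].strip()
        ref = row[COLUMN_MAP["REF"]].strip().upper()
        alt = row[COLUMN_MAP["ALT"]].strip().upper()
        # Skip indels and anything that is not a single-base substitution.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        pos = int(row[COLUMN_MAP["POS"]].strip())
        # ID, QUAL, FILTER and INFO are unknown here, so use "." (missing).
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up rows, for demonstration only.
    sample = (
        "Genomic Mutation Chromosome Cd,Genomic Mutation Position,"
        "Genomic Mutation Reference Allele,Genomic Mutation Alternate Allele\n"
        "1,12345,C,T\n"
        "2,500,AT,A\n"
    )
    print(csv_to_vcf(sample), end="")
```

In this sketch the second sample row is dropped because it is a deletion rather than a single-base substitution. Records are written in input order; sorting by chromosome and position, and matching the chromosome naming of the reference genome, are left to the reader.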