Closed hubentu closed 6 years ago
Sure, it's possible, ideally via pull request ;)
Hi @hubentu, @lawremi, @vobencha, @Liubuntu, Isn't this a situation where extending VRanges might be easier than trying to modify the current VRanges? @hubentu You could implement this extension in your package for now and make the case for inclusion in VariantAnnotation later. H.
Hi @hpages @lawremi @vobencha @Liubuntu ,
Sure. I can try to extend VRanges, but the ref/alt slots has been set as "character". To extend means to inherit it, so it can't be changed to XStringSet
, right? Maybe I will just define a new class by extending GRanges?
Thanks,
Qiang
Oh I see. Then a pull request as @lawremi suggested is needed. It looks like union classes character_OR_DNAStringSet
and/or character_OR_DNAStringSetList
would need to be defined in VariantAnnotation, then the types of the ref
/alt
slots modified in VRanges class def.
Thanks @hpages . I think the idea of defining union classes of character_OR_DNAStringSet
and character_OR_DNAStringSetList
for the ref/alt
slots is great and will be more convenient in extending VRanges
when needed and inherit the methods.
For now I will look into how to define the union class, and study the scripts of collapsedVCF/expandedVCF
and try to write the collapsedVRanges/expandedVRanges
.
Providing a little more details here after talking more with @Liubuntu about this.
Using a class union is kind of the usual trick to allow a slot to be of one type or the other. That's because the union class is the parent of the classes that belong to the union. For example if character_OR_XStringSet
is defined as the union of character
and XStringSet
then it's the parent of both. Similarly if character_OR_XStringSet_OR_XStringSetList
is defined as the union of character_OR_XStringSet
and XStringSetList
, then it's the grand-parent of character
and XStringSet
and the parent of XStringSetList
. This allows ExpandedVRanges to specify alt="character_OR_XStringSet"
and CollapsedVRanges to specify alt="XStringSetList"
if VRanges has alt="character_OR_XStringSet_OR_XStringSetList"
. VRanges could then become a virtual class with 2 concrete subclasses ExpandedVRanges and CollapsedVRanges with ExpandedVRanges now playing the role of the current VRanges class.
This feels like a lot of contortions though. Another maybe cleaner design would be to remove the ref
and alt
slots from VRanges and make it a virtual class, and then have ExpandedVRanges and CollapsedVRanges add these slots with the desired type.
But now I see that VariantAnnotation defines a VRangesList class. Couldn't this be used for what you have in mind for CollapsedVRanges?
I'm adding @rcastelo to the discussion.
Support for XString is not needed to support the storage of multiple alt alleles. While supporting XString might be a future direction, it's orthogonal to this problem, except that it would touch the same slots. There's no need to touch the @ref
. Just the @alt
needs to become "character_or_CharacterList"
.
In my experience, dealing with collapsed alts during an analysis is a pain. The head of this issue mentions MAF. If there is a desire to export to MAF in collapsed form, we could solve that problem without fundamentally changing the data structure. I guess we should consider the use cases.
Hi,
I do not work with cancer data where MAF/BED formats are used for storing somatic variants but here's my opinion as a user of VRanges
with germline variation.
The definition of the VRanges
class already suggests in VariantAnnotation/R/AllClasses.R#373 that alleles could be stored as a XStringSet
object. This could be a good idea because of the more compact storage and I expect the number of changes involved in the packages using this class would be minimal.
However, considering a DNAStringSetList
to store alleles in a VRanges
would break, in my opinion, a design decision made on the VRanges
class, but I guess @lawremi is the one who can ultimately shed light on this.
As a user of VRanges
, my understanding of this class is that it stores only one allele per range on purpose to facilitate the semantics of storing the number of reads supporting an alternative allele (altDepth
) and filtering variants with the FilterRules
class. Using a DNAStringSetList
object to store alleles, as in the CollapsedVCF
class would at least complicate the use this semantics within VRanges
. So, I guess that a virtual VRanges
should not have the slots ref
, alt
, altDepth
, softFilterMatrix
and hardFilters
and these slots would move to the ExpendedVRanges
. The current functionality using the FilterRules
class on VRanges
objects would also move to the ExpandedVRanges
.
However, looking at the TCGA MAF format, I do not see that this format requires storing more than one allele per range in a single column, as it happens with VCF. In fact, the SomaticSignatures package also works with MAF data and seems to be happily using the VRanges
class as it is defined now. As far as I now, the BED format does not contemplate either storing more than one single value per row and column. From this perspective, maybe just extending VRanges
as it is defined now, to better accomodate MAF data, could be a way forward. Talking to other developers with packages dealing with MAF data, such as SomaticSignatures, could also be an advantage in the long run.
Hi @rcastelo ,
Thank you for your input. I totally agree with you that ExpandedVRanges
is a good alternative for current VRanges, and MAF/bed usually don't store variants with multiple alleles.
However, my colleagues often use the simple bed format or plain text for collapsed alleles. One of my project was to deal with indels/SNPs with multiple alternative alleles and multi-nucleotide variants(MNV) (not from VCF), which requires a method to convert variants between collapsed
and expanded
. MNV even need a conversion method for neighbor regions. As a user, the CollapsedVRanges/ExpandedVRanges
classes are exactly what I would expect.
Thanks and happy new year!
Best, Qiang
I think it would be a lot simpler for you to just implement a importer/exporter for your bespoke "BED" representation of variants that operates on VRanges as it exists now. That's what we do with VCF now.
Closed for no further response from the request.
@lawremi @vobencha @hpages Hi, I am writing a package and want to use a data class very similar to
VRanges
but needs some modification. Is it possible to add aCollapsedVRanges
/ExpandedVRanges
, with slots ofREF
/ALT
in theDNAStringSet
/DNAStringSetList
class? Such a class could be a simple and useful data structure for variants from other formats, such as maf/bed, instead of VCF. Thanks! Qiang