Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation

expand: expand any variables with Number set to 'R' into REF/ALT #17

Closed: lcougnaud closed this issue 5 years ago

lcougnaud commented 5 years ago

The current expand function only expands the AD genotype field into REF/ALT pairs. It would be useful to also expand any FORMAT field whose header Number is set to 'R', since such a field also contains one value for each possible allele (including the reference), as described in the VCF documentation. I encounter this scenario when also annotating the allele depth on the forward/reverse strand with the samtools mpileup function, using --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR in samtools >= 1.3. Example:

geno(header(vcf)):
       Number Type    Description                              
   PL  G      Integer List of Phred-scaled genotype likelihoods
   AD  R      Integer Allelic depths                           
   ADF R      Integer Allelic depths on the forward strand     
   ADR R      Integer Allelic depths on the reverse strand  
# CollapsedVCF
rowRanges(vcf[idx, ])
GRanges object with 1 range and 5 metadata columns:
               seqnames    ranges strand | paramRangeID            REF             ALT      QUAL      FILTER
                  <Rle> <IRanges>  <Rle> |     <factor> <DNAStringSet> <CharacterList> <numeric> <character>
  GeneI:69_G/T    GeneI        69      * |         <NA>              G           T,A,C         0           .
geno(vcf)$ADR[idx, ]
[[1]]
[1] 3534 8698    1    1

Please let me know if there is an alternative way to retrieve/expand the same information. Thanks in advance!
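
To make the request concrete, here is a minimal base-R sketch of the pairing asked for. It is not VariantAnnotation code: expand_number_r is a hypothetical helper, and the column names REF_value and ALT_value are assumptions chosen for illustration. It only shows how a Number=R vector (one value per allele, reference first) could be split into one REF/ALT pair per alternate allele, using the ADR values from the example above.

```r
# Hypothetical helper, not part of VariantAnnotation: split one Number=R
# value vector into one REF/ALT pair per alternate allele.
expand_number_r <- function(ref, alt, values) {
  # Number=R means one value per allele, with the reference allele first,
  # so there must be exactly length(alt) + 1 values.
  stopifnot(length(values) == length(alt) + 1L)
  data.frame(
    REF = ref,                  # recycled across all alternate alleles
    ALT = alt,
    REF_value = values[1L],     # value for the reference allele
    ALT_value = values[-1L],    # one value per alternate allele
    stringsAsFactors = FALSE
  )
}

# Values from the ADR example above: REF = G, ALT = T,A,C.
expanded <- expand_number_r("G", c("T", "A", "C"), c(3534L, 8698L, 1L, 1L))
print(expanded)
```

Each row pairs the reference value with the value of one alternate allele, which is the same shape the issue describes for the expanded AD field.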

vobencha commented 5 years ago

@lcougnaud this would be a nice addition. Can you add a small test file in inst/extdata and a unit test? Thanks, Valerie

lcougnaud commented 5 years ago

I added a small test file and a corresponding unit test. The example VCF data was created with bcftools mpileup (version 1.9) on a subset of the 'ex1.bam' example dataset (only region 'seq1:90') from the Rsamtools package. Please let me know if you need any further modifications or updates!

vobencha commented 5 years ago

Thanks @lcougnaud. Merged.

vobencha commented 5 years ago

@lcougnaud Unfortunately the pull request broke the unit tests in VariantAnnotation. I've reverted the merge.

Before you submit the new pull request please make sure

When you submit your next request, you should see a "no merge conflicts" message.

Thanks. Valerie