Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation
23 stars 20 forks source link

Added vcfFields,VcfFile-method. #11

Closed Liubuntu closed 6 years ago

Liubuntu commented 6 years ago

Hi @vobencha ,

I have just added the vcfFields generic and method for character and VcfFile objects. Please take a look. Thanks!

Best, Qian

vobencha commented 6 years ago

Hi @Liubuntu

showClass("ScanVcfParam") Class "ScanVcfParam" [package "VariantAnnotation"]

Slots:
Name: which fixed info geno Class: IntegerRangesList character character character

Name: samples trimEmpty Class: character logical

Extends: "ScanBVcfParam"


Thanks.
Val
Liubuntu commented 6 years ago

Hi Val @vobencha ,

Thanks for the suggestions. I'll modify them and redo the pull request.

For the purpose of this method, in the VCFArray, we are trying to return all available data fields within VCF file that could be converted into a VCFArray object. The 1-dimensional objects corresponds to all rows, and the >=2 dimensional objects corresponds to all rows and all cols(samples). Subsetting of rows/cols could be passed directly into the [,VCFArray method. The VCFArray extends DelayedArray and has delayed operation, so it's not heavy to include all rows/cols in construction.

So that's why the returned values from vcfFields correspond to the ScanVcfParam in vcfFixed(), vcfGeno(), and vcfInfo() to indicate the names of data fields, but not on vcfWhich() and vcfSamples() for subsetting purpose. Also the samples(header) is already an acccessor for the VCFHeader object and could be easily returned. Any comments here? @mtmorgan

I think I also need to add a vcfFields,VCF inside VariantAnnotation, and then add the vcfFields,VCFStack in GenomicFiles package, so that this vcfFields method could be propagated to the series of VCF related packages.

Best, Qian

vobencha commented 6 years ago

I agree with adding a vcfFields,VCF method. Doing this IMO makes an even stronger case for returning "samples". The VCF specs (https://samtools.github.io/hts-specs/VCFv4.2.pdf) refer to all of INFO, GENO, FIXED, ALT, as well as source, version etc as "fields". If the function is returning the fields in a file on disk or VCF object in R then I think it should probably return all fields in the header. I view this as a higher-level more user-friendly function than scanVcfHeader.

To me it doesn't make sense to tailor this function to VCFArray methods but have it live in VariantAnnotation. If you'd rather only return fields needed for VCFArray subsetting I think that should be a different function with a name specific to VCFArray and should not go in VariantAnnotation. I'm completely fine with that by the way.

Liubuntu commented 6 years ago

Hi Val,

I have just edited the previous comment, and removed the vcfFields,VCFArray method. In constructing of VCFArray, we use VCFArray(file, name = ""), the purpose of implementing vcfFields method is to return the available entries for the name argument.

Yes, now I found that to be a little bit too VCFArray-customized. And the vcfFields method name is more generic, I agree that including samples would make more sense and more consistent with the use of scanvcfHeader as a higher-level function.

For the VCFArray, I favor not to add any specific function to only return the available values for name argument. I can add in the vignette and point users to the vcfFields() function, and consult the geno/info/fixed elements for available names.

Best, Qian

Liubuntu commented 6 years ago

Hi Val,

I have modified the commit. Now it has updated NEWS, documentations, unit test and runnable examples. For the man/VcfFile-class.Rd, I have modified a little bit for the original documentation, so that they are more readable and consistent (some were showing section headers and contents in the same line...).

Now the function vcfFields is directly defined on VCFHeader object, and the method on character, VcfFile and VCF signatures derives from that. Do you think this is necessary for the method on VCFHeader? if not, I can define an internal function like .vcfFields_VCFHeader() for the base function in extracting the contents we want from header, and use the internal function inside the setMethod("vcfFields", signature = ...).

Best, Qian

vobencha commented 6 years ago

Thanks @Liubuntu . Looks good. Having a vcfFields,VCFHeader-method as a base method makes sense. Nice job. Thanks also for cleaning up the VcfFile-class man page. Val

Liubuntu commented 6 years ago

Thanks @vobencha for merging it.