Zilong-Li / vcfppR

The fastest VCF/BCF parser in R https://doi.org/10.1093/bioinformatics/btae049
https://zilong-li.github.io/vcfppR/
Other
13 stars 3 forks source link

Use 'af' argument in vcfcomp #5

Closed Truongphikt closed 3 months ago

Truongphikt commented 3 months ago

When I read docs at vcfcomp, I see the af argument.

af   file path to allele frequency text file or saved RDS file.

Could someone give me more info about the "allele frequency text file" and "RDS" format? Maybe It can help me use my "exterior MAF file" to overwrite MAF in true VCF.

My "exterior MAF" format:

21:5030578:C:T  0.0115553
21:5030588:T:C  0.00827608
21:5030596:A:G  0.000312305
21:5030641:C:T  0
21:5030673:G:A  0.000312305
21:5030912:CA:C 0.00218613
21:5030957:G:A  0.037945
21:5030960:T:C  0.00374766
21:5031004:C:G  0.000312305
21:5031018:C:CCTATGATCA 0.000312305
Zilong-Li commented 3 months ago

The af can accept a tab separated text file with five columns with header: chr pos ref alt af

Truongphikt commented 3 months ago

The af can accept a tab separated text file with five columns with header: chr pos ref alt af

After tried a while when I additional AF file to the repo's test set, I got an error

Error in `[.data.frame`(af, , "chr") : undefined columns selected
Calls: vcfcomp ... tryCatchOne -> <Anonymous> -> paste0 -> [ -> [.data.frame
Execution halted

The command:

rawvcf = "to_the_path/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"

phasedvcf= "to_the_path/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"

maf_file= "to_the_path/21_maf_seperate_chr.txt"

res <- vcfcomp(test = rawvcf, truth = phasedvcf, info=FALSE,
               stats = "nrc", region = "chr21:1-5100000",
               af=maf_file,
               formats = c("GT","GT"))

My "af file" is something like (maf_file):

 $ cat 21_maf_seperate_chr.txt | head
chr21   5030578 C   T   0.0115553
chr21   5030588 T   C   0.00827608
chr21   5030596 A   G   0.000312305
chr21   5030641 C   T   0
chr21   5030673 G   A   0.000312305
chr21   5030912 CA  C   0.00218613
Zilong-Li commented 3 months ago

You need to add a header line. See my comment above.

Truongphikt commented 3 months ago

@Zilong-Li Thanks for your help.

A little note for other people with the same use case as me is that the format of the chr column is the Prefix Format (chr1, chr2, etc ..). If you use Numerical Format (1, 2, 3, ...), you will meet an error:

Error in tapply(1:length(x), x, function(w) { : 
  arguments must have same length
Calls: vcfcomp -> concordance_by_freq -> tapply
Execution halted
Zilong-Li commented 3 months ago

The idea is that we have to use the consistent format to match the id in af file to VCF file. I'll update the docs at some point.