OpenMendel / BGEN.jl

MIT License
10 stars 0 forks source link

Read all ALT allele as 1 and REF allele as 0 in `minor_allele_dosage!`? #16

Closed biona001 closed 3 years ago

biona001 commented 3 years ago

Here is a test data (unphased VCF file converted to BGEN using qctool v2) Archive 2.zip

I want to import all BGEN genotypes into numeric matrix, so I wrote the following function

using BGEN, VCFTools
function convert_gt(t::Type{T}, b::Bgen) where T <: Real
    n = n_samples(b)
    p = n_variants(b)
    G = Matrix{t}(undef, p, n)

    # loop over each variant
    i = 1
    for v in iterator(b; from_bgen_start=true)
        dose = minor_allele_dosage!(b, v; T=t)
        copyto!(@view(G[i, :]), dose)
        i += 1
    end
    return G
end

But imported genotype matrix does not agree with VCF file:

Gtest = convert_gt(Float64, Bgen("target.typedOnly.masked.bgen"))
Gtrue = VCFTools.convert_gt(Float64, "target.typedOnly.masked.vcf.gz", trans=true)
julia> all(Gtrue .== Gtest)
false

This seems to be because BGEN is checking which allele is the minor allele (so a 2 is swapped with 0 and 0 is swapped with 2 compared to VCF file)

idx = findall(skipmissing(Gtrue .!= Gtest)) # index where Gtrue and Gtest does not agree
julia> [Gtrue[idx] Gtest[idx]]
66811×2 Matrix{Union{Missing, Float64}}:
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 0.0  2.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 ⋮
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 2.0  0.0
 0.0  2.0
 0.0  2.0
 0.0  2.0
 0.0  2.0

Is there a way to instead read all ALT alleles as 1 and all REF allele as 0?

biona001 commented 3 years ago

Thanks for the tip of using ref_allele_dosage!. Now this works:


using BGEN, VCFTools
function convert_gt(t::Type{T}, b::Bgen) where T <: Real
    n = n_samples(b)
    p = n_variants(b)
    G = Matrix{t}(undef, p, n)

    # loop over each variant
    i = 1
    for v in iterator(b; from_bgen_start=true)
        dose = ref_allele_dosage!(b, v; T=t) # this reads REF allele as 1
        BGEN.alt_dosage!(dose, v.genotypes.preamble) # switch 2 and 0 (ie treat ALT as 1)
        copyto!(@view(G[i, :]), dose)
        i += 1
    end
    return G
end

Gtest = convert_gt(Float64, Bgen("target.typedOnly.masked.bgen"))
Gtrue = VCFTools.convert_gt(Float64, "target.typedOnly.masked.vcf.gz", trans=true)

julia> all(skipmissing(Gtrue .== Gtest))
true