BioJulia / PopGen.jl

Population Genetics in Julia
https://biojulia.github.io/PopGen.jl/
MIT License

[bug] isbiallelic(::PopData) returns incorrect answer #75

Closed: pdimens closed this issue 3 years ago

pdimens commented 3 years ago

description

The function isbiallelic(::PopData) returns false even when isbiallelic(::GenoArray) returns true for every locus in the PopData.

minimal example to reproduce

julia> x = vcf("some_file.vcf", rename_loci = true)
PopData Object
  Markers: SNP
  Ploidy: 2
  Samples: 441
  Loci: 7910
  Populations: 1
  Coordinates: absent

julia> isbiallelic(x)
false

julia> tmp = DataFrames.combine(
    groupby(x.loci, :locus),
    :genotype => isbiallelic => :bial
) ;

julia> all(tmp.bial)
true

expected behavior

julia> isbiallelic(x)
true

julia> tmp = DataFrames.combine(
    groupby(x.loci, :locus),
    :genotype => isbiallelic => :bial
) ;

julia> all(tmp.bial)
true

pdimens commented 3 years ago

https://github.com/BioJulia/PopGen.jl/blob/9d72d253738b0bf75599390111ed6a908ca73534/src/Conditionals.jl#L17-L20

should be

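function isbiallelic(data::PopData)
    # assumes data.loci is sorted by [:name, :locus], so each column of mtx
    # is one sample and each row is one locus across all samples
    mtx = reshape(data.loci.genotype, length(loci(data)), :)
    # stops at the first locus that is not biallelic
    all(isbiallelic, eachrow(mtx))
end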

with an extra conditional to make sure data.loci is sorted by [:name, :locus], because the reshape only lines up one locus per row when the genotypes are in that order.
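The proposal works because Julia reshapes column-major: once the long-format genotype column is ordered by sample and then locus, a loci × samples matrix has one locus per row. Below is a minimal, self-contained sketch of that idea in plain Julia, including a sort guard of the kind the comment asks for. It also illustrates how a check that pools every genotype across loci can report false for SNP data whose loci are each biallelic. The names toy_isbiallelic, sample_ids, locus_ids and genos, the tuple genotypes, and the "exactly two distinct alleles" rule are hypothetical stand-ins, not PopGen.jl's API or its actual definition.

```julia
# Toy stand-ins only; PopGen.jl's real types and isbiallelic differ.
# A genotype is a pair of allele codes, or `missing` if untyped.

# Hypothetical helper: a vector of genotypes is treated as biallelic when
# exactly two distinct alleles occur among its non-missing entries.
toy_isbiallelic(genos) =
    length(unique(a for g in skipmissing(genos) for a in g)) == 2

# Per-locus check on long-format data, following the reshape idea above.
# `sample_ids`, `locus_ids` and `genos` are parallel vectors with one entry
# per (sample, locus) pair, mirroring the :name, :locus and :genotype columns.
# Assumes every sample is typed at every locus, so the matrix is rectangular.
function toy_isbiallelic(sample_ids::AbstractVector, locus_ids::AbstractVector,
                         genos::AbstractVector)
    ids = collect(zip(sample_ids, locus_ids))
    # The reshape is only valid when rows are ordered by (name, locus), so
    # reorder a copy rather than trusting or mutating the caller's data.
    if !issorted(ids)
        genos = genos[sortperm(ids)]
    end
    nloci = length(unique(locus_ids))
    # Column j holds all loci of sample j, so each row is one locus
    # across every sample.
    mtx = reshape(genos, nloci, :)
    return all(toy_isbiallelic, eachrow(mtx))
end

# Two samples, two loci, deliberately out of (name, locus) order.
samples = ["s2", "s1", "s1", "s2"]
locs    = ["l2", "l1", "l2", "l1"]
gts     = Union{Missing, Tuple{Int, Int}}[(3, 4), (1, 2), (3, 3), (1, 1)]

println(toy_isbiallelic(gts))                 # pooled across loci: prints false
println(toy_isbiallelic(samples, locs, gts))  # per locus: prints true
```

Locus l1 only carries alleles 1 and 2 and locus l2 only carries 3 and 4, so each locus passes on its own, while pooling all four genotypes sees four alleles. Sorting a permuted copy keeps the check free of side effects; an in-place sort of data.loci would be cheaper but would silently reorder the caller's PopData.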