OpenMendel / OrdinalGWAS.jl

Genome-wide association studies (GWAS) for ordered categorical phenotypes
https://openmendel.github.io/OrdinalGWAS.jl/latest/
MIT License
21 stars 7 forks source link

Syntax of covrowinds and geneticrowinds #34

Open came203 opened 1 year ago

came203 commented 1 year ago

As a follow-up to my last issue, it's quite cumbersome to construct VCF or BED files for each phenotype separately, so the best way to go on would be probably to use the covrowinds and geneticrowinds argument to select only the samples used in the current analysis. As I'm by no means an expert in Julia, how would one go on to implement this, similar to the snpinds argument which works very nicely. Is there any easy way to construct the covrowinds and geneticrowinds using IDs of participants? Or does one have to construct the indices manually? What is the syntax of the indices?` Any pointers on this would be highly appreciated

kose-y commented 1 year ago

My go-to method would be using dictionaries. Julia has an efficient built-in data structure for the dictionaries, just like in Python.

Here is an example using the bgen format (edited from https://github.com/kose-y/TrajGWAS-reproducibility/blob/main/ukbiobank/4_scoretest_script/scoretest.jl):

using BGEN

genetic_iids_subsample = # a list of sample names here
data = Bgen(bgenfilename * ".bgen"; sample_path = samplefilename)
genetic_iids = samples(data) # assuming sample names are `String`s directly mapped to individual IDs. 

order_dict = Dict{String, Int}() # creating an empty dictionary with key type String, value type Int.
for (i, iid) in enumerate(genetic_iids)
    order_dict[iid] = i
end

sample_indicator = falses(length(genetic_iids))
for v in genetic_iids_subsample
    sample_indicator[order_dict[v]] = true
end

# then use sample_indicator as `geneticrowinds`

You can do the similar with the SNP's RefSeq IDs when constructing snpinds.

It would be a nice feature to have in our package, but it is tricky since it depends on the files you have, as you can see in the example that I linked.