Closed silvewheat closed 3 years ago
Hi, Thanks for your interest in Loter.
I will try to answer all your questions.
# load a toy dataset from Loter `data` directory (assuming we are in Loter project root directory)
import os
import numpy as np
H_ceu = np.load(os.path.join("data", "H_ceu.npy"), allow_pickle=True)[:,:5000]
H_yri = np.load(os.path.join("data", "H_yri.npy"), allow_pickle=True)[:,:5000]
H_mex = np.load(os.path.join("data", "H_mex.npy"), allow_pickle=True)[:,:5000]
###### from here is the interesting code
# import
from loter.local_ancestry import boostrap_loter_multiple_pops, mode
# inout paramters
l_H=[H_ceu, H_yri]
h_adm=H_mex
range_lambda=np.arange(1.5, 5.5, 0.5)
nb_bagging=20
num_threads=8
# chunk of code to get bagging vote details
input_loter = (l_H, h_adm)
n, m = h_adm.shape
counts = np.zeros((len(l_H), n, m))
for l in range_lambda:
res_boostrap = boostrap_loter_multiple_pops(
*input_loter, lambd=l,
counts=counts, nbrun=nb_bagging,
num_threads=num_threads
)
# counts is a list containing arrays of number of votes per individual (in rows)
# and per SNP (in columns) for each reference populations
counts
# to get the majority vote
res_tmp = mode(counts)
In case of equality, the index of the first reference population will be return, so it depends on the order in the input [H_ref1, H_ref2, H_ref3, H_ref4, H_ref5]
. It is not ideal, we should definitely work on this to better manage the indecisive case.
rate_vote
is not relevant in this mode, I corrected the notebook https://github.com/bcm-uga/Loter/blob/master/python-package/Local_Ancestry_Example.ipynb and https://github.com/bcm-uga/Loter/blob/master/python-package/Local_Ancestry_Example.md accordingly.
You can also use the mode default=False
of lc.loter_local_ancestry
to enable a correction of indecisive SNP (i.e. SNP with a vote lower than rate_vote
), that are corrected based on nearest SNP for which the vote is above rate_vote
. In this mode, you can choose the level of indecision.
Hi,
Thanks for your detailed answers. It is very helpful.
best, Yudong
Hi,
I have some question about bagging process.
res_loter = lc.loter_local_ancestry([H_ref1, H_ref2, H_ref3, H_ref4, H_ref5], H_query)
It seems that res_loter[1] only contained the number of the most frequent vote. Is there any way to get the vote count for each population in the reference list? Because I want to mask some region as ambiguous when the count of the most frequent vote is close to the second vote.best, Yudong