Open eriqande opened 3 months ago
Hi Eric, I found the same issue with some SNP microhap data that I have been working with. It all worked great until I tried to identify close matching samples. I'll let you know what my workaround ends up being if I figure out one.
Hi Ariana,
Thanks for pinging me about this. Let me know if no simple workaround works for you, and I can fast-track the fix for you next week.
Cheers,
eric
Hi Eric,
I was following along with your tutorial. Since you mentioned that LocIdx might have been the problem, I modified the reindex_markers function:
# Assumes dplyr is attached (for %>%), e.g. via library(dplyr)
reindex_markers <- function(M) {
  M %>%
    dplyr::ungroup() %>%
    dplyr::arrange(Chrom, Pos, dplyr::desc(Freq)) %>%
    # dplyr::group_by(Chrom) %>%  # disabled so LocIdx is not restarted on each chromosome
    dplyr::mutate(locidx = as.integer(factor(Locus, levels = unique(Locus)))) %>%
    dplyr::group_by(Chrom, Locus) %>%
    # index alleles within each locus and renormalize their frequencies to sum to 1
    dplyr::mutate(
      alleidx = as.integer(factor(Allele, levels = unique(Allele))),
      newfreq = Freq / sum(Freq)
    ) %>%
    dplyr::select(-AlleIdx, -LocIdx, -Freq) %>%
    dplyr::rename(Freq = newfreq, AlleIdx = alleidx, LocIdx = locidx) %>%
    dplyr::ungroup()
}
That seemed to give unique identifiers for each unique locus, even when I had loci on the same chromosomes.
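One way to confirm that the identifiers really are unique is to check that LocIdx and (Chrom, Locus) are in one-to-one correspondence. The sketch below is only an illustration: `locidx_is_unique` is a hypothetical helper that is not part of CKMRsim, and the toy tables use made-up loci with only the columns the check needs.

```r
library(dplyr)

# Hypothetical helper (not part of CKMRsim): TRUE when every (Chrom, Locus)
# pair has exactly one LocIdx and every LocIdx belongs to exactly one locus.
locidx_is_unique <- function(M) {
  pairs <- M %>%
    ungroup() %>%
    distinct(Chrom, Locus, LocIdx)
  n_loci <- nrow(distinct(pairs, Chrom, Locus))
  n_idx <- n_distinct(pairs$LocIdx)
  nrow(pairs) == n_loci && nrow(pairs) == n_idx
}

# Toy table (made-up loci): one row per allele, LocIdx running across the genome
good <- data.frame(
  Chrom  = c("1", "1", "1", "1", "2", "2"),
  Locus  = c("locA", "locA", "locB", "locB", "locC", "locC"),
  LocIdx = c(1L, 1L, 2L, 2L, 3L, 3L)
)

# Same table, but with LocIdx restarted at 1 on chromosome 2
bad <- good
bad$LocIdx <- c(1L, 1L, 2L, 2L, 1L, 1L)

good_ok <- locidx_is_unique(good)
bad_ok <- locidx_is_unique(bad)
```

Running the same check on the output of the modified reindex_markers would show whether any index is still shared between loci.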
The find_close_matching_genotypes function still threw an error after this, so I double-checked create_integer_genotype_matrix, and it ran with no problem on its own. So, instead of calling create_integer_genotype_matrix within find_close_matching_genotypes, I saved the integer matrix separately and then created the matchers object using the source code of find_close_matching_genotypes. My workflow was as follows:
mat_GT <- create_integer_genotype_matrix(long_geno_sub, afreqs_ready)
max_mismatch <- 5
# Assuming S in the package source is the integer genotype matrix, it is mat_GT here
matchers <- pairwise_geno_id(mat_GT, max_miss = max_mismatch) %>%
  dplyr::arrange(num_mismatch) %>%
  dplyr::mutate(
    indiv_1 = rownames(mat_GT)[ind1],
    indiv_2 = rownames(mat_GT)[ind2]
  ) %>%
  dplyr::select(indiv_1, indiv_2, dplyr::everything())
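The last step of that pipeline turns row indices into sample names, so the row names have to come from the same matrix that was compared. Below is a minimal sketch of just that lookup on toy data. The matrix `geno_mat`, the sample names, and the hand-made `pairs` table are all made up for illustration; `pairs` only stands in for a result with `ind1`, `ind2`, and `num_mismatch` columns and is not real output from pairwise_geno_id.

```r
library(dplyr)

# Toy integer genotype matrix: rows are samples, columns are loci (made-up values)
geno_mat <- matrix(
  c(1L, 2L, 1L,
    1L, 2L, 1L,
    3L, 2L, 2L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("L1", "L2", "L3"))
)

# Hand-made stand-in for a table of pairs, listed out of order on purpose
pairs <- data.frame(
  ind1 = c(1L, 1L),
  ind2 = c(3L, 2L),
  num_mismatch = c(2L, 0L)
)

# Sort by mismatches, then look up names in the SAME matrix the indices refer to
named <- pairs %>%
  arrange(num_mismatch) %>%
  mutate(
    indiv_1 = rownames(geno_mat)[ind1],
    indiv_2 = rownames(geno_mat)[ind2]
  ) %>%
  select(indiv_1, indiv_2, everything())
```

If the row names were taken from a different object, or from one that does not exist in the session, the name columns would be wrong or the call would fail, even though the indices themselves are fine.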
That is as far as I have gotten so far, but I'm going to keep working through the tutorials you have online with this data. I will let you know if I find anything else.
Ariana
Thanks for the update, Ariana. Also, for a better tutorial that discusses some of the things that can be done about physical linkage, please check out: https://eriqande.github.io/tws-ckmr-2022/kin-finding-lab.html
Cheers,
eric
Eric here...
I am running some stuff on the Tobique salmon using a ckmr object that includes the chromosomes (it is 12 microsats, with two of them on the same chromosome). When I do that, everything seems to work fine until I start going about doing the pairwise comparisons (actually...I am getting failures looking for close matching samples).
The issue seems to be here:
https://github.com/eriqande/CKMRsim/blob/b54e32473e60e9fbeb92994e0d168b48379da5f6/R/create_integer_genotype_matrix.R#L49
The problem is that LocIdx is reset for each chromosome, so this ends up not being a matrix with unique Loci in it.
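To make the collision concrete, here is a small toy example. It is only a sketch: the loci and positions are made up, and the per-chromosome indexing is written out by hand rather than taken from the package code.

```r
library(dplyr)

# Toy marker table (made-up loci): two loci on chromosome 1, one on chromosome 2
markers <- data.frame(
  Chrom = c("1", "1", "2"),
  Locus = c("locA", "locB", "locC"),
  Pos   = c(100L, 200L, 50L)
)

# Index loci separately within each chromosome, so the count restarts at 1
per_chrom <- markers %>%
  arrange(Chrom, Pos) %>%
  group_by(Chrom) %>%
  mutate(LocIdx = as.integer(factor(Locus, levels = unique(Locus)))) %>%
  ungroup()

# Compare the number of distinct loci with the number of distinct index values
n_loci <- nrow(distinct(per_chrom, Chrom, Locus))
n_idx <- n_distinct(per_chrom$LocIdx)
```

Here `n_loci` is 3 but `n_idx` is 2, because locA and locC both get LocIdx 1. Anything that keys on LocIdx alone would treat those two loci as the same one.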
For now, the workaround is to create a ckmr object with everything on the same chromosome, and then use that when doing the pairwise comps. But that is a huge PITA.
I think this can be fixed like this...
I think that I should be able to just modify reindex_markers() so that it gives each locus a unique serial index throughout the whole genome, so that the numbers don't start up at 1 again on each new chromosome. The only thing that would change, I think, would be the names of the loci that are used internally (i.e., the chrom.Locus.pos nomenclature). But I don't think that this would break anything. In fact, I don't think LocIdx plays into that at all, anyway. I need to implement this and test it and make sure it is working.