broadinstitute / gnomad_qc

Compare ancestry inference performance of v4 RF models #161

Closed gtiao closed 1 year ago

gtiao commented 3 years ago

This ticket is aimed at doing the analysis for v4 population assignments (as opposed to writing the PR #494)

gtiao commented 3 years ago

Someone (Alicia? Konrad?) reported getting excellent results training an ancestry RF on 1KG+HGDP and applying it to the UKBB pan-ancestry project. Let's compare that to the gnomAD RF performance and see if we can get better results.

gtiao commented 2 years ago

This really only applies to WGS data (v3), not exome data (v4)

ch-kr commented 2 years ago

we don't have labels for v4 that we can use (we have labels for v3). Mike was planning on looking at whether we should use only known labels (HGDP/TGP) or v3 imputed labels

mike-w-wilson commented 1 year ago

I moved the wrong ticket here. This will happen later in analysis, not this sprint.

klaricch commented 1 year ago

Decided on:

min_prob_cutoffs={'afr': 0.93, 'ami': 0.96, 'amr': 0.86, 'asj': 0.88, 'eas': 0.96, 'fin': 0.91, 'mid': 0.56, 'nfe': 0.78, 'sas': 0.96}
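For anyone reading this later, here is a rough sketch of how per-population cutoffs like these might be applied to RF output. It is not taken from the gnomad_qc code: the function name `assign_pop`, the `"unassigned"` fallback label, and the choice of `>=` for the comparison are all assumptions made for illustration. The idea is that a sample gets its most probable population only if that probability clears the cutoff for that population.

```python
# Cutoffs copied from the comment above.
MIN_PROB_CUTOFFS = {
    "afr": 0.93, "ami": 0.96, "amr": 0.86, "asj": 0.88, "eas": 0.96,
    "fin": 0.91, "mid": 0.56, "nfe": 0.78, "sas": 0.96,
}


def assign_pop(probs, cutoffs=MIN_PROB_CUTOFFS, fallback="unassigned"):
    """Return the most probable population if it clears its own cutoff.

    probs: dict mapping population label -> RF probability for one sample.
    fallback: hypothetical label for samples that fail the cutoff.
    """
    # Pick the population with the highest RF probability.
    best_pop = max(probs, key=probs.get)
    # Assumption: a probability equal to the cutoff counts as passing.
    if probs[best_pop] >= cutoffs[best_pop]:
        return best_pop
    return fallback


# Made-up probabilities, for illustration only.
examples = [
    {"nfe": 0.80, "fin": 0.20},  # nfe clears its 0.78 cutoff
    {"nfe": 0.70, "fin": 0.30},  # nfe is the top call but is below 0.78
    {"mid": 0.60, "nfe": 0.40},  # mid clears its lower 0.56 cutoff
]
for sample_probs in examples:
    print(assign_pop(sample_probs))
```

Because the cutoffs differ per population, the same top probability can pass for one label and fail for another: 0.60 is enough for `mid` but would not be enough for `nfe`.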