arvkevi closed this 2 years ago
Merging #143 (d29743f) into develop (8ca5d75) will increase coverage by 0.07%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## develop #143 +/- ##
===========================================
+ Coverage 93.44% 93.52% +0.07%
===========================================
Files 8 8
Lines 1540 1559 +19
Branches 273 274 +1
===========================================
+ Hits 1439 1458 +19
Misses 54 54
Partials 47 47
Impacted Files | Coverage Δ | |
---|---|---|
src/snps/snps.py | 95.94% <100.00%> (+0.14%) | :arrow_up: |
@arvkevi I think we're close to getting the initial tests working. However, `pip` is taking a long time to search for compatible packages. I can fix this via a two-step install, e.g.:
pip install ezancestry
pip install .
However, that defeats the simplicity of just `pip install .[ezancestry]`. Any ideas on how this can be improved?
Thank you for hacking on this PR, Andrew! I cut a release to ezancestry that supports 3.7, which is why I triggered the build yesterday w/ an empty commit. I am confused as to why this is taking so long to resolve dependencies. I'll spend some more time with it.
Hi Kevin, same here. FYI, I tried running the `test-extras` job locally via `act`, and dependencies were resolved quickly and without any issues...
Hey @arvkevi, turns out `pip` couldn't find the correct version of `snps` since the tag version history was not available after checkout; 4582b51 fixed it! Pretty close now... looks like some issues with finding `ezancestry` data.
I did some more testing with `act` and listed the contents of the equivalent of the `/home/runner/.ezancestry/data/` directory... It looks like the ezancestry Python code is looking up filenames with a different case to what's actually on the filesystem; e.g., `aisnps/Kidd.AISNP.txt` (Python) vs `aisnps/KIDD.AISNP.txt` (actual). Same for `models/knn.PCA.Kidd.population.bin` and `models/knn.PCA.Kidd.superpopulation.bin`.
Hopefully that helps speed the troubleshooting along. 🙂
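For anyone reproducing this, the case mismatch above can be confirmed with a small helper like the one below. This is only a sketch: `find_case_insensitive` is a hypothetical name for illustration, not part of ezancestry or snps, and it is not necessarily how the eventual fix was implemented.

```python
from pathlib import Path
from typing import Optional


def find_case_insensitive(directory: Path, filename: str) -> Optional[Path]:
    """Return the entry in `directory` whose name equals `filename` ignoring case.

    Hypothetical helper for debugging; returns None if the directory does not
    exist or no entry matches.
    """
    if not directory.is_dir():
        return None
    wanted = filename.lower()
    for entry in directory.iterdir():
        # Compare lowercased names so "Kidd.AISNP.txt" matches "KIDD.AISNP.txt".
        if entry.name.lower() == wanted:
            return entry
    return None
```

Calling it with the name the Python code expects (e.g. `Kidd.AISNP.txt`) and comparing the returned `entry.name` against that string shows whether the file exists only under a different case, which matters on case-sensitive filesystems such as the Linux CI runners.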
Thanks, Andrew. I will cut a new release this weekend with a fix for the filenames. I'll also set up my own CI in ezancestry so we don't languish on this branch. Thanks for being so patient with this.
I think I fixed the issue with the new release. The new errors are likely due to newly trained models in the release. We can probably just update the assert value.
I think we're good @arvkevi! What are your thoughts on also exposing the raw `predictions` dataframe?
@apriha I think that's a good idea. I will put together some documentation with column descriptions.
I'll leave this here and feel free to modify and incorporate wherever you like.
Populations described below are defined here.
'component1', 'component2', 'component3':
The coordinates of the sample in the dimensionality-reduced component space. Can be used as (x, y, z) coordinates for plotting in a 3D scatter plot.
'predicted_population_population':
The population with the highest predicted probability for the sample.
'ACB', 'ASW', 'BEB', 'CDX', 'CEU', 'CHB', 'CHS', 'CLM', 'ESN', 'FIN', 'GBR', 'GIH', 'GWD', 'IBS', 'ITU', 'JPT', 'KHV', 'LWK', 'MSL', 'MXL', 'PEL', 'PJL', 'PUR', 'STU', 'TSI', 'YRI':
Predicted probabilities for each of the populations. These sum to 1.0.
'predicted_population_superpopulation':
The super population (continental) with the highest predicted probability for the sample.
'AFR', 'AMR', 'EAS', 'EUR', 'SAS':
Predicted probabilities for each of the super populations. These sum to 1.0.
'population_description', 'superpopulation_name':
Descriptive names of the population and super population.
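To make the relationship between the probability columns and the predicted label concrete, here is a minimal sketch using a plain dict to stand in for one row of the dataframe. The numbers are made up for illustration and are not real model output, and `check_row` is a hypothetical helper, not part of ezancestry or snps.

```python
import math

SUPERPOPULATIONS = ["AFR", "AMR", "EAS", "EUR", "SAS"]

# One row of the predictions dataframe, represented as a dict.
# Values are invented for illustration only.
row = {
    "AFR": 0.02,
    "AMR": 0.05,
    "EAS": 0.01,
    "EUR": 0.90,
    "SAS": 0.02,
    "predicted_population_superpopulation": "EUR",
}


def check_row(row, labels, predicted_column):
    """Check that the probabilities sum to 1.0 and the label is the arg max."""
    probs = {label: row[label] for label in labels}
    sums_to_one = math.isclose(sum(probs.values()), 1.0, abs_tol=1e-6)
    # The predicted label should be the column with the highest probability.
    top_label = max(probs, key=probs.get)
    return sums_to_one and top_label == row[predicted_column]
```

The same check applies to the 26 population columns together with `predicted_population_population`.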
@arvkevi updates incorporated. Please let me know what you think... If you agree, I think it's ready to merge. Thanks again for developing this awesome capability!
LGTM @apriha, thank you for all your hard work on this PR!
This PR adds basic functionality to predict genetic ancestry using ezancestry. @apriha please feel free to make suggestions/direct edits as you see fit, this is just to get the concept moving forward. Here's how a user could utilize this functionality from `snps`.