Closed by tstannius 4 years ago
This looks good to me. If you add a --prefix argument that defaults to something like "somalier-ancestry", use that, and then update the README.md as needed, I think this would be ready.
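For reference, a minimal sketch of what such an option could look like, assuming the script parses its arguments with argparse. Only the flag name and its default come from the comment above; the `build_parser` and `output_path` helpers, the help text, and the suffix used in the example are hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical parser: only --prefix and its default are taken from the
    # review comment; everything else here is illustrative.
    parser = argparse.ArgumentParser(description="ancestry prediction (sketch)")
    parser.add_argument(
        "--prefix",
        default="somalier-ancestry",
        help="prefix used for all output files",
    )
    return parser


def output_path(prefix, suffix):
    # Join the prefix and a file suffix, e.g. "somalier-ancestry" + "tsv".
    return f"{prefix}.{suffix}"


# Parse an empty argument list so the default is used.
args = build_parser().parse_args([])
default_output = output_path(args.prefix, "tsv")
```

With this shape, every output name is derived from one value, so a single flag controls where all results are written.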
Done and done.
However, a co-worker suggested some improvements that I will add.
Is there anything that needs changing, @brentp, or should I consider the edits final? If they are final, I will continue extending your MultiQC/Somalier PR to accommodate the ancestry prediction :-)
Thanks for the reminder. FYI, I am working on getting this functionality directly into the somalier binary. It's working but lacking a few features. It outputs the full text for background and query samples, including the PCs and the confidence for each ancestry. That will probably be a better place to start.
To enable creation of PCA plots in MultiQC, I have modified the script to export CSV files containing the predictions and PCs.
These changes affect the way ancestry-predict is called:
ORIGINAL
python scripts/ancestry-predict.py --labels scripts/ancestry-labels-1kg.tsv --samples $MY_SAMPLES/*.somalier --backgrounds 1kg-somalier/*.somalier > sample-ancestries.txt
Outputs:
sample-ancestries.txt
thousandG.npy
NEW
python somalier/scripts/ancestry-predict.py --labels somalier/scripts/ancestry-labels-1kg.tsv --samples $MY_SAMPLES/*.somalier --backgrounds 1kg-somalier/*.somalier
Shows the plot
python3 somalier/scripts/ancestry-predict.py --labels somalier/scripts/ancestry-labels-1kg.tsv --samples data/*.somalier --backgrounds 1kg-somalier/*.somalier --plot mydir/mysample.pdf
Outputs in directory "mydir":
mysample.ancestry_pcs.csv
mysample.ancestry_prediction.csv
mysample.pdf
mysample.thousandG.npy
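The naming pattern above suggests that the output files share the directory and base name of the --plot path. A minimal sketch of that derivation, plus a simple CSV writer for the PCs, is shown here. This is not the script's actual code: the `derive_outputs` and `write_pcs` helpers and the CSV column names (`sample`, `PC1`, ...) are assumptions made for illustration.

```python
import csv
import io
from pathlib import Path


def derive_outputs(plot_path):
    """Derive sibling output paths from the --plot path (hypothetical helper)."""
    plot = Path(plot_path)
    base = plot.stem  # "mysample" for "mydir/mysample.pdf"
    return {
        "plot": plot,
        "pcs": plot.parent / f"{base}.ancestry_pcs.csv",
        "prediction": plot.parent / f"{base}.ancestry_prediction.csv",
        "background": plot.parent / f"{base}.thousandG.npy",
    }


def write_pcs(handle, samples, pcs):
    """Write one row per sample: the sample id followed by its PC values.

    The header names are assumed, not taken from the real script.
    """
    n_pcs = len(pcs[0]) if pcs else 0
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample"] + [f"PC{i + 1}" for i in range(n_pcs)])
    for sample, row in zip(samples, pcs):
        writer.writerow([sample] + list(row))


# Usage: derive the paths, then write PCs to an in-memory buffer.
outputs = derive_outputs("mydir/mysample.pdf")
buffer = io.StringIO()
write_pcs(buffer, ["s1"], [[0.1, -0.2]])
```

Writing to a file handle rather than a path keeps the writer easy to test; in a real run the handle would come from opening `outputs["pcs"]`.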
Considerations