Open fgvieira opened 4 years ago
Is this still needed? I am very hesitant to add this, but it could be a debug option.
I agree that it would be nice to have as a debug option.
I would also appreciate more detailed per-site info in the output from somalier.
Would it perhaps be possible for the extract function, in addition to the .somalier files, to output a TSV file (or some other text format) with genomic positions, read counts, REF, ALT and genotype calls, that is, something like the following:
chr position nref nalt nother REF ALT GT
chr2 20616424 184 171 1 C T HET
chr4 165697039 0 328 0 G T HOM_ALT
chr4 190318079 290 0 0 C G HOM_REF
chr6 165045333 0 283 0 G T HOM_ALT
...
Hi, you can write this using a simple Python script that accepts the sites file and a somalier file (or many somalier files). Here is a function that will read the sites data into a Python structure for you: https://github.com/brentp/somalier/blob/master/scripts/ancestry-predict.py#L7
The sites data is an array with n_sites rows and 2 columns, where the first column is the ref depth and the second is the alt depth.
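Once the depths are loaded, turning them into the requested table could look roughly like the sketch below. It is only an illustration: `call_genotype` and `write_site_tsv` are hypothetical helper names, not part of somalier, the allele-balance cutoffs and the `min_depth` default are assumptions chosen for the example rather than somalier's confirmed values, and the site list (chrom, position, REF, ALT) is assumed to have been parsed from the sites file separately, in the same order as the depth rows. The `nother` column from the example above is left out because the array described here only holds ref and alt depths.

```python
import csv
import io


def call_genotype(nref, nalt, min_depth=7):
    """Call a genotype from ref/alt depths using allele balance.

    The cutoffs below are illustrative assumptions, not somalier's
    confirmed thresholds.
    """
    depth = nref + nalt
    if depth < min_depth:
        return "UNKNOWN"
    ab = nalt / depth  # fraction of reads supporting ALT
    if ab < 0.02:
        return "HOM_REF"
    if ab > 0.98:
        return "HOM_ALT"
    if 0.2 <= ab <= 0.8:
        return "HET"
    return "UNKNOWN"


def write_site_tsv(sites, depths, out):
    """Write one TSV row per site.

    sites:  list of (chrom, position, ref, alt), same order as depths
    depths: n_sites rows of (ref depth, alt depth)
    out:    any writable text file object
    """
    if len(sites) != len(depths):
        raise ValueError("sites and depths must have the same length")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["chr", "position", "nref", "nalt", "REF", "ALT", "GT"])
    for (chrom, pos, ref, alt), (nref, nalt) in zip(sites, depths):
        nref, nalt = int(nref), int(nalt)
        writer.writerow([chrom, pos, nref, nalt, ref, alt,
                         call_genotype(nref, nalt)])
```

For one sample you would call `write_site_tsv(sites, depths, sys.stdout)` (after `import sys`), or pass an open file handle to get a TSV on disk.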
Dear all,
would it be possible to get more detailed per-site info for QC? Right now somalier only outputs per-sample and per-pair info.
There is already something similar in depthview, but it is very broad and only available as HTML. Would it be possible to get that info as a TSV as well? Maybe reporting, for each site (rows) and each individual (columns), the coverage for each allele as well as somalier's called genotype.
thanks,