I'm running a bunch of PRS for many datasets and I ran into a probable bug in the coord phase. I noticed that this debug print seemed suspicious:
Coordinating data for chromosome chrom_9
Found 397889 SNPs on chrom 9 that were common across all datasets
Ordering SNPs by genomic positions (based on LD reference genotypes).
Nucleotide concordance counts out of 397889 genotypes, rg-ss: 362903
Nucleotides don't match after all?: g_sid=chr9_42729141_C_T, ss_sid=chr9_42729141_C_T, g_i=217616, ss_i=160113, g_nt=['T', 'C'], ss_nt=['A' 'G']
This is just the head of the output but this happens systematically. As you can see TC is not mapped to AG, although it clearly should be.
Looking at the code it seems that there is a bug in the logical operations with python at line 340 of coord_genotypes.py. After the opposite strand variant is calculated:
Then the script checks if either the original or the opposite strand variant match the variant from the sumstat file:
if not sp.all(g_nt == ss_nt) or sp.all(os_g_nt == ss_nt):
However, the first logical operation seems to be wrong. The not operation only applies to the first boolean evaluation, thus causing the aforementioned output. It should be
if not (sp.all(g_nt == ss_nt) or sp.all(os_g_nt == ss_nt)):
In fact, the following step is built this way and it works fine. First the or boolean operation is defined and then it's used in the if statement:
flip_nts = (g_nt[1] == ss_nt[0] and g_nt[0] == ss_nt[1]) or (os_g_nt[1] == ss_nt[0] and os_g_nt[0] == ss_nt[1])
if flip_nts:
...
I'm running a bunch of PRS for many datasets and I ran into a probable bug in the coord phase. I noticed that this debug print seemed suspicious:
This is just the head of the output but this happens systematically. As you can see
TC
is not mapped toAG
, although it clearly should be.Looking at the code it seems that there is a bug in the logical operations with python at line 340 of
coord_genotypes.py
. After the opposite strand variant is calculated:Then the script checks if either the original or the opposite strand variant match the variant from the sumstat file:
However, the first logical operation seems to be wrong. The
not
operation only applies to the first boolean evaluation, thus causing the aforementioned output. It should beIn fact, the following step is built this way and it works fine. First the
or
boolean operation is defined and then it's used in theif
statement: