The previous implementation tried to index a numpy array using a pandas series. This could easily be solved by casting what was needed to a numpy array. However, I think this is a cleaner solution that also avoids the for-loop.
The equivalence between the 's' mode and the 'v' mode can be checked using the following code:
SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION NOTE
1:1066403:T:C 1 1066403 C T 0.5276 0.0043 0.0109 0.6891 191764 ++-+ Clean
1:249250625:T:C 1 249250625 C T 0.5276 0.0043 0.0109 0.6891 191764 ++-+ Out of bound POS
1:1163266:G:A 1 1163266 G A 0.8355 0.0122 0.012 0.3081 191764 -+++ Flip
1:1213224:T:A 1 1213224 T A 0.3345 0.0071 0.0095 0.4528 191764 +++- Flip + Palindromic
1:3066761:A:T 1 3066761 T A 0.1245 -0.0066 0.0157 0.6742 191764 -+++ Palindromic + No flip
1:3997271:T:TTTTA 1 3997271 TTTTA T 0.384 0.0188 0.0135 0.163 191764 ++++ Indel + Both on genome + No flip
1:4007705:TC:T 1 4007705 TC T 0.4368 0.0306 0.0125 0.01421 191764 ++++ Indel + Both on genome + Flip
1:5164281:A:AAGAG 1 5164281 AAGAG A 0.9871 0.0167 0.0454 0.712 191764 -+++ Indel
1:12799025:C:CCT 1 12799025 C CCT 0.173 -0.0186 0.0118 0.115 191764 ---+ Indel + Flip
1:19323015:T:A 1 19323015 T A 0.4389 -0.0165 0.0088 0.05949 191764 ---- Palindromic + Flip + Indistinguishable
1:22841855:A:T 1 22841855 T A 0.4527 0.008 0.0095 0.3998 191764 -++- Palindromic + Indistinguishable
3:183629306:T:TTCTC 3 183629306 TTCTC T 0.0082 -0.0083 0.0555 0.8806 191764 +--+ Indel + Both on genome + No information
4:99731866:G:C 4 99731866 G C 0.3883 0.0052 0.0093 0.5736 191764 +-++ Palindromic+ Flip + Different strand (REF G, TOMMO G 0.38)
8:89935201:C:G 8 89935201 G C 0.3533 0.0103 0.0094 0.2756 191764 ++++ Palindromic + Different strand (REF A; TOMMO G 0.34)
X:4206466:G:C X 4206466 G C 0.8727 -0.0055 0.0104 0.5984 191764 -+-- Palindromic + Flip + No information
X:5053229:A:T X 5053229 T A 0.9747 0.0218 0.0225 0.3318 191764 ++++ Palindromic + No information
X:155270565:A:TT X 155270565 TT A 0.9747 0.0218 0.0225 0.3318 191764 ++++ Out of bound POS
Where there are two variants with POS out of bound.
Improve fix implemented in https://github.com/Cloufield/gwaslab/issues/88.
The previous implementation tried to index a numpy array using a pandas series. This could easily be solved by casting what was needed to a numpy array. However, I think this is a cleaner solution that also avoids the for-loop.
The equivalence between the 's' mode and the 'v' mode can be checked using the following code:
Where the content of
to_harmonize.tsv
is:Where there are two variants with
POS
out of bound.