mcgml closed this issue 5 years ago
changed effect to string
@explodecomputer we need to discuss this
Thanks Matt. Could you please send me a path to an example file that has this issue? I'll put something in the calendar to discuss. Does anybody else need to be involved?
@explodecomputer it's these BCF files on bc4 /mnt/storage/home/ml18692/ukbiobank/vcf_03_19. For example have a look at slurm-1928986.out. The git commit to create the files was dac862e626a2e560a8ee715ab1753dd2469c291d
@mcgml I don't seem to have read access. Can you drop them here, please? /mnt/storage/private/mrcieu/research/mr-eve/scratch
I don't have write permission. Here is the relevant data:
slurm-1928986.out:2019-03-19 17:12:08,772 WARNING Effect field smaller than VCF specification. Expect loss of precision for: 5.89127e-07
data.batch_100001.txt.gz: SNP CHR BP GENPOS ALLELE1 ALLELE0 A1FREQ INFO CHISQ_LINREG P_LINREG BETA SE CHISQ_BOLT_LMM_INF P_BOLT_LMM_INF
rs926250 1 9374375 0.191709 G A 0.284393 0.989487 0.000244052 9.9E-01 -5.89127e-07 0.00603322 9.53497e-09 1.0E+00
In this case the effect size (BETA) is -5.89127e-07. Its magnitude is below what the VCF float field can hold without loss of precision, which is what triggers the warning.
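For anyone following along, here is a minimal Python sketch of the kind of check that would flag this row. It is not the converter's actual code: the `MIN_ABS_VALUE` threshold of 1e-6 is an assumption taken from the figure mentioned elsewhere in this thread, and `below_precision` is a hypothetical helper name.

```python
# Assumed threshold (the thread mentions 1e-6); not a constant from the tool itself.
MIN_ABS_VALUE = 1e-6

# Header and row copied from data.batch_100001.txt.gz as quoted above.
header = ("SNP CHR BP GENPOS ALLELE1 ALLELE0 A1FREQ INFO CHISQ_LINREG "
          "P_LINREG BETA SE CHISQ_BOLT_LMM_INF P_BOLT_LMM_INF").split()
row = ("rs926250 1 9374375 0.191709 G A 0.284393 0.989487 0.000244052 "
       "9.9E-01 -5.89127e-07 0.00603322 9.53497e-09 1.0E+00").split()
record = dict(zip(header, row))


def below_precision(value, threshold=MIN_ABS_VALUE):
    """Return True for a non-zero value whose magnitude is under the threshold."""
    return value != 0.0 and abs(value) < threshold


beta = float(record["BETA"])
se = float(record["SE"])
print("BETA flagged:", below_precision(beta))
print("SE flagged:", below_precision(se))
```

The check uses the absolute value because the sign of BETA is irrelevant to whether the magnitude can be stored.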
If all the effects are like this then there is a problem, but otherwise more than about 4 decimal places is not really necessary. I can look into it more. Could you try again with this directory? /mnt/storage/private/mrcieu/research/mr-eve/scratch
Copied across! The vast majority have at least one row like this:
[ml18692@bc4login2 vcf_03_19]$ grep "field smaller than VCF specification" slurm-* | cut -d: -f1 | sort -u | wc -l
80
[ml18692@bc4login2 vcf_03_19]$ ls slurm-* | wc -l
84
But only a few rows genome-wide
Does colocalisation analysis work with imprecise floats?
Thanks, I had a look and this looks totally fine. If all the files giving problems are of this ilk then there is no issue with it being rounded to 0. Colocalisation won't be affected by that sort of loss of precision.
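A rough way to see why rounding is harmless here, sketched in Python using the BETA and SE values from the example row. This only compares z-scores; it is an illustration and does not run a colocalisation method.

```python
import math

# Values from the rs926250 row quoted earlier in the thread.
beta = -5.89127e-07
se = 0.00603322

# z-score before and after rounding the effect to 0.
z_original = beta / se
z_rounded = 0.0 / se

# The squared z-score should roughly reproduce the CHISQ_BOLT_LMM_INF column.
chisq_original = z_original ** 2

print(f"z (original): {z_original:.3e}")
print(f"z (rounded):  {z_rounded:.3e}")
print(f"chi-square (original): {chisq_original:.3e}")
```

Both z-scores are effectively zero, so the variant carries no association signal whether or not the effect is rounded.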
Great, thanks. I guess it's possible we might encounter the same for SE in the future. Is that also OK? I will switch back to float and round to 0.
If the SE is really close to 0 then it would have to be a really massive effect, so it should be quite unlikely. But if it is too small then I would opt to round it up to the smallest floating point value that can be stored (1e-6?)
OK thanks will do
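To summarise the agreed behaviour, a minimal Python sketch is below. It is not the code in the commit: `round_effect`, `round_se` and the 1e-6 `THRESHOLD` are hypothetical names and values chosen for illustration, with the threshold taken from the tentative figure suggested above.

```python
# Assumed smallest storable magnitude (tentative figure from the thread).
THRESHOLD = 1e-6


def round_effect(beta, threshold=THRESHOLD):
    """Round an effect size to 0 when its magnitude is below the threshold."""
    return 0.0 if abs(beta) < threshold else beta


def round_se(se, threshold=THRESHOLD):
    """Raise a too-small standard error to the threshold so it never becomes 0."""
    return threshold if se < threshold else se


print(round_effect(-5.89127e-07))  # tiny effect from the example row
print(round_effect(0.0123))        # ordinary effect is left alone
print(round_se(1e-9))              # too-small SE is raised to the threshold
print(round_se(0.00603322))        # ordinary SE is left alone
```

The asymmetry is deliberate: an effect of 0 is a valid value, whereas an SE of 0 would make z-scores undefined, so the SE is rounded up rather than down.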
@explodecomputer are you happy with this: b596286b6cc206d9b7296b2b919f2020cf3158cd
@mcgml magnificent
@explodecomputer @elswob We already encountered this issue for storing p-values but there are some cases where the effect size is less than 1e-6 in UKBB. They would end up as 0 in the VCF file. Should we log transform or leave as is?