Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

VCF enhancements #58

Closed Al-Murphy closed 3 years ago

Al-Murphy commented 3 years ago

@bschilder, just wanted to note the latest changes to make sure you can't see any issue with them:

Al-Murphy commented 3 years ago

@bschilder I have pushed these changes but do still let me know your thoughts (we can always revert/update if necessary)

bschilder commented 3 years ago

  1. Have you tested how often AF (presumably the effect/risk Allele Frequency) is actually the MAF? I'm curious how often this is the case; it might be worth testing across multiple traits. But yeah, so long as the SNPs are all biallelic and flipping is turned on, it should be OK either way.
  2. Ah OK, good catch! I'm guessing we originally made some assumptions about the position of the INFO col?
  3. I think this is actually very helpful for a lot of applications, thanks for adding it.

Al-Murphy commented 3 years ago

I have no clue how to test how often AF is MAF, but if you come across examples in the future, let me know and we can revisit it. With flipping, I think it's an okay default assumption, and people can always supply their own mapping if they disagree.

bschilder commented 3 years ago

You just test how often the effect allele frequency is < 0.5 (assuming the SNPs are biallelic).
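The check described above can be sketched in a few lines. This is only an illustration in plain Python, not MungeSumstats code: the function name `share_below_half` and the example frequencies are made up, and it assumes biallelic SNPs with missing values encoded as `None`.

```python
def share_below_half(freqs, threshold=0.5):
    """Return the fraction of allele frequencies strictly below `threshold`.

    Hypothetical helper: `freqs` is any iterable of floats, with None
    standing in for missing values (these are skipped).
    """
    observed = [f for f in freqs if f is not None]
    if not observed:
        raise ValueError("no allele frequencies supplied")
    n_below = sum(1 for f in observed if f < threshold)
    return n_below / len(observed)


# Made-up effect allele frequencies for five biallelic SNPs.
example_af = [0.12, 0.48, 0.73, 0.05, 0.91]
print(share_below_half(example_af))  # 3 of 5 values are below 0.5 -> 0.6
```

A share close to 1 would suggest the column mostly holds minor allele frequencies; running the same check across several traits would show how well that holds in practice.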

Al-Murphy commented 3 years ago

So what I could do is add a warning: when AF is a column in the sumstats and any SNP has AF > 0.5, warn the user and report how many such SNPs there are. We could then also add an input parameter that lets the user treat AF as the major allele frequency rather than mapping AF -> FRQ. Does that make sense? It may actually make sense to do this for the FRQ column in general. I get that if non-biallelic SNPs aren't removed the > 0.5 rule may not work, but it could still be useful as a warning, with a parameter allowing the user to relabel the FRQ column as major allele frequency.
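A rough sketch of that proposal, written in plain Python rather than the package's R code. Everything here is an assumption made for illustration: the function name `check_frq`, the flag `frq_is_minor`, the output column name `MAJOR_ALLELE_FRQ`, and the representation of the sumstats as a dict of columns.

```python
import warnings


def check_frq(sumstats, frq_is_minor=True, threshold=0.5):
    """Warn when FRQ values exceed `threshold`; optionally relabel the column.

    `sumstats` is a dict mapping column name -> list of values
    (a hypothetical stand-in for a summary statistics table).
    """
    if "FRQ" not in sumstats:
        return sumstats
    n_above = sum(
        1 for f in sumstats["FRQ"] if f is not None and f > threshold
    )
    if n_above > 0:
        warnings.warn(
            f"{n_above} SNPs have FRQ > {threshold}; the column may hold "
            "major rather than minor allele frequencies."
        )
    if not frq_is_minor:
        # The user says FRQ is the major allele frequency, so relabel it.
        out = dict(sumstats)
        out["MAJOR_ALLELE_FRQ"] = out.pop("FRQ")
        return out
    return sumstats


# Made-up example: two of the three SNPs have a frequency above 0.5.
table = {"SNP": ["rs1", "rs2", "rs3"], "FRQ": [0.2, 0.7, 0.9]}
checked = check_frq(table)                        # warns, table unchanged
relabelled = check_frq(table, frq_is_minor=False)  # warns and relabels
print(sorted(relabelled))
```

The default leaves the data untouched and only warns, so the relabelling happens only when the user asks for it.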

Al-Murphy commented 3 years ago

I've added these changes (see check_frq_maf()). This isn't switched on by default; it mainly warns the user if it looks like the FRQ values relate to the major rather than the minor allele. I think this is the right way to go, but let me know if you have any thoughts.