Closed: Al-Murphy closed 3 years ago
@bschilder I have pushed these changes, but do still let me know your thoughts (we can always revert or update if necessary).
No clue how to test how often AF is MAF, but if you come across examples in the future, let me know and we can readdress it. As for flipping, I think it's an okay assumption as the default, and people can always supply their own mapping if they don't agree.
You just test how often the effect allele frequency is < 0.5 (assuming the SNPs are biallelic).
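A minimal sketch of that check, written in Python for illustration rather than in the package's R. The helper name `fraction_below_half` and the example frequencies are made up, and the sketch assumes biallelic SNPs with missing values passed as `None`:

```python
def fraction_below_half(freqs):
    """Return the share of non-missing allele frequencies strictly below 0.5."""
    values = [f for f in freqs if f is not None]
    if not values:
        raise ValueError("no allele frequencies supplied")
    return sum(1 for f in values if f < 0.5) / len(values)


# Hypothetical effect allele frequencies for five biallelic SNPs.
example_af = [0.12, 0.48, 0.03, 0.71, 0.25]
print(fraction_below_half(example_af))  # 4 of 5 values are below 0.5 -> 0.8
```

A share close to 1 would suggest the column mostly holds minor allele frequencies; a noticeably lower share would suggest it does not.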
So what I could do is add a warning when AF is a column in the sumstats: if any SNP has an AF > 0.5, throw a warning telling the user how many do. We can then also add an input parameter that lets the user treat AF as the major allele frequency rather than letting AF map to FRQ. Does that make sense? Maybe it actually makes sense to do this for the FRQ column in general. I get that if non-biallelic SNPs aren't removed the > 0.5 rule may not work, but it could be good to have it as a warning, with a parameter allowing the user to change the FRQ column to major allele frequency?
I've added these changes (see check_frq_maf()). The new parameter isn't set by default; the check mainly warns the user if it looks like the FRQ values relate to the major rather than the minor allele. I think this is the right way to go, but let me know if you have any thoughts.
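To make the warning idea concrete, here is a rough Python sketch. It is not the code of check_frq_maf(); the function name `warn_if_major_allele_frq`, its `threshold` argument, and the message wording are all assumptions for illustration:

```python
import warnings


def warn_if_major_allele_frq(frq_values, threshold=0.5):
    """Warn if any FRQ value exceeds the threshold and return how many do."""
    n_above = sum(1 for f in frq_values if f is not None and f > threshold)
    if n_above > 0:
        warnings.warn(
            f"{n_above} SNP(s) have FRQ > {threshold}; the FRQ column may hold "
            "major rather than minor allele frequencies.",
            UserWarning,
        )
    return n_above


# Hypothetical FRQ column: two values are above 0.5 (0.5 itself does not count).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n = warn_if_major_allele_frq([0.1, 0.62, 0.9, 0.5])
print(n, len(caught))  # 2 1
```

The sketch only reports; it does not rename or alter the column, which matches the idea of leaving that decision to the user.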
@bschilder, just wanted to note the latest changes to make sure you can't see any issue with them:
VCFs often contain an AF column, which the mapping currently doesn't convert to FRQ. I have added this mapping. I think 99% of the time AF will be inferred as MAF, but for the times that it isn't, is there an issue with inferring this column as FRQ? Flipping, specifically the allele_flip_frq input variable (default is TRUE), will flip this (1-AF) if necessary for the SNP anyway.
VCFs can have 'AF=...' in the INFO column. Currently the 'AF=' is removed but the value is kept as INFO, which is wrong. I have updated this so that, in this case, the 'AF=' is removed and the column is renamed to AF.
Adding a variable FRQ_filter so people can filter based on FRQ/MAF, for example to keep only SNPs with MAF > 0.1. The default will be no filtering, i.e. FRQ_filter=0.
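As a rough illustration of the INFO handling and the frequency filter, here is a Python sketch rather than the package's R code. The helper names `rename_info_af` and `filter_by_frq`, the dict-per-SNP layout, and the strict `>` comparison are assumptions. It only handles an INFO value of the exact form 'AF=<number>' and leaves anything else untouched:

```python
def rename_info_af(row):
    """Return a copy of row where INFO 'AF=<number>' becomes a numeric AF field."""
    info = row.get("INFO")
    if isinstance(info, str) and info.startswith("AF="):
        try:
            af = float(info[len("AF="):])
        except ValueError:
            # e.g. multi-allelic 'AF=0.1,0.2': left as INFO in this sketch.
            return dict(row)
        new_row = {k: v for k, v in row.items() if k != "INFO"}
        new_row["AF"] = af
        return new_row
    return dict(row)


def filter_by_frq(rows, frq_filter=0.0, frq_key="AF"):
    """Keep rows whose frequency exceeds frq_filter; 0 means no filtering."""
    if frq_filter == 0:
        return list(rows)
    return [r for r in rows if r.get(frq_key) is not None and r[frq_key] > frq_filter]


# Hypothetical rows as they might look after reading a VCF.
rows = [
    {"SNP": "rs1", "INFO": "AF=0.05"},
    {"SNP": "rs2", "INFO": "AF=0.32"},
    {"SNP": "rs3", "INFO": "AF=0.10"},
]
cleaned = [rename_info_af(r) for r in rows]
kept = filter_by_frq(cleaned, frq_filter=0.1)
print([r["SNP"] for r in kept])  # ['rs2']
```

With the default of 0, every row is returned unchanged, mirroring the "no filtering" default described above.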