Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats
75 stars 16 forks source link

ES and beta imputation #129

Closed BEFH closed 2 years ago

BEFH commented 2 years ago

1. Bug description

It is unclear in the VCFs and the documentation whether ES must be a beta, but the software treats it as a beta in gwas2vcf and other tools.

There is also a warning against imputing beta, but that may need to be done for any case control GWAS, and calculating beta from Z and SE is preferred over log(OR), but I'm not sure if that is the best choice when OR is present.

Expected behaviour

Because most Case/Control GWAS is released with OR rather than beta, if gwas-vcf requires beta, it makes sense to me for it to be recommended in the documentation to perform the imputation rather than including a warning about inaccuracy, and it should be enabled by default for sumstats including OR but not beta.

The definition of ES in the VCF should also be clarified.

It's possible I'm misreading the code or documentation, or that I'm missing some good rationale for the current behavior, in which case, sorry for the confusion.

Al-Murphy commented 2 years ago

Hey! So yes, as you point out we do take ES = BETA in our mapping file. I understand this may not be strictly what people interpret BETA as so I will update the documentation to say BETA is just an Effect Size of some kind by default. I do want to keep it this way though since I think it will cover most people's use cases but you can update this in the mapping file to suit your needs (I'll add that bit to the documentation too).

On the warning for imputing beta, do you mean here? By default, I would rather not have any imputations set to run in the dataset as this assumes the user wants a field calculated a certain way. I know it is a little annoying but by forcing the user to set the parameter, it will force them to consider which calculation to use and its implications. This approach is also the case for imputing other values like z-score. Do let me know if I'm missing your point here though?

BEFH commented 2 years ago

The problem is that we are integrating this into an automated pipeline. in cases where there's an OR, the log(OR) is almost certainly beta, and BETA is needed for the GWAS-VCF format.

As for the documentation, I am referring to the GWAS-VCF format itself. ES is treated as BETA instead of OR in all of the software, but it's not documented anywhere that it needs to be a BETA-like value and not an OR-type value. For instance, gwas2vcf resolves allele switches by multiplying ES by -1 (rather than 1/ES). If it is supposed to be a more general field, there should be metadata on what it is, and the associated software should detect or allow the specification of what form it takes.

Al-Murphy commented 2 years ago

Not sure if I fully understand, are you asking that if an OR value exists in the sumstats, it should be used to calculate BETA even if another BETA-like value exists (namely ES)? If so I can add a param for this I think so that ES won't be mapped to BETA and then it can be calculated in the hierarchy here: https://github.com/neurogenomics/MungeSumstats/blob/master/R/check_signed_col.R#L18

BEFH commented 2 years ago

No, I think that it should calculate beta from OR by default if BETA/ES does not exist, because AFAIK, that is always a valid operation. Also, shouldn't calculation from OR be the default rather than Z and SE?

The other part is that the documentation and the paper are unclear that ES should not be OR, which is the case based on code.

Al-Murphy commented 2 years ago

I don't agree with it calculating it by default, that is not the approach we took for any other calculated fields so not one I want to have here either. I think having the parameter controlling this is the best option which you can just set to true for all your runs. I agree calculation from OR should be the default rather than Z and SE and will swap this ordering. I will also add a case if LOG_ODDs is present but not OR to calculate beta from this, again only if the parameter is set.

I will also add a param to stop ES being mapped to BETA but by default this won't be set to true. I will update the documentation to call out BETA-like value mapping exists (namely ES). Will this cover your use-case?

I will try to get this pushed today so that it will be added before the code freeze for the next release

Al-Murphy commented 2 years ago

I have pushed a fix which I think should cover your ask here in version 1.5.17. Have a look through and let me know if it doesn't.

Cheers, Alan.

BEFH commented 2 years ago

Thanks Alan. This is a good code fix. Do you have any thoughts on documentation of ES?

Al-Murphy commented 2 years ago

Not sure I know what you meanb? I've added documentation to MSS about it (see the vignette). Do you mean something outside of that? (sorry I feel like I'm not getting what you mean here at all)

BEFH commented 2 years ago

Sorry, I'm not sure how to explain myself. The new changes explain very well how MSS is operating, but the GWAS-VCF format paper and documentation does not specify that ES must be BETA-like, despite pretty much everything treating it as BETA.

I think maybe you've done what you can, other than maybe stating that ES should be BETA like to work with other tools. Maybe I should put a request in with https://github.com/MRCIEU/gwas-vcf-specification?

Al-Murphy commented 2 years ago

Ah okay I understand now, apologes! Yeah I think that putting a request in with them is the best move.

BEFH commented 2 years ago

Thanks, I appreciate you bearing with me.

On Thu, Oct 27, 2022 at 3:40 PM Alan Murphy @.***> wrote:

Ah okay I understand now, apologes! Yeah I think that putting a request in with them is the best move.

— Reply to this email directly, view it on GitHub https://github.com/neurogenomics/MungeSumstats/issues/129#issuecomment-1293982850, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2Z2GDFCI4LN7OPFZTTRDWFLLEPANCNFSM6AAAAAARNKVPDY . You are receiving this because you authored the thread.Message ID: @.***>