Closed xoelmb closed 2 years ago
Hey,
Thanks for bringing this to our attention. We are looking into it and will get back to you on it soon.
Hi @xoelmb, thanks for bringing this to our attention. I just checked if this was something @Al-Murphy or I had introduced when making updates to EWCE over the last couple years, but it seems that this bit of code existed long before any of those changes were made (at least as far back as May 2018): https://github.com/NathanSkene/EWCE/blob/699ff6dead2970bf40ff5d7b73c212253930ef35/R/check.ewce.genelist.input.r#L87
I agree that this seems to contradict what the documentation says and I can't think of a reason why genes absent from the SCT/CTD wouldn't be dropped in this situation. @NathanSkene do you have any insight into why this was here originally?
Can't think of any reason.... doesn't mean there wasn't one... let's drop it and see what happens?
From: Brian M. Schilder @.> Sent: 21 July 2022 22:21 To: NathanSkene/EWCE @.> Cc: Skene, Nathan G @.>; Mention @.> Subject: Re: [NathanSkene/EWCE] Appropiate background genes for GC & length correction (Issue #71)
This email from @.*** originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders listhttps://spam.ic.ac.uk/SpamConsole/Senders.aspx to disable email stamping for this address.
Hi @xoelmbhttps://github.com/xoelmb, thanks for bringing this to our attention. I just checked if this was something Alan or I had introduced when making updates to EWCE over the last couple years, but it seems that this was implemented long before any of those changes were made: https://github.com/NathanSkene/EWCE/blob/699ff6dead2970bf40ff5d7b73c212253930ef35/R/check.ewce.genelist.input.r#L87
I agree that this seems to contradict what the documentation says and I can't think of a reason why genes absent from the SCT/CTD wouldn't be dropped in this situation. @NathanSkenehttps://github.com/NathanSkene do you have any insight into why this was here originally?
— Reply to this email directly, view it on GitHubhttps://github.com/NathanSkene/EWCE/issues/71#issuecomment-1191947182, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AH5ZPE4UQ3S4J62VVPXP653VVG5O3ANCNFSM53IAS2XA. You are receiving this because you were mentioned.Message ID: @.***>
Will do @NathanSkene
Come to think of it, I seem to recall @KittyMurphy encountered a similar effect of increased significant results when using geneSizeControl =TRUE
. This likely explains why.
Good point! Thanks for figuring this out
From: Brian M. Schilder @.> Sent: 21 July 2022 22:30 To: NathanSkene/EWCE @.> Cc: Skene, Nathan G @.>; Mention @.> Subject: Re: [NathanSkene/EWCE] Appropiate background genes for GC & length correction (Issue #71)
This email from @.*** originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders listhttps://spam.ic.ac.uk/SpamConsole/Senders.aspx to disable email stamping for this address.
Will do @NathanSkenehttps://github.com/NathanSkene
Come to think of it, I seem to recall @KittyMurphyhttps://github.com/KittyMurphy encountered a similar effect of increased significant results when using geneSizeControl =TRUE. This likely explains why.
— Reply to this email directly, view it on GitHubhttps://github.com/NathanSkene/EWCE/issues/71#issuecomment-1191953390, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AH5ZPE6ISHT5YHJL26ZBBFTVVG6PHANCNFSM53IAS2XA. You are receiving this because you were mentioned.Message ID: @.***>
Great! Thank you all for spending your time on this! I'll keep an eye on how this gets solved and in the meantime I'll be using the results without geneSizeControl. I'm happy I could make this little contribution to such a good package!
Hey! The fix has been implemented (version 1.5.4) in the Github master branch so feel free to install EWCE from there so you can use it with geneSizeControl. Otherwise it will be up on the bioconductor dev version in the next few days (usually 2-3). You can use that by installing:
BiocManager::install(version='devel')
Thanks for pointing this out!
Great! Thanks to you again!! ☺️
Is your feature request related to a problem? Please describe. I have run the bootstrap enrichment test with EWCE on ~40 genesets and a scRNA-seq dataset with and without geneSizeControl (GC % & gene length control), and with automatic background genes. The number of significant results (q value < 0.05) I get when geneSizeControl is on is huge in comparison. I have been reading the code and I think that the background genes created when geneSizeControl is on include genes not present in the SCT expression dataset.
Describe the solution you'd like I think there is a simple solution to this problem: restricting the background genes used even when geneSizeControl is on to those present in the expression data. So, basically removing one check would make it. Changing this
#### Restrict gene sets to only genes in the SCT dataset ####
if (!geneSizeControl) {
hits <- hits[hits %in% sct_genes]
bg <- bg[bg %in% sct_genes]
}
For this#### Restrict gene sets to only genes in the SCT dataset ####
hits <- hits[hits %in% sct_genes]
bg <- bg[bg %in% sct_genes]
The function is check_ewce_genelist_input https://github.com/NathanSkene/EWCE/blob/9400d4cf6b260f418c89fddcc18bca681c4fbf71/R/check_ewce_genelist_input.r#L129-L133
Describe alternatives you've considered Maybe I'm missing something. Is this filtering done in some other part of the code that I have not reached? Is this expected behavior?
Additional context I think this filtering is meant to be done. According to the original EWCE article: Methods: Bootstrapping with Controls for Transcript Length and GC Content