Closed mykmal closed 1 year ago
Thanks for your input! Your changes make a lot of sense and do help clean up that function. Really not sure why I never used the rmv_chrPrefix
parameter. I have run checks locally and it all seems fine to me.- Just two small changes before I merge:
standard_chrs <- c(paste0("chr", 1:22), "X", "Y", "MT")
In your opinion, is that standard to have X rather than chrX - can you point to somewhere this is the standard? If this is the case (which I believe so), I would want to add a line to convert any chrX, chrY and chrMT to X, Y, MT. Could you add this to your else condition on line 42 of your check_chr() updated function
- Checking individual file sizes...
- WARNING: Package files exceed the 5MB size limit.
It appears one file you added to the git (and then likely removed after but is still in history? Is too large. Could you have a look at removing the file causing the issue from git history? I have info on doing that here You may find it cleaner to just clone the repo and put all your changes and resubmit a pull request? Up to you!
Alan.
Thank you for carefully reviewing my (very first) pull request! Below are responses to each of your points.
In your opinion, is that standard to have X rather than chrX - can you point to somewhere this is the standard?
Good catch! Here are the commonly used standards:
> GenomeInfoDb::genomeStyles("Homo sapiens")
circular auto sex NCBI UCSC dbSNP Ensembl
1 FALSE TRUE FALSE 1 chr1 ch1 1
2 FALSE TRUE FALSE 2 chr2 ch2 2
3 FALSE TRUE FALSE 3 chr3 ch3 3
4 FALSE TRUE FALSE 4 chr4 ch4 4
5 FALSE TRUE FALSE 5 chr5 ch5 5
6 FALSE TRUE FALSE 6 chr6 ch6 6
7 FALSE TRUE FALSE 7 chr7 ch7 7
8 FALSE TRUE FALSE 8 chr8 ch8 8
9 FALSE TRUE FALSE 9 chr9 ch9 9
10 FALSE TRUE FALSE 10 chr10 ch10 10
11 FALSE TRUE FALSE 11 chr11 ch11 11
12 FALSE TRUE FALSE 12 chr12 ch12 12
13 FALSE TRUE FALSE 13 chr13 ch13 13
14 FALSE TRUE FALSE 14 chr14 ch14 14
15 FALSE TRUE FALSE 15 chr15 ch15 15
16 FALSE TRUE FALSE 16 chr16 ch16 16
17 FALSE TRUE FALSE 17 chr17 ch17 17
18 FALSE TRUE FALSE 18 chr18 ch18 18
19 FALSE TRUE FALSE 19 chr19 ch19 19
20 FALSE TRUE FALSE 20 chr20 ch20 20
21 FALSE TRUE FALSE 21 chr21 ch21 21
22 FALSE TRUE FALSE 22 chr22 ch22 22
23 FALSE FALSE TRUE X chrX chX X
24 FALSE FALSE TRUE Y chrY chY Y
25 TRUE FALSE FALSE MT chrM chMT MT
Notice that the NCBI and Ensembl styles are identical, and match the format that MSS enforces when rmv_chrPrefix
is TRUE. On the other hand, the UCSC style includes a "chr" prefix and codes mitochondrial DNA as "chrM" (without the T). In light of this, do you think it would make more sense to replace the rmv_chrPrefix
parameter with a chr_style
parameter that lets users choose between the NCBI/Ensembl and UCSC styles? (I've never encountered the dbSNP style in GWAS, but I could also provide that option if you want.)
It appears one file you added to the git (and then likely removed after but is still in history? Is too large.
Weird, I'm not sure what is causing that. Once we decide on how to handle chromosome styles, I'll go ahead and re-commit my changes in a clean branch, as you suggested.
Secondly, this is optional but I think you should add your name to the list of contributors in the README with a link to your github page.
Thanks for inviting me to join the list of contributors! I started actively using MSS this summer and find it very convenient, so this will likely not be my last contribution.
Thank you for carefully reviewing my (very first) pull request!
Thank you very much for contributing! People doing so really takes (at least a little of) the burden of upkeep of the package off me and ensures it remains useful going forward.
In light of this, do you think it would make more sense to replace the rmv_chrPrefix parameter with a chr_style parameter that lets users choose between the NCBI/Ensembl and UCSC styles? (I've never encountered the dbSNP style in GWAS, but I could also provide that option if you want.)
I think adding a chr_style
parameter instead is a great idea - can you set the default to Ensembl/NCBI and add an explanation and example of each style in the documentation? Probably best to also mention that this used to be rmv_chrPrefix
in that parameter documentation too. Then just enforce that the user picks one of these 3 in the validate_parameters
function? rmv_chrPrefix
was actually never passed to this function but it should have been too.
Weird, I'm not sure what is causing that. Once we decide on how to handle chromosome styles, I'll go ahead and re-commit my changes in a clean branch, as you suggested.
Great, thanks!!
Thanks for inviting me to join the list of contributors! I started actively using MSS this summer and find it very convenient, so this will likely not be my last contribution.
Good to hear you find it useful, always happy to add in new features!
Cheers, Alan.
Closing this in order to open a clean PR.
This update makes the following three changes, as also listed in the NEWS.md file:
check_chr()
now automatically removes all SNPs with nonstandard CHR entries (anything other than 1-22, X, Y, and MT)check_chr()
now ensures that the "chr" prefix is lowercase if keptrmv_chrPrefix
parameter is no longer ignored whenrmv_chr
is NULLInitially my plan was to only fix the bug in the third bullet point, but I ended up rewriting the entire
check_chr()
function to improve how the CHR column is standardized. I think that the new function is more efficient and handles some additional edge cases that could trip up the current version of MSS.I tested the updated package on my GWAS data and it appears to work as expected, but I'm happy to explain my code in more detail and/or make any additional changes if needed.