Closed ChiungTingWu closed 5 years ago
Hi @ChiungTingWu
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: ncGTW
Type: Package
Title: Profile Alignment by Neighbor-wise Compound-specific Graphical Time
Warping with Misalignment Detection
Version: 0.8.0
Author: Chiung-Ting Wu <ctwu@vt.edu>
Maintainer: Chiung-Ting Wu <ctwu@vt.edu>
biocViews: Software, MassSpectrometry, Metabolomics, Alignment
Description: The purpose of ncGTW is to help XCMS for LC-MS data alignment.
Currently, ncGTW can detect the misaligned feature groups by XCMS, and the
user can choose to realign these feature groups by ncGTW or not.
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
LinkingTo:
Rcpp
Suggests:
knitr,
testthat,
rmarkdown
VignetteBuilder: knitr
Imports:
Rcpp,
xcms,
BiocParallel,
grDevices,
graphics,
stats
BugReports: https://github.com/ChiungTingWu/ncGTW/issues
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
0ab6316 Remove "exit" in ibfs
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
bb440f6 Update examples
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Thank you for your submission. Please see initial comments below:
README
inst script/
data/man
vignettes 3.3.1
There are two features are detected as
R code loadProfile.R
ncGTWalign.R
ncGTWgraph.R
{}
with the else even if there is a stop to ensure
proper execution and the brackets with for
to ensure proper closeure. Then fix
indentation. There are so many for loops and if/elses the code becomes very
hard to read and understand why/when different things happen, even a few
commented out notes might provide some clarification. if possible condense down loops
or create helper functions. [ ] Combine these loops. Multiple loops are time consuming. Please apply this throughout code where possible
distMask <- matrix(0, nTps, nTps)
for (ii in seq_len(nTps))
for (jj in seq_len(nTps))
distMask[ii, jj] <- ofstPen * (abs(ii - jj) > 0)
diagMask <- matrix(0, nTps, nTps)
for (ii in seq_len(nTps))
diagMask[ii,ii] <- diagonalPenalty
There is no need to loop over nTps twice.
distMask <- matrix(0, nTps, nTps)
diagMask <- matrix(0, nTps, nTps)
for (ii in seq_len(nTps))
diagMask[ii,ii] <- diagonalPenalty
for (jj in seq_len(nTps))
distMask[ii, jj] <- ofstPen * (abs(ii - jj) > 0)
- [ ] In getDistMat perhaps instead of if nor==2 elseif nor == 1 else , maybe
use a switch statement for cleaner code
- [ ] Reevalutate nested if/else and condense for example if getDistMat, db <- d0 if weiP ==
0 regardless of biP so this should be checked outside
if (biP == TRUE){
if (weiP == 0){
dp <- d0
} else{
dp <- d0 / ((tstPM * refPM)^weiP)
}
if (weiP == -1){
a0 <- (tstPM + refPM)/2
g0 <- (tstPM * refPM)^(0.5)
a1 <- (a0 + g0) / 2
g1 <- (a0 * g0)^(0.5)
a2 <- (a1 + g1) / 2
g2 <- (a1 * g1)^(0.5)
a3 <- (a2 + g2) / 2
dp <- d0 / a3
}
} else if (biP == FALSE){
if (weiP == 0){
dp <- d0
} else if (weiP == -1){
stop('For weiP = -1, biP should be TRUE!!!')
} else {
dp <- d0 / (tstPM ^ weiP)
}
} else{
stop('biP should be logical!!!')
}
Can be cleaned up
stopifnot(is.logical(biP), !is.na(biP)) if (weiP == -1) stopifnot(isTrue(biP))
swtich(as.character(weiP), "0" = { dp <- d0 }, "-1" = { a0 <- (tstPM + refPM)/2 g0 <- (tstPM refPM)^(0.5) a1 <- (a0 + g0) / 2 g1 <- (a0 g0)^(0.5) a2 <- (a1 + g1) / 2 g2 <- (a1 * g1)^(0.5) a3 <- (a2 + g2) / 2 dp <- d0 / a3 },
dp <- ifelse(biP, (d0 / ((tstPM * refPM)^weiP)), (d0 / (tstPM ^ weiP))) )
_ncGTWutility.R/ncGTWwarp.R_
- [ ] Same type comments as above. Try and condense nested for/if/else where possible to clean up code and make it more readable. Separate out repeated chunks of code to helper functions so functions are more manageable. Its very hard to get a sense of what your code does in places as currently implemented because of readability. The for loops and else clauses without brackets make me a little leery that the code breaks at proper points. I would put it all brackets to be sure.
_general comment_
- [ ] it might be beneficial to have some argument/parameter checking at the begining of exported function. I've learned to never trust user input despite documentation.
**Build report**
- [ ] Pleaes glance at the NOTE concerning the included src code. It looks like
a false positive but it would be worth investigating to be sure.
Once the above are considered and updated please do a version bump in the DESCRIPTION to trigger a new build. Please also comment here with what has been updated and to request a re-review.
I look forward to getting your package accepted into Bioconductor.
Cheers
Just wanted to check back in to make sure you are still working on the package revisions?
Received a valid push; starting a build. Commits are:
2d2b1aa Update with the comments from Bioconductor
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
d9ab928 Update user manual and readme, also fix some small...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
I just finished the revisions!
Thank you for your submission. Please see initial comments below:
README
- [x] Include Bioconductor installation instructions
Included.
inst script/
- [x] Include source information for files in extdata
Included in "Generation_of_the_example_data.txt"
data/man
- [x] Give more information about the source of the data.
- [x] Give more reference to how the object was created if a user was to replicate
Done
vignettes 3.3.1
- [x] typo?
There are two features are detected as
Fixed
R code loadProfile.R
- [x] use message instead of print
- [x] Should there be some initial parameter checking to ensure the user passed the correct objects/structure to filePaths and excluGroups?
- [x] Is it possible to collapse some of the loops over groupNum down into a single loop? Looping can be time consuming and if you could do it once or twice instead of four or five it would be much more efficient.
replaced parameter checking added I combined them into one loop.
ncGTWalign.R
- [x] I would recommend moving all the parameter checking to a function that would return a valid ncGTWparam rather than check, assign to object, create local object. It makes the code rather messy. I would then just reference the object where needed in the subsequent code.
- [x] Use message instead of cat.
- [x] It seems like there is a group of functions run together often? imitGtwParam, buildGTWgraph, graphCut - group these into a helper function and called instead of the repeated triplicate call - Its always to minimize repeated code where possible - it helps for easier maintainence and bug fixing so if you find an error or change a function code only need to be updated in one location rather than several.
a check function is built replaced helper functions are built
ncGTWgraph.R
- [x] Please include the
{}
with the else even if there is a stop to ensure proper execution and the brackets withfor
to ensure proper closeure. Then fix indentation. There are so many for loops and if/elses the code becomes very hard to read and understand why/when different things happen, even a few commented out notes might provide some clarification. if possible condense down loops or create helper functions.- [x] Combine these loops. Multiple loops are time consuming. Please apply this throughout code where possible
distMask <- matrix(0, nTps, nTps) for (ii in seq_len(nTps)) for (jj in seq_len(nTps)) distMask[ii, jj] <- ofstPen * (abs(ii - jj) > 0) diagMask <- matrix(0, nTps, nTps) for (ii in seq_len(nTps)) diagMask[ii,ii] <- diagonalPenalty
There is no need to loop over nTps twice.
distMask <- matrix(0, nTps, nTps) diagMask <- matrix(0, nTps, nTps) for (ii in seq_len(nTps)) diagMask[ii,ii] <- diagonalPenalty for (jj in seq_len(nTps)) distMask[ii, jj] <- ofstPen * (abs(ii - jj) > 0)
- [x] In getDistMat perhaps instead of if nor==2 elseif nor == 1 else , maybe use a switch statement for cleaner code
- [x] Reevalutate nested if/else and condense for example if getDistMat, db <- d0 if weiP == 0 regardless of biP so this should be checked outside
if (biP == TRUE){ if (weiP == 0){ dp <- d0 } else{ dp <- d0 / ((tstPM * refPM)^weiP) } if (weiP == -1){ a0 <- (tstPM + refPM)/2 g0 <- (tstPM * refPM)^(0.5) a1 <- (a0 + g0) / 2 g1 <- (a0 * g0)^(0.5) a2 <- (a1 + g1) / 2 g2 <- (a1 * g1)^(0.5) a3 <- (a2 + g2) / 2 dp <- d0 / a3 } } else if (biP == FALSE){ if (weiP == 0){ dp <- d0 } else if (weiP == -1){ stop('For weiP = -1, biP should be TRUE!!!') } else { dp <- d0 / (tstPM ^ weiP) } } else{ stop('biP should be logical!!!') }
Can be cleaned up
stopifnot(is.logical(biP), !is.na(biP)) if (weiP == -1) stopifnot(isTrue(biP)) # there may be some syntx errors in here just on the fly not tested swtich(as.character(weiP), "0" = { dp <- d0 }, "-1" = { a0 <- (tstPM + refPM)/2 g0 <- (tstPM * refPM)^(0.5) a1 <- (a0 + g0) / 2 g1 <- (a0 * g0)^(0.5) a2 <- (a1 + g1) / 2 g2 <- (a1 * g1)^(0.5) a3 <- (a2 + g2) / 2 dp <- d0 / a3 }, dp <- ifelse(biP, (d0 / ((tstPM * refPM)^weiP)), (d0 / (tstPM ^ weiP))) )
I tried to add all "{}" back. I tried to combine for loops as many as I could replaced I reevaluated all nested if/else and condensed them as many as I could
ncGTWutility.R/ncGTWwarp.R
- [x] Same type comments as above. Try and condense nested for/if/else where possible to clean up code and make it more readable. Separate out repeated chunks of code to helper functions so functions are more manageable. Its very hard to get a sense of what your code does in places as currently implemented because of readability. The for loops and else clauses without brackets make me a little leery that the code breaks at proper points. I would put it all brackets to be sure.
done
general comment
- [x] it might be beneficial to have some argument/parameter checking at the begining of exported function. I've learned to never trust user input despite documentation.
parameter checking is added in each exported function
Build report
- [x] Pleaes glance at the NOTE concerning the included src code. It looks like a false positive but it would be worth investigating to be sure.
I have checked and I also think it is a false positive.
Once the above are considered and updated please do a version bump in the DESCRIPTION to trigger a new build. Please also comment here with what has been updated and to request a re-review. I look forward to getting your package accepted into Bioconductor. Cheers
Thank you for all the comments!
I apologize for the delayed response. I was away at a conference with limited internet connectivity. Please see the follow up review below:
Questions/Suggestions:
[ ] 1. Why did you not create a class structure for the initncGTWparam? It seems like all the values must be present in the list object when inputting to ncGTWalign? Making this also a formalized class would allow for validation if a user tries to create their own list of parameters to pass? I would suggest formalizing this unless you had a reason not too?
[ ] 2. It seems slightly odd that there is not a formalized object for the results of loadProfile and ncGTWalign. Its a list of the defined class objects but then there can be no checking that the lists are from the output of the given functions. What was the reasoning behind this? Do you think this is beneficial in any way? See also comments in R/man below for misalignDetect?
[ ] 3. Would there ever be a case not to loop over all of the objects in the list for plotGroup, ncGTWalign, and the section of after realignment to send to adjustRt/xcms functions? In other words, should the functions themselves handling the looping internally instead of having the user have to create these? Why would a user not want to loop over all elements or choose a subset of elements to loop over? I'm not as familiar with the data so if this is useful it is fine to keep these as is but was curious use case.
R/man:
loadProfile.R
misalignDetect.R
[ ] It seems like the return value is described wrong. Based on this description I would have expected an xcmsSet-class object to be returned with a slot for this data (and then accessible with accessor methods). Instead it seems to be its own matrix object distinctly sepearte from an xcmsSet-class object. This seems intentional? but described wrong?
But it did make me think referring to question/suggestion 2 above: Would it be worth considering a class object[s] that extend the xcms::xcmsSet that would keep track of these objects created? So that misalignDetetc, loadProfile, (basically the exported functions) etc... would add additional slots to the xcmsSet object and then always work off said infastrucutre? Ensuring that the same objects and matricies are utilized? So the xcmsSet object would have a defined slot for the matrix of groupings and be consistent with the description in misalignDetect or a slot that is for the list of ncGTWinput-class objects instead of a separate variable/output from the function loadProfile (as one example)? So that everything is tracked in a single object? If you don't think this useful please explain your reasoning? In Bioconductor we utilize this concept of endomorphisms alot where what goes into a function is also returned. It may not be useful here but something to think about as far as organization and consistency and valid objects. It also helps show/utlize the strong connect to xcms which this package seems to directly extended functionality of.
ncGTWalign.R
ncGTWwarp.R
XCMSedited.R
Thank you for your continued efforts and contributions to Bioconductor. Please respond and make appropriate changes. When ready please do a version bump to ensure a new build and comment back here.
Cheers, ~Lori
Received a valid push; starting a build. Commits are:
d923bef Add ncGTWparam class
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
4a9c9a9 Update depends
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Thank you for all the suggestions and comments! Let me reply inline.
I apologize for the delayed response. I was away at a conference with limited internet connectivity. Please see the follow up review below:
Questions/Suggestions:
- [x] 1. Why did you not create a class structure for the initncGTWparam? It seems like all the values must be present in the list object when inputting to ncGTWalign? Making this also a formalized class would allow for validation if a user tries to create their own list of parameters to pass? I would suggest formalizing this unless you had a reason not too?
Thanks for the suggestion, I created a class "ncGTWparam" with default setting and some validations.
- [x] 2. It seems slightly odd that there is not a formalized object for the results of loadProfile and ncGTWalign. Its a list of the defined class objects but then there can be no checking that the lists are from the output of the given functions. What was the reasoning behind this? Do you think this is beneficial in any way? See also comments in R/man below for misalignDetect?
I added validation functions to make sure the created objects are valid. I believe sometimes the user may want to design some "simulated" data by themselves, not always loaded from the files. Thus, I think it should be OK the objects are not from the given functions, if they pass the validation functions.
- [x] 3. Would there ever be a case not to loop over all of the objects in the list for plotGroup, ncGTWalign, and the section of after realignment to send to adjustRt/xcms functions? In other words, should the functions themselves handling the looping internally instead of having the user have to create these? Why would a user not want to loop over all elements or choose a subset of elements to loop over? I'm not as familiar with the data so if this is useful it is fine to keep these as is but was curious use case.
Generally, ncGTW is designed for aligning all input samples as the same time. Thus, it is not possible for ncGTW to align each sample at a time. Also, the alignment results and post-processing are affected by all the input samples, so it is not recommended to use a subset to perform alignment. That is, the user should choose the samples for alignment at the beginning (in the input of loadProfile).
R/man:
loadProfile.R
- [x] Does excluGroups need to be the result from running misalignDetect? Maybe that should be printed in the stop instead of argument should contain column names?
No, excluGroups does not need to be the result from running misalignDetect. The reason is that sometimes the user may want to choose which part of sample data to be realigned by themselves, so in this way they can pick wherever they like.
misalignDetect.R
- [x] It seems like the return value is described wrong. Based on this description I would have expected an xcmsSet-class object to be returned with a slot for this data (and then accessible with accessor methods). Instead it seems to be its own matrix object distinctly sepearte from an xcmsSet-class object. This seems intentional? but described wrong? But it did make me think referring to question/suggestion 2 above: Would it be worth considering a class object[s] that extend the xcms::xcmsSet that would keep track of these objects created? So that misalignDetetc, loadProfile, (basically the exported functions) etc... would add additional slots to the xcmsSet object and then always work off said infastrucutre? Ensuring that the same objects and matricies are utilized? So the xcmsSet object would have a defined slot for the matrix of groupings and be consistent with the description in misalignDetect or a slot that is for the list of ncGTWinput-class objects instead of a separate variable/output from the function loadProfile (as one example)? So that everything is tracked in a single object? If you don't think this useful please explain your reasoning? In Bioconductor we utilize this concept of endomorphisms alot where what goes into a function is also returned. It may not be useful here but something to think about as far as organization and consistency and valid objects. It also helps show/utlize the strong connect to xcms which this package seems to directly extended functionality of.
Fixed. Thanks for the suggestion, but there are two reasons make me hesitate to do that. The first one is the sizes of ncGTWinput and ncGTWoutput objects may be very large if there are thousands of samples. It is possible that the user may adjust parameters to compare the alignment result among different ncGTWoutput objects. If I set ncGTWinput and ncGTWoutput as new slots of xcmsSet object, it may have many copies of the same ncGTWinput object, when user just want to check the difference caused by the alignment parameter. The copies of the same ncGTWinput object may consume a lot of memory, so I prefer not to combine them. The second reason is that in the future update, ncGTW needs to accept the data from different alignment-related package, not only xcms, and even not only from r packages (maybe java, python, online platform....). Thus, it seems it is not a good idea to combine them to a class of xcms, if we want to accept more packages in the future.
ncGTWalign.R
- [x] check wording on parameter check. I believe the variable is ncGTWinput not output here?
- [x] We don't recommend keeping old commented code in R scripts (esp since they are git tracked so you can always look them back up). This should be applied throuhgout. The comments and explanations in code however are very useful and encouraged - Thank you for this!
Sorry about that, I have fixed it. I have removed all old commented code. Thanks for the suggestion!
ncGTWwarp.R
- [x] check wording on parameter check. It looks like ncGTWinput is checked twice and the wording refers to ncGTWoutput twice.
Sorry again. Fixed.
XCMSedited.R
- [x] print/cat should be message
- [x] I feel like this should be documented officially with a man pages and explicitly stating that uses heavily sampling from the xcms function. This is to be sure to properly cite and give credit to the original xcms function writers. The man page could link to the original function for credit.
Fixed. I have documented it with the link to the original function. Thanks for your reminder!
Thank you for your continued efforts and contributions to Bioconductor. Please respond and make appropriate changes. When ready please do a version bump to ensure a new build and comment back here.
Cheers, ~Lori
Thank you for all the suggestions and comments again. Please let me know if there are something not clear or I can further improve!
Best, CT
Would it be worth considering making wrappers to convert your object to other known Bioconductor objects like the xcmSet after utilizing your pipeline (Not sure could be entirely not appropriate)? I'm not sure what other connected analysis or visualization already exists but this sort of interconnectivity is always what we strive for in Bioconductor.
This is not a limiting comment and the code updates look good. Thank you for the effort and welcome to Bioconductor.
Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.
Thank you for contributing to Bioconductor!
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/ChiungTingWu.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("ncGTW")
. The package 'landing page' will be created at
https://bioconductor.org/packages/ncGTW
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.