Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

ncGTW #1154

Closed ChiungTingWu closed 5 years ago

ChiungTingWu commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @ChiungTingWu

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: ncGTW
Type: Package
Title: Profile Alignment by Neighbor-wise Compound-specific Graphical Time 
    Warping with Misalignment Detection
Version: 0.8.0
Author: Chiung-Ting Wu <ctwu@vt.edu>
Maintainer: Chiung-Ting Wu <ctwu@vt.edu>
biocViews: Software, MassSpectrometry, Metabolomics, Alignment
Description: The purpose of ncGTW is to help XCMS for LC-MS data alignment. 
    Currently, ncGTW can detect the misaligned feature groups by XCMS, and the
    user can choose to realign these feature groups by ncGTW or not.
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
LinkingTo: 
    Rcpp
Suggests: 
    knitr,
    testthat,
    rmarkdown
VignetteBuilder: knitr
Imports: 
    Rcpp,
    xcms,
    BiocParallel,
    grDevices, 
    graphics, 
    stats
BugReports: https://github.com/ChiungTingWu/ncGTW/issues
bioc-issue-bot commented 5 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

0ab6316 Remove "exit" in ibfs

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

bb440f6 Update examples

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

lshep commented 5 years ago

Thank you for your submission. Please see initial comments below:

README

inst script/

data/man

vignettes 3.3.1

R code loadProfile.R

ncGTWalign.R

ncGTWgraph.R

There is no need to loop over nTps twice. 
distMask <- matrix(0, nTps, nTps)
diagMask <- matrix(0, nTps, nTps)
for (ii in seq_len(nTps))
   diagMask[ii,ii] <- diagonalPenalty
   for (jj in seq_len(nTps))
        distMask[ii, jj] <- ofstPen * (abs(ii - jj) > 0)
- [ ] In getDistMat perhaps instead of if nor==2 elseif nor == 1 else , maybe
use a switch statement for cleaner code
- [ ] Reevalutate nested if/else and condense for example if getDistMat,  db <- d0 if weiP ==
0 regardless of biP so this should be checked outside 
if (biP == TRUE){
    if (weiP == 0){
        dp <- d0
    } else{
        dp <- d0 / ((tstPM * refPM)^weiP)
    }
    if (weiP == -1){
        a0 <- (tstPM + refPM)/2
        g0 <- (tstPM * refPM)^(0.5)
        a1 <- (a0 + g0) / 2
        g1 <- (a0 * g0)^(0.5)
        a2 <- (a1 + g1) / 2
        g2 <- (a1 * g1)^(0.5)
        a3 <- (a2 + g2) / 2
        dp <- d0 / a3
    }
} else if (biP == FALSE){
    if (weiP == 0){
        dp <- d0
    } else if (weiP == -1){
        stop('For weiP = -1, biP should be TRUE!!!')
    } else {
        dp <- d0 / (tstPM ^ weiP)
    }
} else{
    stop('biP should be logical!!!')
}
Can be cleaned up 

stopifnot(is.logical(biP), !is.na(biP)) if (weiP == -1) stopifnot(isTrue(biP))

there may be some syntx errors in here just on the fly not tested

swtich(as.character(weiP), "0" = { dp <- d0 }, "-1" = { a0 <- (tstPM + refPM)/2 g0 <- (tstPM refPM)^(0.5) a1 <- (a0 + g0) / 2 g1 <- (a0 g0)^(0.5) a2 <- (a1 + g1) / 2 g2 <- (a1 * g1)^(0.5) a3 <- (a2 + g2) / 2 dp <- d0 / a3 },

dp <- ifelse(biP, (d0 / ((tstPM * refPM)^weiP)), (d0 / (tstPM ^ weiP))) )



_ncGTWutility.R/ncGTWwarp.R_
- [ ] Same type comments as above.  Try and condense nested for/if/else where possible to clean up code and make it more readable.  Separate out repeated chunks of code to helper functions so functions are more manageable.  Its very hard to get a sense of what your code does in places as currently implemented because of readability.  The for loops and else clauses without brackets make me a little leery that the code breaks at proper points.  I would put it all brackets to be sure. 

_general comment_
- [ ] it might be beneficial to have some argument/parameter checking at the begining of exported function. I've learned to never trust user input despite documentation.      

**Build report**
- [ ] Pleaes glance at the NOTE concerning the included src code. It looks like
a false positive but it would be worth investigating to be sure.

Once the above are considered and updated please do a version bump in the DESCRIPTION to trigger a new build.  Please also comment here with what has been updated and to request a re-review.  
I look forward to getting your package accepted into Bioconductor. 
Cheers 
lshep commented 5 years ago

Just wanted to check back in to make sure you are still working on the package revisions?

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

2d2b1aa Update with the comments from Bioconductor

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

d9ab928 Update user manual and readme, also fix some small...

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

ChiungTingWu commented 5 years ago

I just finished the revisions!

Thank you for your submission. Please see initial comments below:

README

  • [x] Include Bioconductor installation instructions

Included.

inst script/

  • [x] Include source information for files in extdata

Included in "Generation_of_the_example_data.txt"

data/man

  • [x] Give more information about the source of the data.
  • [x] Give more reference to how the object was created if a user was to replicate

Done

vignettes 3.3.1

  • [x] typo? There are two features are detected as

Fixed

R code loadProfile.R

  • [x] use message instead of print
  • [x] Should there be some initial parameter checking to ensure the user passed the correct objects/structure to filePaths and excluGroups?
  • [x] Is it possible to collapse some of the loops over groupNum down into a single loop? Looping can be time consuming and if you could do it once or twice instead of four or five it would be much more efficient.

replaced parameter checking added I combined them into one loop.

ncGTWalign.R

  • [x] I would recommend moving all the parameter checking to a function that would return a valid ncGTWparam rather than check, assign to object, create local object. It makes the code rather messy. I would then just reference the object where needed in the subsequent code.
  • [x] Use message instead of cat.
  • [x] It seems like there is a group of functions run together often? imitGtwParam, buildGTWgraph, graphCut - group these into a helper function and called instead of the repeated triplicate call - Its always to minimize repeated code where possible - it helps for easier maintainence and bug fixing so if you find an error or change a function code only need to be updated in one location rather than several.

a check function is built replaced helper functions are built

ncGTWgraph.R

  • [x] Please include the {} with the else even if there is a stop to ensure proper execution and the brackets with for to ensure proper closeure. Then fix indentation. There are so many for loops and if/elses the code becomes very hard to read and understand why/when different things happen, even a few commented out notes might provide some clarification. if possible condense down loops or create helper functions.
  • [x] Combine these loops. Multiple loops are time consuming. Please apply this throughout code where possible
    distMask <- matrix(0, nTps, nTps)
    for (ii in seq_len(nTps))
        for (jj in seq_len(nTps))
            distMask[ii, jj] <- ofstPen * (abs(ii - jj) > 0)

    diagMask <- matrix(0, nTps, nTps)
    for (ii in seq_len(nTps))
        diagMask[ii,ii] <- diagonalPenalty

There is no need to loop over nTps twice.

    distMask <- matrix(0, nTps, nTps)
    diagMask <- matrix(0, nTps, nTps)
    for (ii in seq_len(nTps))
       diagMask[ii,ii] <- diagonalPenalty
       for (jj in seq_len(nTps))
            distMask[ii, jj] <- ofstPen * (abs(ii - jj) > 0)
  • [x] In getDistMat perhaps instead of if nor==2 elseif nor == 1 else , maybe use a switch statement for cleaner code
  • [x] Reevalutate nested if/else and condense for example if getDistMat, db <- d0 if weiP == 0 regardless of biP so this should be checked outside
    if (biP == TRUE){
        if (weiP == 0){
            dp <- d0
        } else{
            dp <- d0 / ((tstPM * refPM)^weiP)
        }
        if (weiP == -1){
            a0 <- (tstPM + refPM)/2
            g0 <- (tstPM * refPM)^(0.5)
            a1 <- (a0 + g0) / 2
            g1 <- (a0 * g0)^(0.5)
            a2 <- (a1 + g1) / 2
            g2 <- (a1 * g1)^(0.5)
            a3 <- (a2 + g2) / 2
            dp <- d0 / a3
        }
    } else if (biP == FALSE){
        if (weiP == 0){
            dp <- d0
        } else if (weiP == -1){
            stop('For weiP = -1, biP should be TRUE!!!')
        } else {
            dp <- d0 / (tstPM ^ weiP)
        }
    } else{
        stop('biP should be logical!!!')
    }

Can be cleaned up

stopifnot(is.logical(biP), !is.na(biP))
if (weiP == -1) stopifnot(isTrue(biP))
# there may be some syntx errors in here just on the fly not tested
swtich(as.character(weiP),
"0" = { dp <- d0 },
"-1" = {    a0 <- (tstPM + refPM)/2
            g0 <- (tstPM * refPM)^(0.5)
            a1 <- (a0 + g0) / 2
            g1 <- (a0 * g0)^(0.5)
            a2 <- (a1 + g1) / 2
            g2 <- (a1 * g1)^(0.5)
            a3 <- (a2 + g2) / 2
            dp <- d0 / a3
},

dp <- ifelse(biP,  (d0 / ((tstPM * refPM)^weiP)), (d0 / (tstPM ^ weiP)))
)

I tried to add all "{}" back. I tried to combine for loops as many as I could replaced I reevaluated all nested if/else and condensed them as many as I could

ncGTWutility.R/ncGTWwarp.R

  • [x] Same type comments as above. Try and condense nested for/if/else where possible to clean up code and make it more readable. Separate out repeated chunks of code to helper functions so functions are more manageable. Its very hard to get a sense of what your code does in places as currently implemented because of readability. The for loops and else clauses without brackets make me a little leery that the code breaks at proper points. I would put it all brackets to be sure.

done

general comment

  • [x] it might be beneficial to have some argument/parameter checking at the begining of exported function. I've learned to never trust user input despite documentation.

parameter checking is added in each exported function

Build report

  • [x] Pleaes glance at the NOTE concerning the included src code. It looks like a false positive but it would be worth investigating to be sure.

I have checked and I also think it is a false positive.

Once the above are considered and updated please do a version bump in the DESCRIPTION to trigger a new build. Please also comment here with what has been updated and to request a re-review. I look forward to getting your package accepted into Bioconductor. Cheers

Thank you for all the comments!

lshep commented 5 years ago

I apologize for the delayed response. I was away at a conference with limited internet connectivity. Please see the follow up review below:

Questions/Suggestions:

R/man:

loadProfile.R

misalignDetect.R

ncGTWalign.R

ncGTWwarp.R

XCMSedited.R

Thank you for your continued efforts and contributions to Bioconductor. Please respond and make appropriate changes. When ready please do a version bump to ensure a new build and comment back here.

Cheers, ~Lori

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

d923bef Add ncGTWparam class

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

4a9c9a9 Update depends

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

ChiungTingWu commented 5 years ago

Thank you for all the suggestions and comments! Let me reply inline.

I apologize for the delayed response. I was away at a conference with limited internet connectivity. Please see the follow up review below:

Questions/Suggestions:

  • [x] 1. Why did you not create a class structure for the initncGTWparam? It seems like all the values must be present in the list object when inputting to ncGTWalign? Making this also a formalized class would allow for validation if a user tries to create their own list of parameters to pass? I would suggest formalizing this unless you had a reason not too?

Thanks for the suggestion, I created a class "ncGTWparam" with default setting and some validations.

  • [x] 2. It seems slightly odd that there is not a formalized object for the results of loadProfile and ncGTWalign. Its a list of the defined class objects but then there can be no checking that the lists are from the output of the given functions. What was the reasoning behind this? Do you think this is beneficial in any way? See also comments in R/man below for misalignDetect?

I added validation functions to make sure the created objects are valid. I believe sometimes the user may want to design some "simulated" data by themselves, not always loaded from the files. Thus, I think it should be OK the objects are not from the given functions, if they pass the validation functions.

  • [x] 3. Would there ever be a case not to loop over all of the objects in the list for plotGroup, ncGTWalign, and the section of after realignment to send to adjustRt/xcms functions? In other words, should the functions themselves handling the looping internally instead of having the user have to create these? Why would a user not want to loop over all elements or choose a subset of elements to loop over? I'm not as familiar with the data so if this is useful it is fine to keep these as is but was curious use case.

Generally, ncGTW is designed for aligning all input samples as the same time. Thus, it is not possible for ncGTW to align each sample at a time. Also, the alignment results and post-processing are affected by all the input samples, so it is not recommended to use a subset to perform alignment. That is, the user should choose the samples for alignment at the beginning (in the input of loadProfile).

R/man:

loadProfile.R

  • [x] Does excluGroups need to be the result from running misalignDetect? Maybe that should be printed in the stop instead of argument should contain column names?

No, excluGroups does not need to be the result from running misalignDetect. The reason is that sometimes the user may want to choose which part of sample data to be realigned by themselves, so in this way they can pick wherever they like.

misalignDetect.R

  • [x] It seems like the return value is described wrong. Based on this description I would have expected an xcmsSet-class object to be returned with a slot for this data (and then accessible with accessor methods). Instead it seems to be its own matrix object distinctly sepearte from an xcmsSet-class object. This seems intentional? but described wrong? But it did make me think referring to question/suggestion 2 above: Would it be worth considering a class object[s] that extend the xcms::xcmsSet that would keep track of these objects created? So that misalignDetetc, loadProfile, (basically the exported functions) etc... would add additional slots to the xcmsSet object and then always work off said infastrucutre? Ensuring that the same objects and matricies are utilized? So the xcmsSet object would have a defined slot for the matrix of groupings and be consistent with the description in misalignDetect or a slot that is for the list of ncGTWinput-class objects instead of a separate variable/output from the function loadProfile (as one example)? So that everything is tracked in a single object? If you don't think this useful please explain your reasoning? In Bioconductor we utilize this concept of endomorphisms alot where what goes into a function is also returned. It may not be useful here but something to think about as far as organization and consistency and valid objects. It also helps show/utlize the strong connect to xcms which this package seems to directly extended functionality of.

Fixed. Thanks for the suggestion, but there are two reasons make me hesitate to do that. The first one is the sizes of ncGTWinput and ncGTWoutput objects may be very large if there are thousands of samples. It is possible that the user may adjust parameters to compare the alignment result among different ncGTWoutput objects. If I set ncGTWinput and ncGTWoutput as new slots of xcmsSet object, it may have many copies of the same ncGTWinput object, when user just want to check the difference caused by the alignment parameter. The copies of the same ncGTWinput object may consume a lot of memory, so I prefer not to combine them. The second reason is that in the future update, ncGTW needs to accept the data from different alignment-related package, not only xcms, and even not only from r packages (maybe java, python, online platform....). Thus, it seems it is not a good idea to combine them to a class of xcms, if we want to accept more packages in the future.

ncGTWalign.R

  • [x] check wording on parameter check. I believe the variable is ncGTWinput not output here?
  • [x] We don't recommend keeping old commented code in R scripts (esp since they are git tracked so you can always look them back up). This should be applied throuhgout. The comments and explanations in code however are very useful and encouraged - Thank you for this!

Sorry about that, I have fixed it. I have removed all old commented code. Thanks for the suggestion!

ncGTWwarp.R

  • [x] check wording on parameter check. It looks like ncGTWinput is checked twice and the wording refers to ncGTWoutput twice.

Sorry again. Fixed.

XCMSedited.R

  • [x] print/cat should be message
  • [x] I feel like this should be documented officially with a man pages and explicitly stating that uses heavily sampling from the xcms function. This is to be sure to properly cite and give credit to the original xcms function writers. The man page could link to the original function for credit.

Fixed. I have documented it with the link to the original function. Thanks for your reminder!

Thank you for your continued efforts and contributions to Bioconductor. Please respond and make appropriate changes. When ready please do a version bump to ensure a new build and comment back here.

Cheers, ~Lori

Thank you for all the suggestions and comments again. Please let me know if there are something not clear or I can further improve!

Best, CT

lshep commented 5 years ago

Would it be worth considering making wrappers to convert your object to other known Bioconductor objects like the xcmSet after utilizing your pipeline (Not sure could be entirely not appropriate)? I'm not sure what other connected analysis or visualization already exists but this sort of interconnectivity is always what we strive for in Bioconductor.
This is not a limiting comment and the code updates look good. Thank you for the effort and welcome to Bioconductor.

bioc-issue-bot commented 5 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.

Thank you for contributing to Bioconductor!

mtmorgan commented 5 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/ChiungTingWu.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("ncGTW"). The package 'landing page' will be created at

https://bioconductor.org/packages/ncGTW

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.