Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

IsoCorrectoR: Correction Algorithm For Natural Isotope Abundance Contribution #800

Closed chkohler closed 6 years ago

chkohler commented 6 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 6 years ago

Hi @chkohler

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: IsoCorrectoR
Title: A Correction Algorithm For Natural Isotope Abundance Contributions
Version: 0.0.0.9000
Authors@R: c(person("Christian","Kohler",email="christian.kohler@ur.de",role=c("cre","aut")),
  person("Paul","Heinrich",email="paul.heinrich@ur.de",role=c("aut")))
Imports:
    dplyr,
    magrittr,
    quadprog,
    readr,
    readxl,
    stringr,
    tcltk,
    tibble,
    tcltk2,
    tools,
    utils,
    pracma,
    WriteXLS
Description: IsoCorrectoR is a tool for correcting 
  natural isotope abundance contributions in tracing experiments.
Author: Christian Kohler, Paul Heinrich
Maintainer: Christian Kohler <christian.kohler@ur.de>
Depends: R (>= 3.5)
URL: https://genomics.ur.de/software/IsoCorrectoR
License: GPL-3
LazyData: TRUE
NeedsCompilation: no
biocViews: Software, FunctionalGenomics, MassSpectrometry
RoxygenNote: 6.0.1
Suggests: knitr,
    rmarkdown
VignetteBuilder: knitr
bioc-issue-bot commented 6 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

affe3c3 version bump to 0.99.1

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

ba120a3 version bump to 0.99.2

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

LiNk-NY commented 6 years ago

Hi Christian, @chkohler

Thank you for your submission to Bioconductor. Please see the short review below.

If you have any questions, please post them there in the issue. Best regards, Marcel


IsoCorrectoR #800

/ Pkg root

DESCRIPTION

NAMESPACE

R

As it stands now, I would recommend this package to be submitted to CRAN.

chkohler commented 6 years ago

Hello Marcel @LiNk-NY , Thank you very much for all your valuable comments about our IsoCorrectoR package and the time you spent reviewing our code.

Please let me briefly comment on your remarks:

R

Overall there seems to be a lack of structure to handle the data that you're working with.

Are you referring to the Element/Molecule Files structure? The way the information is provided by these files?

There is no use of existing Bioconductor data structures.

As far as we are aware there is no pre-existing data structure / S4 classes we could re-use/extend for our package. Do you know if there exists re-usable Bioconductor software / data structure that makes IsoCorrectoR Bioconductor compliant?

We're happy that no major issues were observed and will work on your remaining suggestions.

As IsoCorrectoR is meant to address users in the MassSpectrometry / Functional Genomics field, we still believe that the Bioconductor community is the appropriate place to distribute our package. Do you think that there is a way for IsoCorrectoR into the Bioconductor community if all issues are addressed?

Again, thank you very much for your time.

Regards Christian

LiNk-NY commented 6 years ago

Hi Christian, @chkohler

Yes, I was referring to the general non-use of data structures in the package. You may be right that there are no established Bioconductor classes but I'm not the one to say.

I would defer to Laurent Gatto's (@lgatto) knowledge of the subject matter and even refer you to his workshop material here.

Best regards, Marcel

paulheinrich commented 6 years ago

Hello Marcel @LiNk-NY,

thank you for your comment and for pointing us to Laurent Gatto. Would it be possible that he also has a look at the package?

Regarding pre-existing Bioconductor packages/classes: From a first screening of the workshop material of @lgatto , we don't think that we can use tools mentioned there for our package. It deals with mass-spectrometry-based proteomics approaches, which differ quite strongly from the problem IsoCorrectoR was written for (correcting mass spectrometry data from stable isotope tracing metabolomics experiments for natural isotope abundance and impurity of the tracer).

We are aware of the BRAIN package by Piotr Dittwald (http://www.bioconductor.org/packages/release/bioc/html/BRAIN.html) which can calculate Aggregated Isotopic Distributions of molecules. As a part of its normal (nominal) mass resolution correction mode, IsoCorrectoR also calculates Aggregated Isotopic Distributions of the different labeled species of a given molecule originating from tracer isotope incorporation. In its current state, BRAIN is however not an appropriate alternative to our own approach, as it is limited to the elements C, N, O, H and S. In the correction of stable isotope tracing data in metabolomics however, Si is especially important. It is part of most derivatizing agents in GC-MS and has a huge impact on the Aggregated Isotopic Distribution of the derivatized molecules. Also, elements that are relatively rare in biomolecules like Se can be important in metabolomics, e.g. in the case of Selenocysteine. IsoCorrectoRs approach for calculating Aggregated Isotopic Distributions is basically unlimited in the elements it can account for, any element can be supplied to the IsoCorrection() function via the element file that has to be provided by the user (please see our vignette for more information on the element file).

Alongside our submission of the IsoCorrectoR package to Bioconductor, we have also submitted a manuscript on IsoCorrectoR and correction for natural isotope abundance and tracer impurity to Scientific Reports: “Correcting for natural isotope abundance and tracer impurity in MS-, MS/MS- and high-resolution-multiple-tracer-data from stable isotope labeling experiments with IsoCorrectoR” (Heinrich, Kohler et al.). It is currently under review. In the manuscript we analyze how correction affects the data and also compare the correction results of IsoCorrectoR to other available tools, manual calculations and experimentally measured validation mixtures. All comparisons yielded very good agreement. While the other tools are either Perl-, Python- or Matlab- based, IsoCorrectoR is the first R-based tool for performing correction of stable isotope tracing metabolomics data. It is also the only tool that comprises all possible modes/settings of correction (natural isotope abundance, tracer impurity, different tracer isotopes, nominal mass resolution correction in MS and MS/MS mode, high resolution correction with multiple tracer elements used simultaneously) in a single implementation.

As stable isotope tracing comprises a huge section of metabolomics with many potential users, we therefore think that our IsoCorrectoR package could be very beneficial for the Bioconductor community.

Kind regards,

Paul

LiNk-NY commented 6 years ago

Hi Paul, @chkohler

Thank you for your response. Please respond to the issues raised in the review. If there are no established classes, you can continue to use lists.

Don't forget to bump the package version when making changes.

Best regards, Marcel

cc: @lgatto

chkohler commented 6 years ago

Hi Marcel @LiNk-NY,

thank you for your comment. Regarding the other issues:

-The GUI can be a separate package

We will isolate the GUI from the rest of the package. This will result in a separate R package (IsoCorrectorGUI) containing the function IsoCorrectionGUI() and a base-package (IsoCorrectoR) with the IsoCorrection() function that performs the actual correction. It is important to say that IsoCorrectoR can currently already be used indenpendently of IsoCorrectoRGUI by running the correction algorithm on a terminal or within RStudio.

-No need to use nested if statements, use a lookup table:

We have a rather big nested-if structure in the IsoCorrectionGUI.R file (line 143 to 193). Do you mean that? We will check how we can convert that into a lookup-table.

-Avoid the use of 0:n sequence creation:

We have checked in which sections that occurs, and in all cases we found we iterate over some entity that physically ranges from 0 to n. E.g., the number of isotopic label present in a given labeling state (LabelPrecursor) of a molecule that can range from 0 to MaxLabelPrecursor. When MaxLabelPrecursor is reached, all tracer element positions that can potentially be labeled are occupied: for (LabelPrecursor in 0:MaxLabelPrecursor) {…}. We suggest that we comment those sections accordingly for clarification.

-Functions should be more modular so to avoid maintaining one huge function.

We will try to break huge functions into smaller functional units.

-Avoid or provide an option for verbose messages with dates

We will take care of that.

As soon as we’ve implemented changes, we will push them.

Kind regards,

Christian (@chkohler) and Paul (@paulheinrich)

cc: @lgatto

LiNk-NY commented 6 years ago

Hi Christian and Paul, @chkohler @paulheinrich I've contacted Laurent @lgatto about having a look at your package. He said he can get to it this weekend. Thanks so much Laurent!

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

3d94435 version bump to 0.99.3

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

chkohler commented 6 years ago

Hi Marcel @LiNk-NY,

thank you very much for asking @lgatto to have a look at our package! We have now worked on the issues mentioned in our last post and pushed the changes. The latest build on Bioconductors Single Package Build System seems to succeed without any error but an ABNORMAL label was added (which we can not explain right now as there is no package-related error message displayed).

-The GUI can be a separate package

We have separated the GUI from the rest of the package, as announced. Thus, there are now two packages:

As the GUI is now a separate package, we uploaded it as a separate contribution/issue. Will you review the IsoCorrectoRGUI package, too, Marcel @LiNk-NY?

-No need to use nested if statements, use a lookup table:

We have replaced the big nested-if structure in the IsoCorrectionGUI.R file (line 143 to 193) with a lookup-table-like structure. The file is in the IsoCorrectoRGUI package now, however. We also “unnested” some unnecessarily nested ifs in checkMoleculeDataLogic() in util.R.

-Avoid the use of 0:n sequence creation:

Please see our last post.

-Functions should be more modular so to avoid maintaining one huge function.

We have refactored several of our functions and created subfunctions for subtasks to make the code more modular. The total number of functions in the package (when not considering IsoCorrectionGUI.R, which was moved to the IsoCorrectoRGUI package) increased from 28 to 43.

Functions refactored:

By refactoring we e.g. managed to about half the lines of code in our main function IsoCorrection() (773 to 414). As a result, also the overall data handling in the package should be more structured now.

Below, we show the number of lines of code for the different R files of IsoCorrectoR in the previous iteration, which you reviewed, and in the current iteration. Please consider that the lines of code also include comments, which are quite numerous and/or lengthy in some of the functions to be able to properly explain the algorithm. Also, some of the R files like util.R contain multiple functions. Thus, the lines of code in util.R do not correspond to a single function but rather to more than 10.

Lines of code last iteration:

$ find . -name "*.R" | xargs wc -l
    80 ./CalcMeanEnrichment.R
   280 ./CalculateTransitions.R
    18 ./data.R
   287 ./ElementCombinations.R
    72 ./ElementInfoExtraction.R
    32 ./FragmentsCombinedProbability.R
    18 ./get_GUI_status.R
   139 ./IsoCombinations.R
   253 ./IsoCombinationsMaster.R
    21 ./IsoCombinationsProbMassShift.R
   773 ./IsoCorrection.R
   430 ./IsoCorrectionGUI.R
    58 ./MassShiftProbabilities.R
    16 ./MassShiftTransitions.R
   361 ./MoleculeInfoExtraction.R
   237 ./ProbUltraHighRes.R
   168 ./RawDataCorrection.R
    97 ./RawDataExtraction.R
   784 ./util.R
    15 ./zzz.R
  4139 total

Lines of code current iteration:

$ find . -name "*.R" | xargs wc -l
    85 ./CalcMeanEnrichment.R
   149 ./CalculateTransitionsNR.R
   132 ./CalculateTransitionsUHR.R
    18 ./data.R
   107 ./ElementCombinations.R
   110 ./ElementCombinationsNonTracer.R
   108 ./ElementCombinationsTracer.R
    50 ./ElementCombinationsTracerImpurity.R
    72 ./ElementInfoExtraction.R
    81 ./ExtractInputFileInformation.R
    46 ./FragmentsCombinedProbability.R
   128 ./IsoCombinations.R
    93 ./IsoCombinationsMaster.R
    81 ./IsoCombinationsNonTracer.R
    21 ./IsoCombinationsProbMassShift.R
    97 ./IsoCombinationsTracer.R
    57 ./IsoCombinationsTracerImpurity.R
   154 ./IsoCombinationsUHR.R
   414 ./IsoCorrection.R
    70 ./MassShiftProbabilities.R
    29 ./MassShiftTransitions.R
   115 ./MoleculeInfoExtraction.R
   265 ./ParseMoleculeInformation.R
   152 ./ProbNormalRes.R
    42 ./ProbUltraHighRes.R
   179 ./RawDataCorrection.R
    89 ./RawDataCorrectionOnCompleteDataset.R
    97 ./RawDataExtraction.R
    44 ./TracerCountProbabilitiesUHR.R
    56 ./TracersCombinedProbabilityUHR.R
   874 ./util.R
    15 ./zzz.R
  4030 total

-Avoid or provide an option for verbose messages with dates

We added a verbose option to the IsoCorrection() function call. It is set to FALSE by default, which will suppress all messages except one indicating the start and one indicating the end of correction.

Thank you very much for taking the time to review our package @lgatto!

Kind regards,

Christian (@chkohler) and Paul (@paulheinrich)

cc: @lgatto

LiNk-NY commented 6 years ago

Hi Christian and Paul, @chkohler @paulheinrich

Thanks for making those changes. As stated by Martin, we will wait until the current package is fully reviewed and accepted before considering the GUI package. I will have a second look at your package today. Thanks.

Best regards, Marcel

LiNk-NY commented 6 years ago

Hi Christian and Paul, @chkohler @paulheinrich

Thanks for making those changes. They look good.

It is perhaps a side effect of the data class but there is a constant use of for loops (nested or not) and that could possibly be avoided with 'vectorization'.

Overall, some functions seem to have many arguments and may be overwhelming for new users. This could be reduced by using structured classes that already contain the arguments required for some functions. We will let Laurent @lgatto weigh in about relevant classes.

Regards, Marcel

paulheinrich commented 6 years ago

Hi Marcel @LiNk-NY,

thanks a lot for taking the time to take a second look at our package!

Overall, some functions seem to have many arguments and may be overwhelming for new users.

We understand that we have a number of functions that take relatively many arguments. Please consider however, that the only function of the package that is exported and that the user can see/use is IsoCorrection(). All other functions are internal mechanics and hidden from the user.

Here you can see the function call to IsoCorrection():

IsoCorrection(MeasurementFile=NA, ElementFile=NA, MoleculeFile=NA, 
              CorrectTracerImpurity=FALSE, CorrectTracerElementCore=TRUE, 
              CalculateMeanEnrichment=TRUE, UltraHighRes=FALSE, 
              DirOut='.', FileOut='result', FileOutFormat='csv', 
              ReturnResultsObject=FALSE, CorrectAlsoMonoisotopic=FALSE, 
              CalculationThreshold=10^-8, CalculationThreshold_UHR=8, 
              verbose=FALSE, Testmode=FALSE)

The values that are handed to IsoCorrection() by the user are either input file paths or parameter settings. For all of them, it is neccessary that the user can set/change them. For all parameters, default values are present, making the minimally required function call to IsoCorrection() relatively short:

IsoCorrection(MeasurementFile=..., ElementFile=..., MoleculeFile=...)

Kind regards,

Christian (@chkohler) and Paul (@paulheinrich)

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

LiNk-NY commented 6 years ago

Hi Christian @chkohler and Paul @paulheinrich, Thanks for your submission. Laurent @lgatto has his hands full with a new appointment so he won't be able to provide a review. I will accept your package.

Regards, Marcel

bioc-issue-bot commented 6 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.

Thank you for contributing to Bioconductor!

chkohler commented 6 years ago

Hi Marcel @LiNk-NY,

thank you very much for accepting our IsoCorrectoR package. Furthermore we'd like to thank you for all your support and the fruitful discussions that lead to significant improvements of the code.

Kind Regards, Paul (@paulheinrich) and Christian

mtmorgan commented 6 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/chkohler.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using biocLite(\"IsoCorrectoR\"). The package 'landing page' will be created at

https://bioconductor.org/packages/IsoCorrectoR

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.