Bioconductor / Contributions

Contribute Packages to Bioconductor
133 stars 33 forks source link

lipidr #1065

Closed ahmohamed closed 5 years ago

ahmohamed commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @ahmohamed

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: lipidr
Title: Lipidomics Analysis Workflow in R
Version: 0.99.0
Authors@R: c(
    person(
        "Ahmed", "Mohamed", 
        email = "mohamed@kuicr.kyoto-u.ac.jp", role = "cre",
        comment = c(ORCID = "0000-0001-6507-5300")
    ),
    person(
        "Ahmed", "Mohamed", 
        email = "ahmed.mohamed@qimrberghofer.edu.au", role = "aut"
    ),
    person(
        "Jeffrey", "Molendijk", 
        email = "Jeffrey.Molendijk@qimrberghofer.edu.au", role = "aut"
    ))
Description: lipidr an easy-to-use R package implementing a complete workflow 
    for downstream analysis of lipidomics data. lipidr parses results exported 
    from Skyline directly into R, allowing integration into current analysis 
    frameworks. lipidr allows data inspection, normalization, univariate and
    multivariate analysis, displaying informative visualizations. lipidr also
    implements a novel Lipid Set Enrichment Analysis (LSEA), harnessing 
    molecular information such as lipid class, chain length and unsaturation.    
Depends: R (>= 3.6.0)
Imports:
    methods,
    stats,
    S4Vectors,
    SummarizedExperiment,
    rlang,
    dplyr,
    tidyr,
    forcats,
    ggplot2,
    limma,
    fgsea,
    ropls,
    magrittr
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Suggests: 
    knitr,
    rmarkdown,
    BiocStyle,
    ggrepel,
    plotly,
    covr,
    spelling
VignetteBuilder: knitr
biocViews: Lipidomics, MassSpectrometry, Normalization, QualityControl,
    Visualization
Language: en-US
RoxygenNote: 6.1.1
Roxygen: list(markdown = TRUE)
bioc-issue-bot commented 5 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

ahmohamed commented 5 years ago

Just clarifying the errors on the build,

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

b5a1737 removing .gitattributes 587b2f8 testing allow_failures in appveyor 23e9d24 install ggplot2 from github on appveyor a27480f appveyor: disable bioc_devel on r-release 6cbca82 fix link to magrittr::pipe 10cc4da version pump ab9677e Merge branch 'release/0.99.0'

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

36c9eef lightweight exmaples for plot_mol_boxplot & loadin... 06387f1 Merge branch 'release/0.99.0'

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

79569ed Merging lsea examples 9c4b855 fix param documentation c60f130 version pump 3109adf Merge branch 'release/0.99.0'

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

f665337 more example reduction, version pump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

ahmohamed commented 5 years ago

@Liubuntu Dear Qian, Warnings now are only because of Check time > 5m. I have tried to address the warnings by reducing the examples as much as possible. However, currently warnings on Windows & OSX because of multi-arc testing. Also, timings vary from one check to the other. I am happy to accommodate any suggestions.

Thanks. Ahmed.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

87760f8 version pump again for bioconductor-bot

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

ahmohamed commented 5 years ago

@Liubuntu Dear Qian, Managed to cut down check time by reducing runnable code in vignette. Also, addressed all NOTES in R CMD check.

Can you kindly provide an estimate for when might the package be reviewed? This is the first package specialized in lipidomics analysis, and we are really keen to have it included in Bioc 3.9.

Thank you for your time. Ahmed.

ahmohamed commented 5 years ago

Hi @Liubuntu, The deadline for Bioc 3.9 is approaching quickly. Would you be able to provide an initial review soon, so I can have time to fix the issues? Thanks again. Ahmed.

Liubuntu commented 5 years ago

Hi @ahmohamed ,

Thanks for submitting your package. Your have submitted it before the deadline of April 5th, so we'll make all efforts to have it reviewed and hopefully accepted before the deadline of release (April 24th) based on instant updates. Time would be limited but we are targeting for inclusion.

We're actually receiving much more packages before the release, which slows down the review process. We review packages based on the order of submission. Based on normal flow, I give reviews within 2 weeks of your last successful build without error or warning (so hopefully before April 16th, next Tuesday). The initial review would take the most time but follow-up's would be much quicker. So we have good chance.

Best, Qian

Liubuntu commented 5 years ago

Hi @ahmohamed

Please see the following for the initial review. The biggest issue would be the R/DataStructure section. All other issues would be based on this. See below and let me know for any question or concerns so we can discuss.

Best, Qian

DESCRIPTION

NAMESPACE

NEWS

data & data-raw

unit test

R/

Data Structure

Coding

plot.R:

General:

SUGGESTION

Liubuntu commented 5 years ago

Continue:

man/

' @description data_normalized: A dataset ...

' @rdname lipidr-data

' @format ...

' @examples data(data_normalized)

data_normalized



## vignette

- Add a Table of contents for a cleaner hierarchy. Refer [here](https://bookdown.org/yihui/rmarkdown/html-document.html#table-of-contents) for more details.
- Suggestion: add more details about author, e.g., affiliation.
- Suggestion: Re-organize your vignette and use sub-headers to make a
  better hierarchy. e.g., Maybe use the section of `Visualization` to
  include all plot functions? Or keep the current structure
  (analysis + visualization) and rename the "Visualization of raw
  data" into "raw data quality check"?
- Add a section for `sessionInfo()` in the end of the vignette.
ahmohamed commented 5 years ago

@Liubuntu Thanks a lot for your review. I will have a closer look at the comments, but they should be fixed relatively easily.

Regarding the data structure, I am aware of the metadata field, and have thought about adding the attributes to it. The main problem is that I would have to check to if certain attributes are defined. In particular, dimnames which is used to_long_format function. This is not arbitrary (as the description of metadata field) in some sense, but rather a core attribute of the class. I have opted to put it in attrs to avoid conflicts with how SummarizedExperiment defines it.

Let me know what you think. Best. Ahmed.

ahmohamed commented 5 years ago

@Liubuntu After much thought into the new class issue, I feel I am more inclined to define a new class for the following reasons:

From a data structure point of view, I will follow your advice to remove the attrs slot and put the arbitrary annotations in metadata, which will make the structure identical to SummarizedExperiment, but the class definition will be kept for the considerations above.

As I said in my previous comment, the other way is to use SummarizedExperiment and check for the attributes/annotations needed. This is somewhat close to duck typing, which I believe S4 objects should avoid. I wonder if this is correct? May be @mtmorgan can help?

Looking forward to your opinion. Thanks. Ahmed.

mtmorgan commented 5 years ago

It sounds like what you want to do is a simple extension of SummarizedExperiment, something like .SkylineExperiment = setClass("SkylineExperiment", contains = "SummarizedExperiment", slots = c(...any additional slots, if any...)) and this sounds like a very good way to go.

ahmohamed commented 5 years ago

Thanks @mtmorgan. This is exactly the way it is now. Hopefully @Liubuntu agrees.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

202e443 version pump again for bioc-bot

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

b823b9e revert dplyr import to see if fixes `multiple meth...

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

ahmohamed commented 5 years ago

Hi @ahmohamed

Please see the following for the initial review. The biggest issue would be the R/DataStructure section. All other issues would be based on this. See below and let me know for any question or concerns so we can discuss.

Best, Qian

DESCRIPTION

  • Looks good.

NAMESPACE

  • Consider importing the whole package, e.g., the dplyr and stats here. For conflicting functions, use for example @rawNamespace import(Utils, except=c(“head”, “tail”)). See here for more details.

I imported dplyr but it seems to have cause a weird warning on celaya2 multiple methods tables found for "rowSums/colSums/rowMeans/colMeans" here. I reverted the import to @importFrom and the warnings were removed. However, from these posts here & here, it might have been just related to the build system.

As for stats, I am only using few functions (9 functions exactly). stats is a big package, and importing all the namespace unnecessarily may cause conflicts later. Following this advice, I am sticking to specific imports for stats.

NEWS

  • OK.

data & data-raw

  • OK.

unit test

  • SUGGESTION: it's strongly recommended to include more unit tests for your functions.

Unit tests are on our todo list. In the interest of catching up with the release deadline, I will postpone adding them for now.

R/

Data Structure

  • IMPORTANT: Seems that you can just use the metadata slot in SummarizedExperiment to save the additional attributes. ?SummarizedExperiment will give your the help page showing metadata: An optional ‘list’ of arbitrary content describing the overall experiment. So that you do not need to define a new class of SkylineExperiment with the additional slot of attrs.

As mentioned in my previous comment, I moved all the attrs to metadata and removed the @attrs slot from SkylineExperiment. I am still keeping the definition of SkylineExperiment for the considerations above. Let me know if you have other suggestions regarding this issue.

Coding

  • Modify your whole script with the above issue.

Discussed above.

  • mva.R L59:L130. Use if()else() to avoid duplicate checking.
  • avoid double checking of if(logical==TRUE), use if(logical) or if(!logical) instead. See all cases here:
$ grep "== TRUE" R/*
R/mva.R:  if (ellipse == TRUE) {
R/mva.R:  if (hotelling == TRUE) {
R/plot.R:  if (log == TRUE) {
R/plot.R:  if (log == TRUE) {
R/plot.R:  if (log == TRUE) {
R/plot.R:  if (log == TRUE) {
R/plot.R:  if (log == TRUE) {
R/plot.R:  if (log == TRUE) {
R/plot.R:  if (log == TRUE) {

Both issues fixed.

  • mva.R: L385. if(!length(scores <- x$x)) it's weird to have an <- assignment function here, and no scores object was passed from the argument values. Given that you don't have test functions, it's hard to tell if there is any issue over here. Could you explain a little bit?

This is assignment and logical checking in the same command. This works because length works on the lefthand side of the expression i.e scores. I agree it is weird, and have fixed that by separating them.

plot.R:

  • L2. Use NULL instead of {} here.

Done.

  • Given that the code are mostly shared between plot_sample_tic and plot_sample_boxplot, merge these 2 into plotSample(plotType = c("tic", "boxplot")) or something similar, so you can get the benefit to avoid duplicate code and share the documentations.
  • Same above issue for the plot_class_sd and plot_class_boxplot.

Thank you for this excellent suggestion. I have merged plot_samples, plot_lipidclass and plot_molecules and provided plot type as an argument. The refactored code is much cleaner for both me and the user.

  • For the measure argument in all plot.R functions, put all possible values in the function, and use match.arg inside the function. e.g.,
plot_class_sd <- function(data, measure = c("Area", "Area.Normalized", "Height", "Retention.Time")){
        measure <- match.arg(measure)
}

This gives you benefit of non-perfect matching (if user specifify measure="Heig", it still matches and return Height) and if not specified, measure="Area" will be the default value.

I have used match.arg in several places in the package, where it is possible to enumerate exhaustively all possible values for a parameter. However, in this case, it is not possible, because the these measures are based on Skyline export files, and there are a lot of measures that the user can choose to export.

General:

  • When you have default values for arguments, document it in the @param.

Done. Checked all docs and noted default values where provided.

  • check all plot_chain_distribution, plot_molecule_boxplot/cv/sd, plot_sample_boxplot/tic, plot_class_boxplot/sd/enrichment... for grouping possibility...

Done as mentioned above. Only exceptions are plot_chain_distribution and plot_class_enrichment, as they have different arguments (they visualize DE results rather that raw data).

SUGGESTION

  • it's recommended to keep the current normalize_itsd and normalize_pqn as internal functions, and add a major function e.g., normalize to call those internal function based on the newArgumentName. e.g., normalize(newArgumentName = c("itsd", "pqn")), so that you can include those shared duplicate code (e.g., the checking of mcols(assays(data), use.names=TRUE_[measure,"normalized"]...) only into this major function.

In this case, I opted not to merge the functions, because normalize is so ubiquitous (it is even defined in BioGenerics), and having another function by the same name can be confusing. I also checked normalization functions in limma package, and found that method-specific normalizations are exported, so I am guessing it should be OK. Instead, I removed all checking into internal .prenormalize_check to avoid code duplication.

ahmohamed commented 5 years ago

Continue:

man/

  • mva.md example. When I run the example, this line gives me error. I wonder if you have any idea about it? Since the building is correct without error, it's might be just related to my system. So I'm just recording here.
plot_mva(mvaresults, color_by = "group")
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :
  X11 font -adobe-helvetica-%s-%s-*-*-%d-*-*-*-*-*-*-*, face 1 at size 11 could not be loaded

This is likely to be an encoding issue related to your system. Please see if suggestions this StackOverflow post resolves the issue. From my side, I have checked the plot_mva, and it doesn't seem that we used any special characters that may have cause this.

  • normalize_itsd.Rd, plot_sample_tic.Rd, plot_class_sd, plot_class_boxplot, plot_molecule_sd, plot_molecule_cv, plot_molecule_boxplot, summarize_transition: Instead of data.frame, the data is actually an SummarizedExperiment returned by read_skyline right? Please check.

You are correct. Thanks for catching this. All fixed now.

  • For the dataset documentation, please use the dataset name instead of NULL. See here for details.

Done.

  • Add @format for the format of the datasets.

A cool side effect of replacing NULL with dataset name is that roxygen inspects the actual dataset and automatically generate a \format{} entry for it :)

  • SUGGESTION: Group together the documentation for your datasets. So that use the same @rdname lipidr-data for each item. Note that by using the same rdname, the @description contents for each dataset will be shown in the same section. So add data name before the contents. E.g.,
#' @docType data
#' @description lipidDefaults: A dataset ...
#' @rdname lipidr-data
#' @format A data frame with .. rows and .. variables... and describe each variable...
#' @examples data(lipidDefaults)
lipidDefaults

#' @description data_normalized: A dataset ...
#' @rdname lipidr-data
#' @format ...
#' @examples data(data_normalized)
data_normalized

To be honest, I have never encountered a package where datasets are grouped in the same Rd, and I feel it is a bit confusing, especially when the datasets are not related (as in here). I am happy to do it if it is really necessary.

vignette

  • Add a Table of contents for a cleaner hierarchy. Refer here for more details.

Added.

  • Suggestion: add more details about author, e.g., affiliation.

Added.

  • Suggestion: Re-organize your vignette and use sub-headers to make a better hierarchy. e.g., Maybe use the section of Visualization to include all plot functions? Or keep the current structure (analysis + visualization) and rename the "Visualization of raw data" into "raw data quality check"?

lipidr is a workflow, so I think that the code in vignette describe the steps of workflow in the same order a user would expect/execute. I have reorganized the headings and subheadings as follows:

  1. Introduction: Overview of the package
  2. Installation: instructions...
  3. Analysis workflow: All steps in the workflow. Provided more subheadings under for more organization. Renamed the "Visualization of raw data" into "Raw Data Quality Check".
    • Add a section for sessionInfo() in the end of the vignette.

Added.

ahmohamed commented 5 years ago

@Liubuntu Dear Qian,

Thank you so much for your time and your comments. I have provided a point-by-point response including all issues that I have addressed. Kindly, let me know if you have other comments.

Currently, there is a build error on tokay2 with no helpful message. It seems to be related to vignette, but I cannot reproduce this error on any of the other build systems / CIs. I can see from bioc-devel that the build system has been acting up lately, so I'm also not sure this error is related to the build system. Any help would be appreciated.

Thanks again. Ahmed.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

e9e4e6e custom appveyor script to test bioc-devel-win

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

ahmohamed commented 5 years ago

@Liubuntu I managed to setup testing for lipidr on Windows and Bioc-devel using Appveyor. The tests seems to pass without errors or warnings here. At this point, I think it is likely that the error we are seeing is related to tokay2 build system. Do you know who is maintaining the build system or might be able to help with the issue?

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

d0c2bc0 remove encoding from DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

8bad56c replace deprecated argment in ropls 1e8a518 small typo

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

4a087d2 fix values for ropls new argument

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

ahmohamed commented 5 years ago

@Liubuntu After few trials, I managed to fix the tokay2 error. For some reason, the deprecated argument in ropls seemed to stop building the vignette (speculating).

I also removed the UTF-8 encoding from DESCRIPTION, which may have caused the error in plot_mva on your system. Could you please check if the issue still persists?

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

3476292 revert last commit

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

81cbca3 reinstate the last commit (error not related)

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

c01ceda disable plot_mva_loading (another try)

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

ahmohamed commented 5 years ago

@Liubuntu I give up. The exact same code which passed in https://github.com/Bioconductor/Contributions/issues/1065#issuecomment-485224714 is giving errors in https://github.com/Bioconductor/Contributions/issues/1065#issuecomment-485225596. After lots of investigation using Appveyor, the error seems to originate from unstable behavior of ggrepel only when run in the vignette and only on Windows. I have turned eval=FALSE for this particular code chuck.

I think we are ready now for another review. Hopefully, we can make it before the release. Regards. Ahmed.

Liubuntu commented 5 years ago

Hi @ahmohamed ,

Please see following for the additional comments about the new class SkylineExperiment and some related issues:

@Liubuntu After much thought into the new class issue, I feel I am more inclined to define a new class for the following reasons: Requirement for certain attributes to be defined cannot be enforced without new class definition. An object being SkylineExperiment implies also that lipid annotations are present (with specific columns, namely Class, Molecule).

selectMethod("colData", "SkylineExperiment")
Method Definition:

function (x, ...)
x@colData
<bytecode: 0x564d6d1e8de8>
<environment: namespace:SummarizedExperiment>

Signatures:
        x
target  "SkylineExperiment"
defined "SummarizedExperiment"
Liubuntu commented 5 years ago

Continued:

R/

man/

data/

vignette