(inactive) MetCleaning - Githubissues

jaspershen commented 7 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/jaspershen/MetCleaning

Confirm the following by editing each check box to '[x]'

[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Subversion (optionally via GitHub).

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 7 years ago

Hi @jaspershen

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: MetCleaning
Type: Package
Title: MetCleaning
Version: 1.1.0
Date: 2016-12-13
Author: Xiaotao Shen, Zhengjiang Zhu
Maintainer: Xiaotao Shen<shenxt@sioc.ac.cn>
Description: Automatic and integrated tool for data cleaning and statistical analysis of large scale MS-based metabolomics data.
LazyData: true
Depends: R (>= 2.10)
Imports:
    ggplot2,
    pcaMethods,
    impute,
    missForest,
    scatterplot3d,
    pheatmap,
    beeswarm,
    ellipse,
    pls,
    plsdepot,
    pROC,
    e1071,
    snow,
    methods,
    pathifier,
    xlsx,
    graphics,
    grDevices,
    utils
License: GPL-2 | file LICENSE
URL: https://github.com/jaspershen/MetCleaning
RoxygenNote: 5.0.1
Suggests: knitr,
    rmarkdown
VignetteBuilder: knitr
Note:

bioc-issue-bot commented 7 years ago

Your package has been approved for building. Your package is now submitted to our queue.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/MetCleaning_buildreport_20161228154513.html

jaspershen commented 7 years ago

Hi LiNk-NY, thank you for reviewing my package. I have revised the most of the errors and warnings in the package. However, in my local computer (win10), I can't run BiocCheck. So I don't know whether the errors and warnings have been removed. Thanks.

lawremi commented 7 years ago

The diagram in your vignette shows data potentially coming from xcms, but through a CSV file. Since xcms is a Bioconductor package, why not accept xcms data structures, and perhaps data structures from other packages, as there is a mature ecosystem?

jaspershen commented 7 years ago

Hi lawremi, metabolomics data analysis can be divided into several steps, first step is data processing (XCMS or mzMind), which transform MS raw data into dataset (matrix) which can be used for sequence statistical analysis. However, before statistical analysis, the data need to accept data pre-processing (or data preparation, data cleaning) to remove the system variations. So the package MetCleaning is used to do the data pre-processing. I don't integrate XCMS into MetCleaning because there are a lot of parameters should be optimized in the XCMS based on experiment. Thanks

lawremi commented 7 years ago

All I am suggesting is that the user can pass the output of XCMS to your functions directly, without going through any external files. I wasn't suggesting that your package wrap XCMS.

jaspershen commented 7 years ago

Hi, lawremi. I am very sorry to wrongly understand your suggestion. Another external file named "sample.information" is used to offer sample information such as injection order, batch and group information, which are necessary for QC based normalization, data integration and statistical analysis. So this file should be offered because those information can't be directly offered by XCMS. Thank you.

lawremi commented 7 years ago

Why can't you use the phenoData component of xcmsSet for that?

jaspershen commented 7 years ago

Hi, lawremi. You know that some users don't use XCMS to do processing of MS raw data, they may use mzMine, MS-DIAL or other commercial software (for example, Agilent's Profinder). In MetCleaning, we only need a metabolomics dataset which contains mz(mass to charge) and rt (retention time) information and we don't care which software they use. I just use the dataset from XCMS as a example, because in my lab we mainly use XCMS. However, MetCleaning can process dataset from any software. This is why I don't want to use so much functions in XCMS. Thanks

lawremi commented 7 years ago

The goal would be to make getting data into your software as easy as possible. You could support xcms directly, and also support other types of input from other software.

jaspershen commented 7 years ago

Hi, lawremi, offer the sample.information.csv file is also easy for most of users.

lawremi commented 7 years ago

Again, I didn't suggest to remove any functionality. Just to support a bridge from xcms, which provides standardized infrastructure for this type of data.

jaspershen commented 7 years ago

Hi, lawremi, I don't understand which type of data should be supported in MetCleaning? mzXML? Now, MetCleaning only need a metabolomics dataset which contains mz and rt information from xcms, mzMine or any other software, which is easy enough for most of the users who want to do data preparation and statistical analysis. You can see the help document in the MetCleaning. Thanks

lawremi commented 7 years ago

I was proposing to support the in-memory xcmsSet data structure as input.

jaspershen commented 7 years ago

Thank you for your suggestion. In the next version, I will consider to support xcmsSet data structure as input. In this version, because the input mode is easy enough for most of users (my teammates and friends have used it for about three months), so I don't want to change its input mode. Do you think is that OK? Thanks

lawremi commented 7 years ago

No, I don't, because then you'll probably never change it, since users will just go along with it, busy with their work and not thinking about how things could be easier. Sorry, that might sound cynical but it's motivated by idealism.

jaspershen commented 7 years ago

OK, I think it will take me a lot of time to learn to how to use xcms.......In fact, I don't think it is easier by using the xcmsSet. Thanks again.

jaspershen commented 7 years ago

Hi, how can I close this issue and submit it again? Thanks

jaspershen commented 7 years ago

Hello, I have change the package. So can you review it again? Thank you very much.

LiNk-NY commented 7 years ago

Hi @jaspershen Can you bump the package version ?

jaspershen commented 7 years ago

Hi, thank your for your reply. And the version of package has been changed to 0.99.1. Thank you very much

jaspershen commented 7 years ago

Hi. Can you review my new version package? I have changed it to version 0.99.1. Thank you very much! Xiaotao

LiNk-NY commented 7 years ago

Hi Xiaotao @jaspershen, I don't see any SPB build history for a recent version of the package. It may be possible that you have to create a webhook again for the build system to pick it up. Regards, Marcel

LiNk-NY commented 7 years ago

Hi Xiaotao @jaspershen, Please re-establish the webhook for your package. See this webhook link for more details.

Regards, Marcel

jaspershen commented 7 years ago

Hi Marcel, @LiNk-NY Thank you very much for your reply and guide. I have re-established the webhook for my package-MetCleaning according to the user guide. Best regards Xiaotao

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

8dd4ec7 change description

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/MetCleaning_buildreport_20170310205814.html

LiNk-NY commented 7 years ago

Hi @jaspershen, There are a few issues with the check on your package:

[ ] Undefined global variables/functions

Undefined global functions or variables:
qc qcorder sampleorder

[ ] The package size should not exceed 15 Mb.
[ ] The package vignette doesn't seem to be building.
[ ] The documentation should have more examples.
[ ] Maintainer must subscribe to the bioc-devel mailing list. Subscribe here

Regards, Marcel

jaspershen commented 7 years ago

Hi @LiNk-NY Thank you for your reversion. I am fixing those bugs now. It may take me several days, because I don't understand why some issues appear. Fot the errores which apper, I have some questions (1) The vignette is built using devtools package, so I don't know why this is an error? (2) The package is more than 15M because the demo data is very large, because this package is used to process large-scale metabolomics data. So I don't know shouled I remove this demo data? (3) I am writing examples for documentation to make sure that more than 80% functions have examples. (4) I have registered in the bioc-devel and added my e-mail address in the mailing list. I will fix all those error this week and re-submit it again. Thanks again for you reversion and guide. Best regards, XIaotao

LiNk-NY commented 7 years ago

Hi Xiaotao @jaspershen,

I hope to see the changes soon. To answer your questions:

I checked your package locally and I didn't get that error. Perhaps it's a build system hiccup. We will see once you make changes to your package and bump the version. Ignore that error for now.
You can run your examples on a smaller dataset. Your data does not have to be so big for the examples.
Good.
Good.

Please keep in mind that you should have runnable code in your vignette. Please don't create a vignette with all code chunks set to eval = FALSE. Set them to eval = TRUE for most chunks unless you have a good reason for why they should not be evaluated. Thanks.

Regards, Marcel

jaspershen commented 7 years ago

HI @LiNk-NY Thank you very much. All of the error will be fixed and I will be resubmit it today. Thanks again. Regards Xiaotao

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

03a8a5b change for 0.99.3

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/MetCleaning_buildreport_20170320230831.html

jaspershen commented 7 years ago

Hi @LiNk-NY I have fixed all the errors and warnings which I got from BiocCheck in local. (1) More than 80% of exported functions have runable examples. (2) The demo data is less than 1 M now. (3) All code chunks have been set to eval = TRUE. (4) My email address-shenxt@sioc.ac.cn has been added in the bioc-devel mailing list. I have changed the version to 0.99.3 and pushed it to github. Thank you very much. Best regards Xiaotao

LiNk-NY commented 7 years ago

Hi Jasper @jaspershen, Thank you for submitting MetCleaning to Biconductor. Please see the review below.

MetCleaning #227

MetCleaning provides several tools for working with metabolomics data including pre-processing steps and analysis.

Major

Functionality & Integration

Your package doesn't seem to work well with Bioconductor infrastructure already in place. There are at least a couple of packages that work with the same data type as mentioned by @lawremi. Please strongly consider making it easier for your package to work well with the existing Bioconductor software.
It seems like you don't use any of the S4 classes and methods see here() for your package. You could make use of this system in order to create a more robust class with a formal definition and validity checks.
Without a formal class definition, it is harder to extend your class.

Grammar

Avoid 1:n style iterations more info()
Avoid the creation of files without the user's explicit consent
Related to above: Add argument for the file location of the created files
Argument names should be short and to the point ordered.qc instead of qc.has.order.
Please add a default path argument (i.e., ".") rather than NULL to avoid doing path <- getwd()
Please be descriptive and informative when documenting your functions
Avoid using donttest flags in your examples. Your examples should be runnable

Structure

MetFlowData should be a formal S4 class with a constructor function and accessors - your operations rely to much on data at a specific index rather than an imposed structure that the S4 system provides
Avoid the repetition in code (e.g., in ExportData function: hasQC should just save qc.info)
Validity checks for the MetFlowData class not available

Size

I wasn't able to go through all of the R scripts that you submitted. Please make sure that your are following the package guidelines (see link)

Vignettes

Please add more runnable examples in the vignette
Avoid using static images in the vignette
With the amount of functionality you provide, you should have several vignettes showcasing what your functions do to pre-process and analyze data
Pick one output type in the vignette, preferably html_document

Unit Tests

Please add unit tests in order to ensure robust code

Minor

Grammar

Remove excitement in error messages (!!!)

Formatting

Your code should not go over 80 columns, it should be easy to ready and come back to when bugs come up
Function lengths should be minimal
Functions should only acheive one objective and not try to do too much

Regards, Marcel

Update: I will figure out why your package was tagged as ABNORMAL.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

b1f5109 default path set as "." 5431b17 Avoid 1:n style iterations. eb565bf qc.has.order. is changed to ordered.qc. 7528d78 Remove static images in the vignette. d3111d4 Remove excitement in error messages (!!!) 7ddf0bb Change code which go over 80 columns. 54a29c0 Change MetFlowData as S4 class. 05ecc7c re document b12da05 change met.data b8c7f2a fix bugs cbe50ed fix bugs db507b0 fix bugs 3b0f6ce fix bugs

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/MetCleaning_buildreport_20170322060601.html

jaspershen commented 7 years ago

Dear Marcel, Thank your very much for your review of my package. According to your advices, I have re-write the package today. The detailed information is listed below: Major Functionality & Integration (1) My package is used to do pre-processing of larege-scale metabolomics data. (2) I have changed MetFlowData to S4 class. (3) The MetFlowData has been changed to S4 class. Grammar (1) The 1:n style iteration have been rmoved. (2,3) MetCleaning do several processing and output results for each process, so the folders shouled be created to save results. It make the results more clear. (4) According to your suggestion, I have changed the argument names which are not clear. (5) All the path <- NULL have been changed according to your suggestion. Thanks. (6) I have re-written some uncaler documents of functions. (7) When running MetCleaning and MetStat, some processings should be type the selection, for example, when doing PLS-DA, the best component number should be selected by users. So, when I remove the "donttest" from the examples of MetCleaning and MetStat, there is a error when using Build Check. I find a lot of solutions for it in goodle, but I didn't find the solution methods. I am sorry about it. But the examples of MetCleaning and MetStat can be runable when I paste them manually. Structure (1) The MetFlowData has been changed to S4 class. (2) This has been changed. (3) MetFlowData has been changed to S4 calss, so I think this may be solved. Size (1) All the R scripts are in the "R" folder. and I am sure the packge was built following the gudelines. Vignettes (1) The most important functions in MetCleaning are MetCleaning and MetStat, the runable examples of them have been provided in the vignette. Other exported functions such as SXTpate are not useful for users, so their examples are not provided. (2) The workflow and data organization are significant for user to understant this package and usage. So they are remained, other 6 figures are removed from the vignette. (3) Only MetCleaning and MetStat are useful for users who want to do data processing and statistical analysis. So one vignette is enough to provide the usage of them. (4) I am sorry about it , but I don't understand how to pick htm_document of vignette? I will be grateful if I can get help. Thanks. Unit Tests (1) The unit tests has been added into the package.

Minor Grammar (1) The excitement in error messages (!!!) have been removed. Formatting (1) Most of code which go over 80 column have been changed. (2) I checked the longest functions and removed the space line from it. And all the functions only do one project.

The version has been changed to 0.99.4 and the package has been pushed to github. Thanks again for you review and advicements！ Regards Xiaotao

LiNk-NY commented 7 years ago

Hi Xiaotao, @jaspershen

A few of the points I mentioned in the review were not addressed.

The major point about integration with Bioconductor is still lacking and there seems to be no effort on your part to bridge your software with Bioconductor. You may wish to not make any changes in that front but keep in mind you are submitting your software to Bioconductor so it is expected in every submission to inter-operate with, enhance, or use existing Bioconductor software. This is a major factor in the review process.

It's good that you're using S4 but you need to make sure that your MetFlowData class has a constructor function that does some checks before creating the class. You may also want to include some validity checks to make sure that data inside the class are in order needed. You would have to create accessor method functions so to avoid the use of @ throughout your functions. Usually these functions have the same name as the slot in the object.

Reminder: Avoid the use of . in function names.

Refer to: http://bioconductor.org/developers/package-guidelines/#classes

Some points were not entirely addressed:

1:n style iterations
path = NULL arguments
Documentation for some arguments is not descriptive (e.g., @param MetFlowData)
write.csv is used extensively, please don't create files unless the user asks to create files. Create some argument for ensuring this.
All your exported functions should be testable. You can select defaults for the examples and still run them.
Don't export functions that users will not be using. Keep those un-exported and available only to you as the developer.

For the vignette: Select html_document as the only value for output:. Please make the vignette: value as

  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}

If find it difficult to see how a user would want to use a function with more than 20+ arguments as in MetFlowData.

Regards, Marcel

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

1746cf0 fix bug in DataIntegration

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/MetCleaning_buildreport_20170503014509.html

LiNk-NY commented 7 years ago

Hi Jasper @jaspershen,

Do you have any updates on making your package compatible with the Bioconductor ecosystem? Otherwise, I'd be forced to close the issue.

Best regards, Marcel

LiNk-NY commented 7 years ago

Your issue has been closed due to inactivity. To submit your revised package to Bioconductor, please create another issue. Thank you. -Marcel

Bioconductor / Contributions

(inactive) MetCleaning #227

MetCleaning #227

Major

Functionality & Integration

Grammar

Structure

Size

Vignettes

Unit Tests

Minor

Grammar

Formatting