Bioconductor / Contributions

Contribute Packages to Bioconductor

SGCP #2840

Closed na396 closed 1 year ago

na396 commented 1 year ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 year ago

Hi @na396

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SGCP
Type: Package
Title: SGCP: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks
Version: 0.99.0
Authors@R: c(person("Niloofar", "AghaieAbiane", email = "niloofar.abiane@gmail.com" ,role = c("aut", "cre")),
             person("Ioannis", "Koutis", email = " ikoutis@njit.edu",role = c("aut")))
Description: SGC is a semi-supervised pipeline for gene clustering in gene co-expression networks.
   SGC consists of multiple novel steps that enable the computation of highly enriched modules 
   in an unsupervised manner. But unlike all existing frameworks, it further incorporates a 
   novel step that leverages Gene Ontology information in a semi-supervised clustering method 
   that further improves the quality of the computed modules.
License: GPL-3
Encoding: UTF-8
LazyData: true
Imports: ggplot2, expm, caret, plyr, dplyr, GO.db, annotate, SummarizedExperiment, 
        genefilter, GOstats, RColorBrewer, xtable, Rgraphviz, reshape2, openxlsx,
        ggridges, DescTools, org.Hs.eg.db, methods, grDevices, stats
Suggests: knitr
Depends: R (>= 4.2.0)
biocViews: GeneExpression, GeneSetEnrichment, NetworkEnrichment, SystemsBiology,
   Classification, Clustering, DimensionReduction, GraphAndNetwork,
   NeuralNetwork, Network, mRNAMicroarray, RNASeq, Visualization
VignetteBuilder: knitr
NeedsCompilation: no
URL: https://github.com/na396/SGC
Date/Publication: 2022-10-06
RoxygenNote: 7.2.1

bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your GitHub SSH keys for git.bioconductor.org access. To manage keys and future access you may want to activate your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

na396 commented 1 year ago

Greetings @jianhong @lshep Thank you for the comment. The timeout problem happens in "creating vignettes", because my package in general takes hours or even days to complete. This is the nature of my package. The example I provided in the vignettes is the smallest data I could show as an example for my package.

Here is the way I wrote the vignettes. I provided a small dataset in the vignettes and then tried to explain how to use the functions in my package with that dataset. So during this process, the "creating vignettes" section may take up to 3 hours to complete. Is there any solution for this scenario? Thank you so much

lshep commented 1 year ago

Tagging: @vjcitn / @hpages for additional thoughts and comments. In general, packages cannot take that long to build on our builders. Packages need to be able to be built daily by our daily builder with a smaller example dataset. Perhaps storing intermediate data objects to load at various steps, while making the more in-depth long tests separate, might be an option. The other option would be to convert it into a workflow package, but the timeout limit for a workflow package is, I believe, 2 hours. @hpages would appreciate input as well.

na396 commented 1 year ago

@lshep I checked my code one more time; it takes about 1 hour to run. Can you tell me what your recommendation is? Thank you so much, and I appreciate your help in advance.

vjcitn commented 1 year ago

You should have code and "pre-cooked" data that allow the package to build and check in under (20?) minutes. That's good for you and for us -- you can get a meaningful result in 20 minutes -- you will know if something has gone wrong with your use of the ecosystem almost interactively. Then accompany this with a workflow package that can consume an hour of build time but is run infrequently. It would have more realistic computations.
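
One way to read the "pre-cooked" data suggestion is sketched below in base R. This is only an illustration under stated assumptions: `slow_step()` and `cached()` are hypothetical helpers, not SGCP functions, and the cache lives in `tempdir()` purely for the demo. A package would more likely ship the stored result with its example data.

```r
# Hypothetical stand-in for one expensive pipeline step (not an SGCP function).
slow_step <- function(x) {
  Sys.sleep(0.1)  # pretend this takes several minutes
  cumsum(x)
}

# Return the stored result if the file exists; otherwise compute and store it.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

cache_file <- file.path(tempdir(), "slow_step_result.rds")
unlink(cache_file)  # start from a clean state for this demo

# First call computes and saves; second call only reads the saved file.
first <- cached(cache_file, function() slow_step(1:10))
second <- cached(cache_file, function() stop("not reached"))
```

With this pattern the vignette can show every step of the pipeline while only loading stored results at build time, and the full computation can live in a separate, infrequently run workflow.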

na396 commented 1 year ago

@vjcitn Thank you so much for your comment. I appreciate it a lot. This excess time is due to the nature of the algorithm inside the package, not the data. Please see https://arxiv.org/abs/2209.10545. In this package I need to call another library 11 times in my algorithm, and each call takes up to 7-8 minutes regardless of the input size. So from my side, there is no way I could change the algorithm. Is there any solution you recommend?

vjcitn commented 1 year ago

I can't provide detailed information at this time. Perhaps this will have to wait for inclusion in a future release of Bioconductor. Do the best you can.

na396 commented 1 year ago

@vjcitn Thank you so much. I do appreciate your help. I was wondering if you know the estimated time for the Bioconductor release? Or can I change the package into a workflow?

na396 commented 1 year ago

Greetings @vjcitn @lshep I have changed the package, and now it takes roughly 13 minutes to run. However, it takes more space, in total less than 5 MB, as I need to store some results. All rda files are compressed, and on my local computer I did not get any errors or warnings. I pushed it to "git@git.bioconductor.org/SGCP.git". Please let me know if it's fine or if I need to do anything. Many thanks in advance for your consideration

na396 commented 1 year ago

Hi @lshep I was wondering if you have seen my previous message?

lshep commented 1 year ago

You would probably want to store the results on the experiment hub to get the package down to a reasonable size. Also then users would only need to store/download the data when they were interested in running your examples rather than all the time.

na396 commented 1 year ago

@lshep Thank you for the message. I have a quick question. When I was looking at the Bioconductor guidelines, I noticed that my package size, which is 3.12 MB, is acceptable for a Bioconductor package. So my question is: do I still need to use ExperimentHub? I also have one more question: is there anything I need to do for further steps? Will my package be evaluated for Bioconductor? Thank you so much for your time and consideration

lshep commented 1 year ago

You need to get the package to not TIMEOUT. Please push any changes to see how the package runs on the system. I suggested ExperimentHub; looking back, I misread your comment. I thought you said that, in order to get the package to run, you were over the 5 MB limit. So no, ExperimentHub is not necessary.

na396 commented 1 year ago

@lshep The timeout problem is resolved, and I have pushed the changes. I think everything is ready.

lshep commented 1 year ago

Please push changes to git.bioconductor.org with a version bump. You need to trigger a new build. See https://github.com/Bioconductor/Contributions/issues/2840#issuecomment-1280774435
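
As a rough illustration of the "version bump" half of this request, the sketch below increments the last component of the `Version` field (`0.99.0` in the DESCRIPTION quoted earlier) using base R. `bump_patch()` is a hypothetical helper, and the DESCRIPTION here is a two-line temporary file created for the demo. In practice the field is usually edited by hand, and the commit and push to git.bioconductor.org are separate steps not shown here.

```r
# Demo DESCRIPTION with only the fields needed for this sketch.
desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(c("Package: SGCP", "Version: 0.99.0"), desc_file)

# Hypothetical helper: increment the last component of an x.y.z version string.
bump_patch <- function(version) {
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  parts[length(parts)] <- parts[length(parts)] + 1L
  paste(parts, collapse = ".")
}

# Read the file, bump the version, and write it back.
desc <- read.dcf(desc_file)
desc[1, "Version"] <- bump_patch(desc[1, "Version"])
write.dcf(desc, desc_file)

new_version <- unname(read.dcf(desc_file, fields = "Version")[1, 1])
```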

na396 commented 1 year ago

Ok, will do soon, thanks

na396 commented 1 year ago

@lshep Sorry to keep asking questions. I just checked my package and noticed that the package directory size is 3.2 MB, while its installed size is 7.1 MB. Do I need to use ExperimentHub? Thank you in advance

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 147671fb991e5446858eb113742a0ea1cd693dc5

na396 commented 1 year ago

@lshep Many, many thanks. The space and time issues are resolved. I have bumped the version and pushed the changes. Everything is ready now; please let me know if I need to do any further step. Thank you so much

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: fb737d0753cd7414625e488eda30d7a5e03e07b7

na396 commented 1 year ago

@lshep Pushed another. Thanks

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: e0f0bd7edeb102c860d3485843c48945817df63d

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

na396 commented 1 year ago

@lshep Hi, Do I need to do anything at this stage?

lshep commented 1 year ago

Please wait for the reviewer to do an in-depth review of the package. This normally occurs within 2-3 weeks of a clean build report.

jianhong commented 1 year ago

Package 'SGCP' Review

Thank you for submitting your package to Bioconductor. The package passed check and build. It is in pretty good shape. However, there are several things that need to be fixed. Please try to answer the comments line by line when you are ready for a second review.

Code: Note: please consider; Important: must be addressed.

The NAMESPACE file

General package development

R code

Documentation

na396 commented 1 year ago

@jianhong Thank you so much for the comments.

in line 18 import("org.Hs.eg.db") => I need to pass this object to the GOstats function.

in line 19 import("ggplot2") => I have used many ggplot2 functions for visualization.

in line 20 import("expm") => I need to import the ^ operation for matrix powering.

in line 21 import("dplyr") => I have used plenty of dplyr functions for data-frame-related tasks.

in line 22 import("GO.db")

in line 23 import(annotate, except=c(toFile))

in line 24 import("genefilter")

in line 25 import("GOstats")

in line 26 import("RColorBrewer")

in line 27 import("xtable")

in line 28 import("Rgraphviz")

in line 29 import("reshape2") => fixed

in line 30 import("openxlsx") => fixed

in line 32 import("caret") => fixed

In general, SGCP depends heavily on the ggplot2, dplyr, caret, and GOstats packages. "GO.db", "annotate", "RColorBrewer", and "genefilter" are dependencies of GOstats. When I installed GOstats myself, the dependencies were not installed. After multiple attempts, I installed the dependencies manually and then the GOstats package. This is the reason I imported these libraries. The remaining ones are fixed.

NOTE: Consider adding the maintainer's ORCID iD in 'Authors@R' with 'comment=c(ORCID="...")' => Fixed

NOTE: Consider adding unit tests. We strongly encourage them. See https://contributions.bioconductor.org/tests.html => This package works with big data, and it's a pipeline of a series of steps on a large dataset. Each step by itself has many parameters that may result in different solutions. Additionally, each step may take up to hours to run, which violates the time limit requirement for Bioconductor. Moreover, each step does not have a deterministic solution. This pipeline has randomness in each step.

NOTE: no direct slot access with @ or slot() - accessors implemented and used. Please ask help from HyperGResult-accessors => I'm not sure if I understand it correctly, but in "GO_Genes <- hg@goDag@nodeData@data", hg is an object returned by the hyperGTest function in the GOstats package, and at this stage SGCP tries to retrieve some information from this object. Please guide me if I need to change it.
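
The difference between slot access and an accessor can be seen on a toy S4 class. This is a sketch only: `ToyResult` and `scores()` are made up for illustration and are not the GOstats classes or accessors discussed in this item.

```r
library(methods)

# Toy class standing in for a result object (not the GOstats class).
setClass("ToyResult", representation(scores = "numeric"))

# Accessor generic and method: the public way to read the field.
setGeneric("scores", function(object) standardGeneric("scores"))
setMethod("scores", "ToyResult", function(object) object@scores)

res <- new("ToyResult", scores = c(0.01, 0.2))

via_slot <- res@scores       # works, but ties the caller to the class layout
via_accessor <- scores(res)  # same value through the public interface
```

If the class author later renames or restructures the slot, only the accessor method needs to change; code that used `@` directly would break.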

Important: No paste in message(), message, stop => The first two are fixed. For caption_sym <- paste0(" output of ", stp, " , is not symmetric"), I use it in the next statement, which is stop(caption_sym). I used the paste command because this function is for error detection and is used in multiple stages; with paste I can make the message dynamic, so that stop tells me where the error happened.
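
For context, `stop()` (like `message()` and `warning()`) concatenates its arguments itself, so a dynamic message does not require `paste0()`. A minimal sketch, assuming a hypothetical checker `check_symmetric()` and reusing the `stp` name from the comment above:

```r
# Hypothetical checker; `stp` names the pipeline step being validated.
check_symmetric <- function(m, stp) {
  if (!isSymmetric(m)) {
    # stop() pastes its arguments together with no separator.
    stop("output of ", stp, " is not symmetric", call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the error message produced for a non-symmetric matrix.
msg <- tryCatch(
  check_symmetric(matrix(1:4, nrow = 2), "adjacency step"),
  error = function(e) conditionMessage(e)
)
```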

NOTE: :: is not suggested in source code unless you can make sure all the packages are imported. => Fixed

NOTE: Vectorize: for loops present, try to replace them by *apply functions. => The for loops do not have a regular pattern or structure; depending on the cluster size and shape, each may be different. Inside each iteration, many steps are taken and none of these has a regular structure. Throughout this package, everything is implemented in vectorized form except these three loops, for which I was not able to come up with a vectorized implementation.
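
As a generic illustration of what the reviewer's note usually means (not a rewrite of the package's actual loops, which are not shown in this thread), a loop that fills a result vector element by element can often be expressed with `vapply()`. The `clusters` list below is made-up example data.

```r
# Made-up example data: gene identifiers grouped into clusters.
clusters <- list(a = c("g1", "g2", "g3"), b = "g4", c = c("g5", "g6"))

# Loop version: pre-allocate, then fill one element per iteration.
sizes_loop <- integer(length(clusters))
for (i in seq_along(clusters)) {
  sizes_loop[i] <- length(clusters[[i]])
}

# vapply version: same computation, with the result type checked by R.
sizes_vapply <- vapply(clusters, length, integer(1))
```

Loops whose iterations depend on earlier iterations, as described in the reply above, do not map onto `*apply` this directly.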

Important: Remove unused code. => Fixed

NOTE: Avoid using '=' for assignment and use '<-' instead => Fixed.

Important: Please consider to add drop=FALSE to avoid the reduction of dimension for matrices and arrays. => The pipeline at these stages, actually, needs to reduce the dimension. This is the target of these steps.
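
The behavior the reviewer is pointing at can be seen on a small matrix: selecting a single row or column silently returns a plain vector unless `drop = FALSE` is given. A minimal base R sketch:

```r
m <- matrix(1:6, nrow = 2)

row_vec <- m[1, ]                # dimensions dropped: plain integer vector
row_mat <- m[1, , drop = FALSE]  # still a matrix with 1 row and 3 columns

dropped_dims <- dim(row_vec)     # NULL
kept_dims <- dim(row_mat)        # 1 3
```

Where the reduction is intended, as the reply says, the default is fine; `drop = FALSE` matters when later code calls `nrow()`, `ncol()`, or matrix algebra on a subset that may happen to contain a single row or column.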

NOTE: Functional programming: code repetition. => Although these statements look like repetition, they are not the same. Each is performed for a different purpose and needs to be performed. Some of them also let me track down the code more easily if bugs are reported in the future. Some of them are also repeated in different functions, because those functions can be used dependently or independently. Therefore, some statements need to be checked in both, in case the functions are used independently. For instance, at the beginning of the two functions ezSGCP and geneOntology, it is checked that dir is in c("under", "over"), because ezSGCP is a wrapper of multiple functions including geneOntology, and geneOntology can also be applied independently. Therefore, at the beginning of each function I check whether this statement is valid. This actually helps me to better maintain the package.
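
One common way to keep such a check in both functions without repeating its body is a small shared validator. The sketch below uses base R's `match.arg()`; `validate_dir()`, `run_enrichment()`, and `run_pipeline()` are hypothetical stand-ins, not the real geneOntology or ezSGCP code.

```r
# Shared validator: accepts only "under" or "over", errors otherwise.
validate_dir <- function(dir = c("under", "over")) {
  match.arg(dir)
}

# Stand-in for a function that can be called on its own.
run_enrichment <- function(dir = "under") {
  dir <- validate_dir(dir)
  paste("testing", dir, "representation")
}

# Stand-in for a wrapper that also validates before delegating.
run_pipeline <- function(dir = "under") {
  dir <- validate_dir(dir)
  run_enrichment(dir)
}

ok <- run_pipeline("over")
bad <- tryCatch(run_enrichment("sideways"), error = function(e) "rejected")
```

Both entry points still validate independently, but the rule itself lives in one place.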

Important: Please include Bioconductor installation instructions using BiocManager. => Fixed

Note: Vignette includes motivation for submitting to Bioconductor as part of the abstract/intro of the main vignette. => I'm not sure if I understand correctly; I have added the information on package installation through BiocManager.

Important: Please include Bioconductor installation instructions using BiocManager. => fixed

na396 commented 1 year ago

I'm pushing the modification into the repository.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 4f49cf88e9e4165c6d5fdbe19b2f11ef4b7d9dc4

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 4b56cad8f0dd8556be1b6f30844f6c6b76969c60

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

jianhong commented 1 year ago

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 3a43d22b29a0c3f21b3e179f7913aacbee8b7af6

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

na396 commented 1 year ago

@jianhong Is it possible to rewrite GO_Genes <- hg@goDag@nodeData@data by GO_genes <- graph::nodeData(GOstats::goDag(hg)) => Done

Please move back the BiocManager::install section into your vignettes. => Done

jianhong commented 1 year ago

I think there is a miscommunication about the BiocManager::install section. I mean, please show the code

BiocManager::install('SGCP')

in your vignettes.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: c84b43e7b909dfdc811fd14f539268a7eb88252a

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

na396 commented 1 year ago

@jianhong I added the installation to the vignettes, but this causes the following error.

jianhong commented 1 year ago

OK, try

```{r, eval=FALSE}
library(BiocManager)
BiocManager::install(c('SGCP', 'SummarizedExperiment', 'org.Hs.eg.db'))
```

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 365fce8e1f9dfae7b7d8365199553d538df4c61b

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 73b69c43e4440bfa5055b2c6c363077970cfdeac

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

na396 commented 1 year ago

Adding the installation causes the following warning on macOS: WARNING: R CMD check exceeded 10 min requirement