na396 commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/na396/SGCP

Confirm the following by editing each check box to '[x]'

[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Git (optionally via GitHub).

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @na396

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SGCP
Type: Package
Title: SGCP: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks
Version: 0.99.0
Authors@R: c(person("Niloofar", "AghaieAbiane", email = "niloofar.abiane@gmail.com" ,role = c("aut", "cre")),
             person("Ioannis", "Koutis", email = " ikoutis@njit.edu",role = c("aut")))
Description: SGC is a semi-supervised pipeline for gene clustering in gene co-expression networks.
   SGC consists of multiple novel steps that enable the computation of highly enriched modules 
   in an unsupervised manner. But unlike all existing frameworks, it further incorporates a 
   novel step that leverages Gene Ontology information in a semi-supervised clustering method 
   that further improves the quality of the computed modules.
License: GPL-3
Encoding: UTF-8
LazyData: true
Imports: ggplot2, expm, caret, plyr, dplyr, GO.db, annotate, SummarizedExperiment, 
        genefilter, GOstats, RColorBrewer, xtable, Rgraphviz, reshape2, openxlsx,
        ggridges, DescTools, org.Hs.eg.db, methods, grDevices, stats
Suggests: knitr
Depends: R (>= 4.2.0)
biocViews: GeneExpression, GeneSetEnrichment, NetworkEnrichment, SystemsBiology,
   Classification, Clustering, DimensionReduction, GraphAndNetwork,
   NeuralNetwork, Network, mRNAMicroarray, RNASeq, Visualization
VignetteBuilder: knitr
NeedsCompilation: no
URL: https://github.com/na396/SGC
Date/Publication: 2022-10-06
RoxygenNote: 7.2.1

bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your GitHub SSH keys for git.bioconductor.org access. To manage keys and future access, you may want to activate your Bioconductor Git Credentials Account.

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

na396 commented 2 years ago

Greetings @jianhong @lshep, thank you for the comment. The timeout occurs in the "creating vignettes" step because my package generally takes hours or even days to run to completion; this is the nature of the package. The example in the vignette uses the smallest dataset I could provide.

Here is the way I wrote the vignettes. I provided a small dataset in the vignettes and then I tried to explain how to use the functions in my package using that dataset. So during this process, in section "creating vignettes", it may take up to 3 hours to be completed. Is there any solution for this scenario? Thank you so much

lshep commented 2 years ago

Tagging @vjcitn / @hpages for additional thoughts and comments. In general, packages cannot take that long to build on our builders. Packages need to be built daily by our daily builder, using a smaller example dataset. Perhaps storing intermediate data objects to load at various steps, while making the more in-depth checks long tests, might be an option. The other option would be to convert it into a workflow package, but I believe the timeout limit for a workflow package is 2 hours. @hpages, I would appreciate your input as well.

na396 commented 2 years ago

@lshep I checked my code once more; it takes about one hour to run. What would you recommend? Thank you so much, I appreciate your help in advance.

vjcitn commented 2 years ago

You should have code and "pre-cooked" data that allow the package to build and check in under (20?) minutes. That's good for you and for us -- you can get a meaningful result in 20 minutes -- you will know if something has gone wrong with your use of the ecosystem almost interactively. Then accompany this with a workflow package that can consume an hour of build time but is run infrequently. It would have more realistic computations.
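To illustrate the "pre-cooked" data idea, here is a minimal sketch in R. It assumes a hypothetical expensive step `slow_step()` and a hypothetical helper `cached()`; neither is part of SGCP. The expensive result is computed once, saved, and then loaded from disk on later runs, which is roughly what a vignette would do with a result shipped in the package.

```r
# Hypothetical stand-in for an expensive pipeline step (not an SGCP function).
slow_step <- function(x) {
    Sys.sleep(0.1)   # pretend this takes minutes
    rowMeans(x)
}

# Return the cached result if the file exists; otherwise compute and cache it.
cached <- function(path, compute) {
    if (file.exists(path)) {
        return(readRDS(path))
    }
    result <- compute()
    saveRDS(result, path, compress = "xz")
    result
}

set.seed(1)
expr <- matrix(rnorm(200), nrow = 20)
cache_file <- tempfile(fileext = ".rds")

first  <- cached(cache_file, function() slow_step(expr))   # computes and saves
second <- cached(cache_file, function() stop("not run"))   # loads from disk
identical(first, second)
```

In a package, the saved object would live in the package's data or inst/extdata directory rather than in a temporary file, and the script that produced it would be kept alongside so the result can be regenerated.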

na396 commented 2 years ago

@vjcitn Thank you so much for your comment; I appreciate it a lot. The excess time is due to the nature of the algorithm inside the package, not the data. Please see https://arxiv.org/abs/2209.10545. The algorithm calls another library 11 times, and each call takes up to 7-8 minutes regardless of the input size. So from my side, there is no way I can change the algorithm. Is there any solution you would recommend?

vjcitn commented 2 years ago

I can't provide detailed information at this time. Perhaps this will have to wait for inclusion in a future release of Bioconductor. Do the best you can.

na396 commented 2 years ago

@vjcitn Thank you so much. I do appreciate your help. Do you know the estimated date of the next Bioconductor release? Or can I change the package into a workflow?

na396 commented 2 years ago

Greetings @vjcitn @lshep, I have changed the package, and it now takes roughly 13 minutes to run. However, it takes more space (less than 5 MB in total), as I need to store some results. All rda files are compressed, and on my local computer I did not get any errors or warnings. I pushed it to "git@git.bioconductor.org/SGCP.git". Please let me know if it is fine or if I need to do anything else. Many thanks in advance for your consideration.

na396 commented 2 years ago

Hi @lshep I was wondering if you have seen my previous message?

lshep commented 2 years ago

You would probably want to store the results on the experiment hub to get the package down to a reasonable size. Also then users would only need to store/download the data when they were interested in running your examples rather than all the time.

na396 commented 2 years ago

@lshep Thank you for the message. I have a quick question. Looking at the Bioconductor guidelines, I noticed that my package size, 3.12 MB, is acceptable for a Bioconductor package. So do I still need to use ExperimentHub? I also have one more question: is there anything I need to do for further steps? Will my package be evaluated for Bioconductor? Thank you so much for your time and consideration.

lshep commented 2 years ago

You need to get the package to not TIMEOUT. Please push any changes to see how the package runs on the system. I suggested ExperimentHub, but looking back I misread your comment: I thought you said the package had to go over the 5 MB limit in order to run. So no, ExperimentHub is not necessary.

na396 commented 2 years ago

@lshep The timeout problem is resolved, and I have pushed the changes. I think everything is ready.

lshep commented 2 years ago

Please push changes to git.bioconductor.org with a version bump. You need to trigger a new build. See https://github.com/Bioconductor/Contributions/issues/2840#issuecomment-1280774435
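A version bump here means raising the Version field in DESCRIPTION (for example from 0.99.0 to 0.99.1) before pushing. As a minimal sketch, assuming a hypothetical helper `bump_patch_version()` and a throwaway DESCRIPTION file in a temporary directory, the bump could be scripted in R like this. Note that `write.dcf()` may re-wrap long fields, so editing the field by hand is often simpler for a real package.

```r
# Hypothetical helper: increment the last component of the Version field.
bump_patch_version <- function(description_path) {
    desc <- read.dcf(description_path)
    parts <- as.integer(strsplit(desc[1, "Version"], ".", fixed = TRUE)[[1]])
    parts[length(parts)] <- parts[length(parts)] + 1L
    desc[1, "Version"] <- paste(parts, collapse = ".")
    write.dcf(desc, file = description_path)
    invisible(unname(desc[1, "Version"]))
}

# Demonstration on a throwaway file, not on a real package.
demo_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c("Package: demo", "Version: 0.99.0"), demo_path)

new_version <- bump_patch_version(demo_path)
new_version
```

After the bump, the change still has to be committed and pushed to git.bioconductor.org for the builder to pick it up.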

na396 commented 2 years ago

Ok, will do soon, thanks

na396 commented 2 years ago

@lshep Sorry to keep asking questions. I just checked my package and noticed that the package directory size is 3.2 MB, while its installed size is 7.1 MB. Do I need to use ExperimentHub? Thank you in advance.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 147671fb991e5446858eb113742a0ea1cd693dc5

na396 commented 2 years ago

@lshep Many, many thanks; the space and time issues are resolved. I have bumped the version and pushed the changes. Everything is ready now; please let me know if I need to take any further steps. Thank you so much.

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: fb737d0753cd7414625e488eda30d7a5e03e07b7

na396 commented 2 years ago

@lshep Pushed another. Thanks

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: e0f0bd7edeb102c860d3485843c48945817df63d

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

na396 commented 1 year ago

@lshep Hi, Do I need to do anything at this stage?

lshep commented 1 year ago

Please wait for the reviewer to do an in-depth review of the package. This normally occurs within 2-3 weeks of a clean build report.

jianhong commented 1 year ago

Package 'SGCP' Review

Thank you for submitting your package to Bioconductor. The package passed check and build and is in pretty good shape. However, several things need to be fixed. Please answer the comments line by line when you are ready for a second review.

Code: Note: please consider; Important: must be addressed.

The NAMESPACE file

[ ] Note: Use selective imports with importFrom instead of importing everything with import.
- in line 18 import("org.Hs.eg.db")
- in line 19 import("ggplot2")
- in line 20 import("expm")
- in line 21 import("dplyr")
- in line 22 import("GO.db")
- in line 23 import(annotate, except=c(toFile))
- in line 24 import("genefilter")
- in line 25 import("GOstats")
- in line 26 import("RColorBrewer")
- in line 27 import("xtable")
- in line 28 import("Rgraphviz")
- in line 29 import("reshape2")
- in line 30 import("openxlsx")
- in line 31 import("ggridges")
- in line 32 import("caret")
- in line 33 import("magick")
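For packages documented with roxygen2 (the DESCRIPTION above has a RoxygenNote field), selective imports are usually declared with `@importFrom` tags rather than edited into NAMESPACE by hand. A minimal sketch, using a hypothetical function `clusterRows()` that is not part of SGCP:

```r
#' Cluster the rows of a matrix (hypothetical example function)
#'
#' Only the one function that is actually used is imported. From the tag
#' below, roxygen2 writes an importFrom directive for stats::kmeans into
#' NAMESPACE instead of a blanket import(stats).
#'
#' @param mat A numeric matrix.
#' @param k Number of clusters.
#' @return An integer vector of cluster labels, one per row.
#' @importFrom stats kmeans
#' @export
clusterRows <- function(mat, k) {
    stopifnot(is.matrix(mat))
    kmeans(mat, centers = k)$cluster
}

set.seed(1)
labels <- clusterRows(matrix(rnorm(40), nrow = 10), k = 2)
labels
```

The same pattern applies to the packages listed above: one `@importFrom pkg fun1 fun2` tag per package, naming only the functions that the code calls.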

General package development

[ ] NOTE: Consider adding the maintainer's ORCID iD in 'Authors@R' with 'comment=c(ORCID="...")'
[ ] NOTE: Consider adding unit tests. We strongly encourage them. See https://contributions.bioconductor.org/tests.html

R code

[ ] NOTE: No direct slot access with @ or slot(); accessors should be implemented and used. Please see the help page for HyperGResult-accessors.
- In file R/SGCP_go.R:
  - at line 20 found ' GO_Genes <- hg@goDag@nodeData@data'
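As a generic illustration of the accessor pattern (a toy S4 class, not the GOstats result class), callers go through a generic instead of reaching into slots with `@`. The class `ToyResult` and the accessor `pvals()` are invented for this sketch.

```r
library(methods)

# Toy S4 class standing in for a result object.
setClass("ToyResult", representation(pvalues = "numeric", ids = "character"))

# Accessor generic and method: callers use pvals(x) instead of x@pvalues.
# Slot access is confined to the method that belongs to the class.
setGeneric("pvals", function(object) standardGeneric("pvals"))
setMethod("pvals", "ToyResult", function(object) {
    setNames(object@pvalues, object@ids)
})

res <- new("ToyResult", pvalues = c(0.01, 0.2), ids = c("term1", "term2"))
pvals(res)
```

For the object in R/SGCP_go.R, the accessors to use are the ones listed on the help page named above; the sketch only shows why they are preferable to chained `@` access, which breaks whenever the class internals change.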
[ ] Important: No paste in message(), warning(), stop()
- In file R/SGCP_ezSGC.R:
  - at line 191 found ' message(paste0("cluster ", remain, " is wiped out"))'
  - at line 193 found ' message(paste0("clusters", remain, " are wiped out"))}'
- In file R/SGCP_adjacencyMatrix.R:
  - at line 25 found ' caption_sym <- paste0(" output of ", stp, " , is not symmetric")'
  - at line 29 found ' caption_01 <- paste0(" output of ", stp, "are not in (0,1)")'
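For reference, `message()`, `warning()` and `stop()` already concatenate their arguments without a separator, so the `paste0()` wrapper is redundant. A minimal sketch with a hypothetical wrapper function `report_wiped()`:

```r
# message() pastes its arguments together itself; no paste0() needed.
# toString() collapses several cluster labels into one comma-separated string.
report_wiped <- function(remain) {
    if (length(remain) == 1) {
        message("cluster ", remain, " is wiped out")
    } else {
        message("clusters ", toString(remain), " are wiped out")
    }
}

report_wiped(5)
report_wiped(c(3, 7))
```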
[ ] NOTE: :: is not suggested in source code unless you can make sure all the packages are imported. Some people think it is better to keep ::. However, please note that you need to manually double-check the imported items whenever you change the DESCRIPTION file during development. My recommendation is to remove one or two repeats to force the dependency check.
- In file R/globals.R:
  - at line 1 found 'utils::globalVariables(c("GOtype", "Method", "Pvalue", "Var1", "Var2",'
- In file R/SGCP_plot.R:
  - at line 355 found ' dplyr::group_by(clusterNum, GOtype) %>%'
  - at line 356 found ' dplyr::summarise(max = max(logPvalue), count = n())'
- In file R/SGCP_semiSupervised.R:
  - at line 56 found ' caret::train(label~., method = "knn", tuneGrid = expand.grid(k = kn),'
  - at line 72 found ' semiSuper <- caret::train(label ~., method = "multinom", data = train)'
[ ] NOTE: Vectorize: for loops present; try to replace them with *apply functions.
- In file R/SGCP_clustering.R:
  - at line 24 found ' for (inclus in clusters) {'
  - at line 36 found ' for(outclus in clusters){'
  - at line 281 found ' for(clus in unique(clusLab)){'
- In file R/SGCP_ezPlot.R:
  - at line 531 found ' for(plt in pdf.out){'
- In file R/SGCP_go.R:
  - at line 24 found ' for(ind in seq_len(nrow(hg_summary))){'
  - at line 101 found ' for(direct in direction){'
  - at line 103 found ' for(onto in ontology){'
  - at line 226 found ' for(lab in unique(geneClus$clusterLabel)){'
- In file R/SGCP_plot.R:
  - at line 307 found ' for(clus in levels(df$clusterLabel)){'
  - at line 361 found ' for(c in cluslabs){'
- In file R/SGCP_semiLabeling.R:
  - at line 84 found ' for(lab in clusterNums){'
  - at line 92 found ' for(go in GOIDs){'
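A small sketch of the kind of rewrite meant here, on toy data rather than the package's own loops: a per-cluster summary computed first with a `for` loop that grows a vector, then with `vapply()`.

```r
clusLab <- c(1, 2, 1, 3, 2, 1)
score   <- c(0.5, 0.1, 0.7, 0.9, 0.3, 0.6)

# Loop version: grows the result one element at a time.
loop_means <- c()
for (clus in unique(clusLab)) {
    loop_means <- c(loop_means, mean(score[clusLab == clus]))
}

# vapply version: one call, with the result type and length declared up front.
labs <- unique(clusLab)
apply_means <- vapply(labs, function(clus) mean(score[clusLab == clus]),
                      numeric(1))
names(apply_means) <- labs

all.equal(unname(apply_means), loop_means)
```

Loops whose iterations depend on the previous iteration's result cannot be rewritten this way and can reasonably stay as loops.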
[ ] Important: Remove unused code.
- In file R/SGCP_adjacencyMatrix.R:
  - at line 42 found ' #res <- which(vapply(x, class, numeric(1)) != "numeric")'
- In file R/SGCP_clustering.R:
  - at line 7 found ' # as.dist(y)'
  - at line 11 found ' #checkSym(dis.y, stp = "silhouette index")'
  - at line 88 found ' #M <- t(M)'
  - at line 91 found ' #M <- t(M)'
  - at line 183 found ' #k <- seq(2, maxNum)'
  - at line 189 found ' #dfgap <- dfgap[-1, ]'
  - at line 212 found ' #df$indices <- seq(1:nrow(df))'
  - at line 213 found ' #print(head(df))'
  - at line 233 found ' #if(plt){'
  - at line 448 found ' #df = summary(nl.t)'
- In file R/SGCP_go.R:
  - at line 121 found ' #pvalueCutoff = hgCutoff,'
  - at line 202 found ' # if(!is.character(annotation_db)){'
  - at line 203 found ' # stop("type of annotation_db must be character") }'
- In file R/SGCP_plot.R:
  - at line 123 found ' #geo_heatmap <- heatmap(as.matrix(melted_m), Rowv = NULL, Colv = NULL)'
  - at line 132 found ' # theme(axis.text.y = element_text(size = 10, face = 'bold','
  - at line 133 found ' # lineheight = 0.9)) +'
  - at line 138 found ' #theme(legend.position = c(.9, .9) ) +'
  - at line 423 found ' #wb = createWorkbook()'
[ ] NOTE: Avoid using '=' for assignment and use '<-' instead
- In file R/SGCP_ezSGC.R:
  - at line 27 found ' semilabel = FALSE}'
  - at line 30 found ' geneID = paste0(rep("gene", nrow(expData)), seq(1,nrow(expData)))}'
  - at line 46 found ' kopt = NULL}'
  - at line 52 found ' method_k = NULL }'
  - at line 57 found ' maxIteration = 1e+8'
  - at line 58 found ' numberStart = 1000}'
  - at line 83 found ' condTest = TRUE}'
  - at line 88 found ' percent = 0.10 }'
  - at line 93 found ' stp = 0.01 }'
  - at line 97 found ' model = "knn" }'
  - at line 103 found ' semilabel = FALSE}'
  - at line 108 found ' semilabel = FALSE }'
  - at line 128 found ' geneID = resClus$geneID'
  - at line 170 found ' geneLabel = resSup$FinalLabeling'
[ ] Important: Please consider adding drop=FALSE to avoid dimension reduction when subsetting matrices and arrays.
- In file R/SGCP_clustering.R:
  - at line 286 found ' M <- adja[indin, indout]'
  - at line 334 found ' Yf <- Y[, seq_len(min(2*krelativeGap, ncol(Y)))]'
  - at line 335 found ' Xf <- X[seq_len(min(2*krelativeGap, length(X)))]'
  - at line 342 found ' Ys <- Y[, seq_len(min(2*ksecondOrderGap, ncol(Y)))]'
  - at line 343 found ' Xs <- X[seq_len(min(2*ksecondOrderGap, length(X)))]'
  - at line 350 found ' Yg <- Y[, seq_len(min(2*kadditiveGap, ncol(Y)))]'
  - at line 351 found ' Xg <- X[seq_len(min(2*kadditiveGap, length(X)))]'
  - at line 430 found ' Yt <- Y[, seq_len(min(2*k, ncol(Y)))]'
  - at line 431 found ' Xt <- X[seq_len(min(2*k, length(X)))]'
  - at line 631 found ' adjaMat <- adjaMat[-ind, ]'
  - at line 632 found ' adjaMat <- adjaMat[, -ind]'
  - at line 662 found ' eg <- eg[order(eg$eigenvalues, decreasing = TRUE), ]'
  - at line 673 found ' adjaMat <- adjaMat[-nois_ind, ]'
  - at line 674 found ' adjaMat <- adjaMat[, -nois_ind]'
  - at line 675 found ' D <- D[-nois_ind, ]'
  - at line 676 found ' D <- D[, -nois_ind]'
  - at line 801 found ' Y <- Y[, -1]'
  - at line 808 found ' Yorig <- Y[, seq_len(min(n_egvec, ncol(Y)))]'
  - at line 809 found ' Xorig <- X[seq_len(min(n_egvec, length(X)))]'
  - at line 864 found ' Yf <- Y[, seq_len(min(2*krelativeGap, ncol(Y)))]'
  - at line 865 found ' Xf <- X[seq_len(min(2*krelativeGap, length(X)))]'
  - at line 884 found ' Ys <- Y[, seq_len(min(2*ksecondOrderGap, ncol(Y)))]'
  - at line 885 found ' Xs <- X[seq_len(min(2*ksecondOrderGap, length(X)))]'
  - at line 903 found ' Yg <- Y[, seq_len(min(2*kadditiveGap, ncol(Y)))]'
  - at line 904 found ' Xg <- X[seq_len(min(2*kadditiveGap, length(X)))]'
  - at line 925 found ' Yopt <- Y[, seq_len(min(2*kopt, ncol(Y)))]'
  - at line 926 found ' Xopt <- X[seq_len(min(2*kopt, length(X)))]'
  - at line 953 found ' sil <- sil[ , !(names(sil) %in% "geneIndices")]'
- In file R/SGCP_ezSGC.R:
  - at line 131 found ' expData = expData[-resClus$dropped.indices, ] }'
- In file R/SGCP_ezSGCP.R:
  - at line 125 found ' expData <- expData[-resClus$dropped.indices, ] }'
- In file R/SGCP_go.R:
  - at line 139 found ' df_hg <- df_hg[,c(8,9,1,2,3,4,5,6,7)]'
- In file R/SGCP_semiSupervised.R:
  - at line 23 found ' train <- specExp[rownames(specExp) %in% geneLab$geneID & !is.na(geneLab$label), ]'
  - at line 24 found ' test <- specExp[rownames(specExp) %in% geneLab$geneID & is.na(geneLab$label), ]'
  - at line 37 found ' train <- train[, which(names(train) %!in% "geneID" )]'
  - at line 49 found ' gg <- geneLab[complete.cases(geneLab), ]'
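The problem is easiest to see on a toy matrix (not SGCP data): when a subset happens to leave a single row or column, R silently returns a vector unless `drop = FALSE` is given, and later code that calls `nrow()`, `ncol()` or matrix multiplication on it then fails or misbehaves.

```r
adjaMat <- matrix(1:9, nrow = 3,
                  dimnames = list(paste0("g", 1:3), paste0("g", 1:3)))
ind <- c(1, 2)   # removing two of three rows leaves a single row

dropped <- adjaMat[-ind, ]                 # silently becomes a named vector
kept    <- adjaMat[-ind, , drop = FALSE]   # stays a 1 x 3 matrix

is.matrix(dropped)
dim(kept)
```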
[ ] NOTE: Functional programming: code repetition.
- repetition in clustering and cvConductance
  - in clustering
    - line 114: if (method == "relativeGap") {
    - line 115: krelativeGap <- k$relativeGap
    - line 116: Yf <- Y[, seq_len(min(2 * krelativeGap, ncol(Y)))]
    - line 117: Xf <- X[seq_len(min(2 * krelativeGap, length(X)))]
    - line 118: Yf <- divideNorm(Yf, rowWise = TRUE)
    - line 119: clusf <- kmeans(Yf, krelativeGap, iter.max = maxIter,
    - line 120: nstart = numStart)
    - line 121: conf <- conductance(adja = adjaMat, clusLab = clusf$cluster)
    - line 131: ksecondOrderGap <- k$secondOrderGap
    - line 132: Ys <- Y[, seq_len(min(2 * ksecondOrderGap, ncol(Y)))]
    - line 133: Xs <- X[seq_len(min(2 * ksecondOrderGap, length(X)))]
    - line 134: Ys <- divideNorm(Ys, rowWise = TRUE)
    - line 135: cluss <- kmeans(Ys, ksecondOrderGap, iter.max = maxIter,
    - line 136: nstart = numStart)
    - line 137: cons <- conductance(adja = adjaMat, clusLab = cluss$cluster)
    - line 147: kadditiveGap <- k$additiveGap
    - line 148: Yg <- Y[, seq_len(min(2 * kadditiveGap, ncol(Y)))]
    - line 149: Xg <- X[seq_len(min(2 * kadditiveGap, length(X)))]
    - line 150: Yg <- divideNorm(Yg, rowWise = TRUE)
    - line 151: clusg <- kmeans(Yg, kadditiveGap, iter.max = maxIter,
    - line 152: nstart = numStart)
    - line 153: cong <- conductance(adja = adjaMat, clusLab = clusg$cluster)
  - in cvConductance
    - line 3: message("Conductance Validation...")
    - line 4: krelativeGap <- k$relativeGap
    - line 5: Yf <- Y[, seq_len(min(2 * krelativeGap, ncol(Y)))]
    - line 6: Xf <- X[seq_len(min(2 * krelativeGap, length(X)))]
    - line 7: Yf <- divideNorm(Yf, rowWise = TRUE)
    - line 8: clusf <- kmeans(Yf, krelativeGap, iter.max = maxIter, nstart = numStart)
    - line 10: ksecondOrderGap <- k$secondOrderGap
    - line 11: Ys <- Y[, seq_len(min(2 * ksecondOrderGap, ncol(Y)))]
    - line 12: Xs <- X[seq_len(min(2 * ksecondOrderGap, length(X)))]
    - line 13: Ys <- divideNorm(Ys, rowWise = TRUE)
    - line 14: cluss <- kmeans(Ys, ksecondOrderGap, iter.max = maxIter,
    - line 15: nstart = numStart)
    - line 16: cons <- conductance(adja = adja, clusLab = cluss$cluster)
    - line 17: kadditiveGap <- k$additiveGap
    - line 18: Yg <- Y[, seq_len(min(2 * kadditiveGap, ncol(Y)))]
    - line 19: Xg <- X[seq_len(min(2 * kadditiveGap, length(X)))]
    - line 20: Yg <- divideNorm(Yg, rowWise = TRUE)
    - line 21: clusg <- kmeans(Yg, kadditiveGap, iter.max = maxIter, nstart = numStart)
- repetition in clustering and ezSGCP
  - in clustering
    - line 34: }
    - line 35: if (!is.null(kopt) && kopt != round(kopt)) {
    - line 36: warning("kopt must be either null or an integer", call. = FALSE)
    - line 37: message("making kopt null")
    - line 38: kopt <- NULL
    - line 39: }
    - line 40: if (length(setdiff(method, c("relativeGap", "secondOrderGap",
    - line 41: "additiveGap"))) != 0) {
    - line 42: warning("method can be either relativeGap, secondOrderGag, or additiveGap",
    - line 43: call. = FALSE)
    - line 44: message("making method to NULL")
    - line 45: method <- NULL
    - line 46: }
    - line 47: if (!is.numeric(maxIter) || !is.numeric(numStart)) {
    - line 48: warning("maxIter and numStart must be numeric and integer",
  - in ezSGCP
    - line 27: }
    - line 28: if (!is.null(kopt) && kopt != round(kopt)) {
    - line 29: warning("kopt must be either null or an integer")
    - line 30: message("making k null")
    - line 31: kopt <- NULL
    - line 32: }
    - line 33: if (length(setdiff(method_k, c("relativeGap", "secondOrderGap",
    - line 34: "additiveGap"))) != 0) {
    - line 35: warning("method_k can be either relativeGap, secondOrderGap, or additiveGap",
    - line 36: call. = FALSE)
    - line 37: message("making method to NULL")
    - line 38: method_k <- NULL
    - line 39: }
    - line 40: if (!is.numeric(maxIteration) || !is.numeric(numberStart)) {
    - line 41: warning("maxIteration and numStart must be numeric and integer")
- repetition in clustering and sigClusGO and cvConductance
  - in clustering
    - line 167: Yopt <- Y[, seq_len(min(2 * kopt, ncol(Y)))]
    - line 168: Xopt <- X[seq_len(min(2 * kopt, length(X)))]
    - line 169: Yopt <- divideNorm(Yopt, rowWise = TRUE)
    - line 170: clusopt <- kmeans(Yopt, kopt, iter.max = maxIter, nstart = numStart)
    - line 171: conopt <- conductance(adja = adjaMat, clusLab = clusopt$cluster)
  - in sigClusGO
    - line 3: Yt <- Y[, seq_len(min(2 * k, ncol(Y)))]
    - line 4: Xt <- X[seq_len(min(2 * k, length(X)))]
    - line 5: Yt <- divideNorm(Yt, rowWise = TRUE)
    - line 6: clust <- kmeans(Yt, k, iter.max = maxIter, nstart = numStart)
    - line 7: cont <- conductance(adja = adja, clusLab = clust$cluster)
  - in cvConductance
    - line 6: Xf <- X[seq_len(min(2 * krelativeGap, length(X)))]
    - line 7: Yf <- divideNorm(Yf, rowWise = TRUE)
    - line 8: clusf <- kmeans(Yf, krelativeGap, iter.max = maxIter, nstart = numStart)
    - line 9: conf <- conductance(adja = adja, clusLab = clusf$cluster)
    - line 19: Xg <- X[seq_len(min(2 * kadditiveGap, length(X)))]
    - line 20: Yg <- divideNorm(Yg, rowWise = TRUE)
    - line 21: clusg <- kmeans(Yg, kadditiveGap, iter.max = maxIter, nstart = numStart)
    - line 22: cong <- conductance(adja = adja, clusLab = clusg$cluster)
- repetition in DOM and TOM
  - in DOM
    - line 1:{
    - line 2: diag(mat) <- 0
    - line 3: degreeRow <- replicate(dim(mat)[1], rowSums(mat))
    - line 4: degreeCol <- t(replicate(dim(mat)[1], colSums(mat)))
    - line 5: degreeMin <- pmin(degreeRow, degreeCol)
    - line 6: rm(degreeRow, degreeCol)
    - line 7: degreeRow <- replicate(dim(mat)[1], rowSums(mat)^2)
    - line 8: degreeCol <- t(replicate(dim(mat)[1], colSums(mat)^2))
    - line 9: degreeMin2 <- pmin(degreeRow, degreeCol)
    - line 10: numerator <- mat + (mat %^% 2) + (mat %^% 3)
    - line 11: denominator <- degreeMin2 + degreeMin + (1 - mat)
    - line 12: res <- numerator/denominator
    - line 13: diag(res) <- 1
    - line 14: rm(degreeCol, degreeRow, degreeMin, degreeMin2)
    - line 15: return(as.matrix(res))
  - in TOM
    - line 1:{
    - line 2: diag(mat) <- 0
    - line 3: degreeRow <- replicate(dim(mat)[1], rowSums(mat))
    - line 4: degreeCol <- t(replicate(dim(mat)[1], colSums(mat)))
    - line 5: degreeMin <- pmin(degreeRow, degreeCol)
    - line 6: numerator <- (mat %^% 2) + mat
    - line 7: denominator <- degreeMin + (1 - mat)
    - line 8: res <- numerator/denominator
    - line 9: diag(res) <- 1
    - line 10: rm(degreeCol, degreeRow, degreeMin)
    - line 11: return(as.matrix(res))
- repetition in ezSGCP and geneOntology
  - in ezSGCP
    - line 45: }
    - line 46: if (all(dir %!in% c("under", "over"))) {
    - line 47: warning("dir must be in c(under or over) \n making to default",
    - line 48: call. = FALSE)
    - line 49: dir <- c("over", "under")
    - line 50: }
    - line 51: if (length(dir) > 2) {
    - line 52: warning("dir must be in c(under or over) \n making to default",
    - line 53: call. = FALSE)
    - line 54: dir <- c("over", "under")
    - line 55: }
    - line 56: if (all(onto %!in% c("BP", "CC", "MF"))) {
    - line 57: warning(" onto must be in BP CC MF \n making to default",
    - line 58: call. = FALSE)
    - line 59: onto <- c("BP", "CC", "MF")
    - line 60: }
    - line 61: if (length(onto) > 3) {
    - line 62: warning(" onto must be in BP CC MF \n making to default",
    - line 63: call. = FALSE)
    - line 64: onto <- c("BP", "CC", "MF")
    - line 65: }
    - line 66: if (!is.null(hgCut) && (hgCut >= 1 || hgCut <= 0)) {
    - line 67: warning(" not correct hgCutoff value \n making to default",
    - line 68: call. = FALSE)
    - line 69: }
    - line 70: if (condTest != TRUE && condTest != FALSE) {
    - line 71: warning(" condTest must be boolean! \n making to deafult",
    - line 72: call. = FALSE)
    - line 73: condTest <- TRUE
    - line 74: }
  - in geneOntology
    - line 3:{
    - line 4: if (all(direction %!in% c("under", "over"))) {
    - line 5: warning("direction must be in c(under or over) \n making to default",
    - line 6: call. = FALSE)
    - line 7: direction <- c("over", "under")
    - line 8: }
    - line 9: if (length(direction) > 2) {
    - line 10: warning("direction must be in c(under or over) \n making to default",
    - line 11: call. = FALSE)
    - line 12: direction <- c("over", "under")
    - line 13: }
    - line 14: if (all(ontology %!in% c("BP", "CC", "MF"))) {
    - line 15: warning(" ontology must be in BP CC MF \n making to default",
    - line 16: call. = FALSE)
    - line 17: ontology <- c("BP", "CC", "MF")
    - line 18: }
    - line 19: if (length(ontology) > 3) {
    - line 20: warning(" ontology must be in BP CC MF \n making to default",
    - line 21: call. = FALSE)
    - line 22: ontology <- c("BP", "CC", "MF")
    - line 23: }
    - line 24: if (!is.null(hgCutoff) && (hgCutoff >= 1 || hgCutoff <= 0)) {
    - line 25: warning(" not correct hgCutoff value \n making to default",
    - line 26: call. = FALSE)
    - line 27: }
    - line 28: if (cond != TRUE && cond != FALSE) {
    - line 29: warning(" cond must be boolean! \n making to deafult",
    - line 30: call. = FALSE)
    - line 31: cond <- TRUE
    - line 32: }
- repetition in ezSGCP and semiLabeling
  - in ezSGCP
    - line 74: }
    - line 75: if (percent >= 1 || percent <= 0) {
    - line 76: warning("percent must be in (0,1) \n making percent to default",
    - line 77: call. = FALSE)
    - line 78: percent <- 0.1
    - line 79: }
    - line 80: if (stp >= 1 || stp <= 0) {
    - line 81: warning("step must be in (0,1) \n making stp to default",
    - line 82: call. = FALSE)
    - line 83: stp <- 0.01
    - line 84: }
  - in semiLabeling
    - line 11: }
    - line 12: if (percent >= 1 || percent <= 0) {
    - line 13: warning("percent must be in (0,1) \n making percent to default",
    - line 14: call. = FALSE)
    - line 15: percent <- 0.1
    - line 16: }
    - line 17: if (stp >= 1 || stp <= 0) {
    - line 18: warning("stp must be in (0,1) \n making percent to default",
    - line 19: call. = FALSE)
    - line 20: stp <- 0.01
    - line 21: }
- repetition in ezSGCP and semiSupervised
  - in ezSGCP
    - line 84: }
    - line 85: if (!is.null(model) && model != "knn" & model != "lr") {
    - line 86: warning("model must be either NULL, knn, or lr \n setting to knn",
    - line 87: call. = FALSE)
    - line 88: model <- "knn"
  - in semiSupervised
    - line 5: }
    - line 6: if (!is.null(model) & model != "knn" & model != "lr") {
    - line 7: warning("model must be either NULL, knn, or lr \n setting to knn")
    - line 8: model <- "knn"
    - line 9: }
- repetition in GeneOfGOTerm and GOenrichment
  - in GeneOfGOTerm
    - line 15: labeledGenes <- c(labeledGenes, temp)
    - line 16: }
    - line 17: labeledGenes <- labeledGenes[-1]
    - line 18: newList <- list(labeledGenes = labeledGenes, GOTermGenes = GOTermGenes)
  - in GOenrichment
    - line 91: labeledGenes <- labeledGenes[-1]
    - line 92: labeledGenes <- unique(labeledGenes)
    - line 93: }
    - line 94: newList <- list(labeledGenes = labeledGenes, GOTermGenes = GOTermGenes,
- repetition in geneOntology and GOenrichment
  - in geneOntology
    - line 17: ontology <- c("BP", "CC", "MF")
    - line 18: }
    - line 19: if (length(ontology) > 3) {
    - line 20: warning(" ontology must be in BP CC MF \n making to default",
    - line 22: ontology <- c("BP", "CC", "MF")
    - line 23: }
    - line 24: if (!is.null(hgCutoff) && (hgCutoff >= 1 || hgCutoff <= 0)) {
    - line 25: warning(" not correct hgCutoff value \n making to default",
  - in GOenrichment
    - line 32: ontology <- c("BP", "CC", "MF")
    - line 33: }
    - line 34: if (length(ontology) > 3) {
    - line 35: warning(" ontology must be in BP CC MF", call. = FALSE)
    - line 37: ontology <- c("BP", "CC", "MF")
    - line 38: }
    - line 39: if (!is.null(hgCutoff) && (hgCutoff >= 1 || hgCutoff <= 0)) {
    - line 40: warning(" not correct hgCutoff value", call. = FALSE)
[ ] NOTE: Functional programming: code repetition Type 2. In function df2mat you already removed the colnames and rownames, but you remove them again at lines 158-159.
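One possible way to reduce the repeated argument checks listed above is to move each check into a small helper and call it from every function that needs it. The following is only a sketch: `checkProportion` and `checkOntology` are hypothetical names that do not exist in SGCP, and the ontology check here tests membership in BP/CC/MF rather than reproducing the package's `length(ontology) > 3` condition.

```r
# Hypothetical helper: validate a single proportion in (0, 1),
# warning and falling back to a default when the value is invalid.
checkProportion <- function(value, name, default) {
    if (!is.numeric(value) || length(value) != 1L || is.na(value) ||
        value <= 0 || value >= 1) {
        warning(name, " must be in (0,1)\n setting ", name, " to default",
                call. = FALSE)
        return(default)
    }
    value
}

# Hypothetical helper: validate the ontology argument (assumed rule:
# every entry must be one of BP, CC, MF).
checkOntology <- function(ontology) {
    valid <- c("BP", "CC", "MF")
    if (is.null(ontology) || !all(ontology %in% valid)) {
        warning("ontology must be in BP, CC, MF\n setting to default",
                call. = FALSE)
        return(valid)
    }
    ontology
}

# Each exported function would then call the helpers instead of
# repeating the if/warning blocks.
percent <- suppressWarnings(checkProportion(1.5, "percent", 0.1))
stp <- checkProportion(0.05, "stp", 0.01)
```

Because the helpers return the validated (or default) value, functions that can be called independently keep their own checks while the logic lives in one place.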

Documentation

[ ] Important: Please include Bioconductor installation instructions using BiocManager.
- rmd file vignettes/SGCP.Rmd
[ ] Note: Vignette includes motivation for submitting to Bioconductor as part of the abstract/intro of the main vignette.
- rmd file vignettes/SGCP.Rmd

na396 commented 1 year ago

@jianhong Thank you so much for the comments.

- in line 18 import("org.Hs.eg.db") => I need to pass this object to a GOstats function.
- in line 19 import("ggplot2") => I have used many ggplot2 functions for visualization.
- in line 20 import("expm") => I need to import the operator ^ for matrix powering.
- in line 21 import("dplyr") => I have used many dplyr functions for data frame related tasks.
- in line 22 import("GO.db")
- in line 23 import(annotate, except=c(toFile))
- in line 24 import("genefilter")
- in line 25 import("GOstats")
- in line 26 import("RColorBrewer")
- in line 27 import("xtable")
- in line 28 import("Rgraphviz")
- in line 29 import("reshape2") => fixed
- in line 30 import("openxlsx") => fixed
- in line 32 import("caret") => fixed

In general, SGCP depends heavily on the ggplot2, dplyr, caret, and GOstats packages. "GO.db", "annotate", "RColorBrewer", and "genefilter" are dependencies of GOstats. When I installed GOstats myself, its dependencies were not installed. After multiple attempts, I installed the dependencies manually and then the GOstats package, and this is the reason I imported these libraries. The remaining ones are fixed.

NOTE: Consider adding the maintainer's ORCID iD in 'Authors@R' with 'comment=c(ORCID="...")' => Fixed

NOTE: Consider adding unit tests. We strongly encourage them. See https://contributions.bioconductor.org/tests.html => This package works with big data, and it is a pipeline of steps applied to a large dataset. Each step has many parameters that may result in different solutions. Additionally, each step may take up to hours to run, which violates the Bioconductor time limit. Moreover, no step has a deterministic solution; the pipeline has randomness in each step.
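As a hedged illustration of how a stochastic step can still be checked quickly, the sketch below fixes the random seed and tests structural properties on a tiny input rather than exact results. `runStep` is a hypothetical stand-in (plain k-means from the stats package), not an SGCP function.

```r
# Hypothetical stand-in for one stochastic pipeline step:
# k-means clustering with a fixed seed for reproducibility.
runStep <- function(x, k, seed) {
    set.seed(seed)
    kmeans(x, centers = k)$cluster
}

# Tiny synthetic input so the check runs in well under a second.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

first <- runStep(x, k = 2, seed = 42)
second <- runStep(x, k = 2, seed = 42)

# Checks that do not depend on which label each cluster receives.
stopifnot(identical(first, second))   # same seed gives the same labels
stopifnot(length(first) == nrow(x))   # one label per row
stopifnot(all(first %in% 1:2))        # labels are within 1..k
```

Checks of this kind assert on shape and reproducibility, so they stay valid even when the exact clustering differs between parameter settings.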

NOTE: no direct slot access with @ or slot() - accessors implemented and used. Please ask help from HyperGResult-accessors => I'm not sure I understand this correctly. In "GO_Genes <- hg@goDag@nodeData@data", hg is an object returned by the hyperGTest function in the GOstats package, and at this stage SGCP tries to retrieve some information from this object. Please guide me if I need to change it.
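For context, the general idea behind this note can be sketched with a toy S4 class: callers use an accessor function, and only the accessor touches the slot. `ToyResult` and `scores` are hypothetical names made up for this illustration and are unrelated to the GOstats classes.

```r
library(methods)

# Hypothetical toy class standing in for a result object with a slot.
setClass("ToyResult", representation(scores = "numeric"))

# Accessor generic and method: the only place that uses @ directly.
setGeneric("scores", function(object) standardGeneric("scores"))
setMethod("scores", "ToyResult", function(object) object@scores)

res <- new("ToyResult", scores = c(a = 0.1, b = 0.9))

# Preferred: go through the accessor instead of writing res@scores.
vals <- scores(res)
```

If the class author later changes the internal slot layout, code that uses the accessor keeps working, whereas code that uses @ directly breaks.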

Important: No paste in message(), message, stop => The first two are fixed. As for caption_sym <- paste0(" output of ", stp, " , is not symmetric"), I use it in the next statement, which is stop(caption_sym). I used paste0 because this function performs error detection and is used in multiple stages; building the message dynamically lets the stop call tell me where the error happened.
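A dynamic message does not require paste: stop() (like message() and warning()) accepts several arguments and concatenates them without a separator. A minimal sketch, where `checkSymmetric` is a hypothetical helper name and not the actual SGCP function:

```r
# Hypothetical checker: stop() joins its arguments, so the step name
# can be passed directly without paste0().
checkSymmetric <- function(mat, stp) {
    if (!isSymmetric(mat)) {
        stop("output of ", stp, " is not symmetric", call. = FALSE)
    }
    invisible(TRUE)
}

# Capture the error message to show that the step name is included.
msg <- tryCatch(
    checkSymmetric(matrix(1:4, nrow = 2), "adjacency step"),
    error = function(e) conditionMessage(e)
)
msg
```

The message still reports which stage failed, so the diagnostic value of the original paste0 call is preserved.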

NOTE: :: is not suggested in source code unless you can make sure all the packages are imported. => Fixed

NOTE: Vectorize: for loops present, try to replace them by *apply functions. => The for loops do not have a regular pattern or structure; depending on the cluster size and shape, each iteration may differ. Inside each iteration many steps are taken, and none of them has a regular structure. Throughout this package everything is vectorized except these three loops, for which I was not able to come up with a vectorized implementation.
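For reference, a loop over clusters of different sizes can sometimes be expressed with split() and vapply(), because split() handles the unequal group sizes. This is a generic sketch on made-up data, under the assumption that each iteration only needs its own cluster's values; it may not apply to the three loops in question.

```r
# Made-up data: values and their cluster labels (clusters differ in size).
values <- c(2, 4, 6, 1, 3, 10)
cluster <- c("a", "a", "a", "b", "b", "c")

# Loop version: one mean per cluster.
means_loop <- numeric(0)
for (cl in unique(cluster)) {
    means_loop[cl] <- mean(values[cluster == cl])
}

# *apply version: split() groups the values, vapply() applies the
# per-cluster function and guarantees one numeric result per group.
means_apply <- vapply(split(values, cluster), mean, numeric(1))
```

The function passed to vapply() can be arbitrarily long, so a multi-step loop body can be wrapped in a function as long as iterations do not depend on each other.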

Important: Remove unused code. => Fixed

NOTE: Avoid using '=' for assignment and use '<-' instead => Fixed.

Important: Please consider to add drop=FALSE to avoid the reduction of dimension for matrices and arrays. => At these stages the pipeline actually needs to reduce the dimension; this is the purpose of these steps.
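The behaviour this note refers to can be seen on a small matrix: selecting a single row with `[` returns a plain vector unless drop = FALSE is given. The sketch uses made-up data.

```r
m <- matrix(1:6, nrow = 2)

# Default: the single selected row is dropped to a plain vector.
row_dropped <- m[1, ]

# With drop = FALSE the result stays a 1 x 3 matrix.
row_kept <- m[1, , drop = FALSE]

is.null(dim(row_dropped))  # the vector has no dim attribute
dim(row_kept)
```

Where the reduction to a vector is intended, the default is fine; drop = FALSE matters when later code assumes a matrix even if only one row or column happens to be selected.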

NOTE: Functional programming: code repetition. => Although these statements look like repetition, they are not the same. Each is performed for a different purpose and is needed. Some of them also let me trace the code more easily if bugs are reported in the future. Others are repeated across functions because those functions can be used either together or independently, so the checks are needed in both in case a function is used on its own. For instance, at the beginning of both ezSGCP and geneOntology it is checked that dir is in c("under", "over"), because ezSGCP is a wrapper around multiple functions including geneOntology, and geneOntology can also be applied independently. Therefore, at the beginning of each function I check that this argument is valid. This actually helps me maintain the package better.

Important: Please include Bioconductor installation instructions using BiocManager. => Fixed

Note: Vignette includes motivation for submitting to Bioconductor as part of the abstract/intro of the main vignette. => I'm not sure I understand correctly. I have added the package installation information using BiocManager.

Important: Please include Bioconductor installation instructions using BiocManager. => fixed

na396 commented 1 year ago

I'm pushing the modification into the repository.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 4f49cf88e9e4165c6d5fdbe19b2f11ef4b7d9dc4

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 4b56cad8f0dd8556be1b6f30844f6c6b76969c60

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

jianhong commented 1 year ago

Is it possible to rewrite GO_Genes <- hg@goDag@nodeData@data by GO_genes <- graph::nodeData(GOstats::goDag(hg))?
Please move back the BiocManager::install section into your vignettes.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 3a43d22b29a0c3f21b3e179f7913aacbee8b7af6

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

na396 commented 1 year ago

@jianhong Is it possible to rewrite GO_Genes <- hg@goDag@nodeData@data by GO_genes <- graph::nodeData(GOstats::goDag(hg)) => Done

Please move back the BiocManager::install section into your vignettes. => Done

jianhong commented 1 year ago

I think there was a miscommunication about the BiocManager::install section. I mean, please show the code

BiocManager::install('SGCP')

in your vignettes.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: c84b43e7b909dfdc811fd14f539268a7eb88252a

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

na396 commented 1 year ago

@jianhong I added the installation to the vignettes, but this causes the following error.

ERROR: Installation calls found in vignette(s)

jianhong commented 1 year ago

OK, try

```{r, eval=FALSE}
library(BiocManager)
BiocManager::install(c('SGCP', 'SummarizedExperiment', 'org.Hs.eg.db'))
```

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 365fce8e1f9dfae7b7d8365199553d538df4c61b

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 73b69c43e4440bfa5055b2c6c363077970cfdeac

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SGCP to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

na396 commented 1 year ago

Adding the installation causes the following warning on macOS:

WARNING: R CMD check exceeded 10 min requirement
