Bioconductor / Contributions

Contribute Packages to Bioconductor

seqArchR #2572

Closed snikumbh closed 2 years ago

snikumbh commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @snikumbh

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Type: Package
Package: seqArchR
Title: Identify Different Architectures of Sequence Elements
Version: 0.99.0
Authors@R: 
    person("Sarvesh", "Nikumbh", email = "sarvesh.nikumbh@gmail.com", 
    role = c("aut", "cre", "cph"), 
    comment = c(ORCID = "0000-0003-3163-4447", 
    Twitter = "@sarveshnikumbh"))
Description: \code{seqArchR} enables unsupervised discovery of _de novo_ clusters 
    with characteristic sequence architectures characterized by 
    position-specific motifs or composition of stretches of nucleotides, 
    e.g., CG-richness. \code{seqArchR} does _not_ require any specifications 
    w.r.t. the number of clusters, the length of any individual motifs, or 
    the distance between motifs if and when they occur in pairs/groups; it 
    directly detects them from the data. \code{seqArchR} uses non-negative 
    matrix factorization (NMF) as its backbone, and employs a chunking-based 
    iterative procedure that enables processing of large sequence collections 
    efficiently. Wrapper functions are provided for visualizing cluster 
    architectures as sequence logos.
License: GPL-3 | file LICENSE
URL: https://snikumbh.github.io/seqArchR/,
    https://github.com/snikumbh/seqArchR
BugReports: https://github.com/snikumbh/seqArchR/issues
SystemRequirements: Python (>= 3.5), scikit-learn (>= 0.21.2)
Depends:
    R (>= 4.1.0)
Imports:
    utils,
    graphics,
    cvTools (>= 0.3.2),
    MASS,
    Matrix,
    methods,
    stats,
    cluster,
    matrixStats,
    fpc,
    cli,
    prettyunits,
    reshape2 (>= 1.4.3),
    reticulate (>= 1.22),
    parallel,
    Biostrings,
    grDevices,
    ggplot2 (>= 3.1.1),
    ggseqlogo (>= 0.1)
Suggests:
    TFBSTools,
    cowplot,
    hopach (>= 2.42.0),
    knitr (>= 1.22),
    rmarkdown (>= 1.12),
    testthat (>= 3.0.2),
    covr,
    vdiffr (>= 0.3.0)
VignetteBuilder: 
    knitr
biocViews: 
    MotifDiscovery, 
    GeneRegulation, 
    MathematicalBiology, 
    SystemsBiology,
    Transcriptomics,
    Genetics,
    Clustering,
    DimensionReduction,
    FeatureExtraction,
    DNASeq
Encoding: UTF-8
LazyData: false
RoxygenNote: 7.1.2
bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your GitHub SSH keys for git.bioconductor.org access. To manage keys and future access, you may want to activate your Bioconductor Git Credentials Account.

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/seqArchR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 05a2c04a2f70d216173f278be8c2235168b8a34e

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 6d3015d2c58cb8d8427a076310ec37b8d0e3a3e5

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "UNSUPPORTED, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: af047d4a6762b26002d5d9a5be6c3d55cbdfaca2

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: ecaa00c9968d9d4e05146cc8418bfc9eaaa1de5b

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: a64b98d05e5e282087b0407b2248229f1f5d57e5

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

snikumbh commented 2 years ago

Hi @jianhong

I think this is good to go. Look forward to your review. Thanks in advance.

jianhong commented 2 years ago

Package 'seqArchR' Review

The package passed check and build and is in pretty good shape. However, several things need to be fixed.

The DESCRIPTION file

The NAMESPACE file

unacceptable files

R code

Documentation

snikumbh commented 2 years ago

Hi @jianhong,

Thanks for taking the time to review. I am addressing the points you have raised. I will ping again when I am done. Looking forward to meeting the deadline for the upcoming release.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b30ffd9a76151dc1bd43111bf426a2f54c3fab80

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

snikumbh commented 2 years ago

Hi @jianhong

I have addressed all the points you raised. They are answered in detail below. Look forward to hearing from you again. Thanks again for taking the time to review.

Thanks and best, Sarvesh

The DESCRIPTION file -- DONE

R version should be no less than 4.2

The NAMESPACE file -- DONE

    Use selective imports with importFrom instead of importing everything with import.
    in line 25 import(cli)
    in line 26 import(ggplot2)
    in line 27 import(ggseqlogo)
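
For illustration, selective imports could be declared with roxygen2 tags roughly as below. This is only a sketch: the particular functions listed are assumptions chosen for the example, not the set that seqArchR actually imports.

```r
# Illustrative only: the function names imported below are assumptions.
# After roxygenising, NAMESPACE would contain lines such as
#   importFrom(cli,cli_alert_info)
# in place of the blanket import(cli).
#' @importFrom cli cli_alert_info
#' @importFrom ggplot2 ggplot aes
#' @importFrom ggseqlogo ggseqlogo
NULL
```

Importing only what is used keeps the namespace small and makes name clashes between packages less likely.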

unacceptable files -- DONE (removed)

unacceptable files present (see [.gitignore](https://contributions.bioconductor.org/gitignore.html) for a listing).
    ./index.md

R code -- DONE

no direct slot access with @ or slot() - accessors implemented and used.

In file R/seqArchR_assertions.R:
    at line 568 found ' tmp <- seqArchRresultObj$rawSeqs[1]@ranges@width'
In file R/seqArchR_auxiliary_functionsII.R:
    at line 338 found ' stopifnot(hopachDistMat@Size == nrow(factorsMat))'
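
As a rough sketch of the accessor pattern, the snippet below defines a hypothetical S4 class with a `Size` slot and reads it through an accessor rather than `@`. The class `DistLike` and the accessor `nSize()` are made up for this example; for sequence widths, `Biostrings::width()` (already used elsewhere in the package) plays the same role.

```r
library(methods)

# Hypothetical S4 class standing in for an object that has a 'Size' slot.
setClass("DistLike", representation(Size = "integer", Data = "numeric"))

# Accessor generic and method: callers use nSize(x) instead of x@Size.
setGeneric("nSize", function(x) standardGeneric("nSize"))
setMethod("nSize", "DistLike", function(x) x@Size)

d <- new("DistLike", Size = 3L, Data = c(0.1, 0.2, 0.3))
factorsMat <- matrix(0, nrow = 3, ncol = 2)

# The check no longer depends on the internal layout of the object.
stopifnot(nSize(d) == nrow(factorsMat))
```

If the class author later renames or restructures the slot, only the accessor method needs to change.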

:: is not suggested in source code unless you can make sure all the packages are imported. Please note that you need to manually double-check the import items whenever you change the DESCRIPTION file during development.

NOTE: I have retained the usage of :: because it adds to the readability and saves time when debugging or writing additional code (personal experience). I have manually checked all import items.
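
One way such a manual check could be partly automated is sketched below. The helper `find_qualified_pkgs()` is hypothetical, and the `code` and `declared` vectors are toy inputs rather than the package's real sources or Imports field.

```r
# Hypothetical helper: collect package names that appear before '::'.
find_qualified_pkgs <- function(code_lines) {
    hits <- regmatches(
        code_lines,
        gregexpr("[A-Za-z][A-Za-z0-9.]*(?=::)", code_lines, perl = TRUE)
    )
    unique(unlist(hits))
}

# Toy input standing in for lines read from the files under R/.
code <- c(
    "x <- stats::median(v)",
    "cli::cli_alert_info('done')",
    "inv <- MASS::ginv(m)"
)
# Toy stand-in for the packages listed under Imports in DESCRIPTION.
declared <- c("stats", "cli")

# Packages used with '::' but not declared.
missing_pkgs <- setdiff(find_qualified_pkgs(code), declared)
print(missing_pkgs)
```

In practice the lines would come from `readLines()` over the package's R files and the declared set from `read.dcf("DESCRIPTION")`.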

Vectorize: no unnecessary for loops present. -- DONE. Where ideal, these have been addressed. See detailed responses for individual cases below.

Use file.path to replace paste. I do not understand why you need to paste / here. -- DONE. This was indeed unnecessary.

In file R/seqArchR_auxiliary_functionsII.R:
    at line 37 found ' retVal <- dir.create(paste0(o_dir, "/"), showWarnings = TRUE)'
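
A small sketch of the point, using a temporary directory and made-up file names: `dir.create()` does not need a trailing separator, and nested paths are better built with `file.path()` than by pasting "/".

```r
# Hypothetical output directory under the session's temp dir.
o_dir <- tempfile("seqArchR_demo_")

# No trailing "/" is needed when creating the directory.
created <- dir.create(o_dir, showWarnings = TRUE)

# Build nested paths with file.path() instead of paste0(o_dir, "/", ...).
fig_path <- file.path(o_dir, "figures", "logo.pdf")
print(fig_path)
```

`file.path()` inserts the separator itself, so callers never have to track whether a directory string already ends in one.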

Remove unused code. -- DONE. All removed.

In file R/NMF_model_selection_functions.R:
    at line 198 found ' # parallelDo = FALSE,'
    at line 208 found ' #aBase=param_ranges$alphaBase, aPow=param_ranges$alphaPow,'
    at line 222 found ' # if(parallelDo){'
    at line 223 found ' # q2_vals <- unlist(parallel::clusterApplyLB(cl = NULL,'
    at line 224 found ' # seq_len(nrow(grid_search_params)), function(i) {'
    at line 225 found ' # .get_q2_using_py( grid_search_params[i,] )'
    at line 226 found ' # }))'
    at line 227 found ' # }else{'
    at line 228 found ' # q2_vals <- unlist('
    at line 229 found ' # lapply(seq_len(nrow(grid_search_params)),'
    at line 230 found ' # function(i) {'
    at line 231 found ' # .get_q2_using_py_serial( grid_search_params[i,],'
    at line 232 found ' # X = X, cvfolds = cvfolds)'
    at line 233 found ' # }))'
    at line 234 found ' # }'
    at line 287 found ' # parallelDo = FALSE,'
    at line 288 found ' # nCores = NA,'
    at line 294 found ' # monolinear = FALSE,'
    at line 322 found ' # if (parallelDo) {'
    at line 326 found ' # if(parallelDo){'
    at line 327 found ' # cl <- .setup_par_cluster(vlist='
    at line 328 found ' # c(".get_q2_using_py", ".compute_q2", "X", "cvfolds"))'
    at line 329 found ' # }'
    at line 333 found ' # .msg_pstr("Coarse-fine grained binary search", flg=vrbs)'
    at line 335 found ' #go_fine <- FALSE'
    at line 338 found ' #eureka <- FALSE'
    at line 341 found ' #coarse_step <- 10'
    at line 359 found ' # parallelDo = parallelDo,'
    at line 376 found ' ## fgIL to (mi-5)'
    at line 380 found ' #go_fine <- TRUE'
    at line 388 found ' # idx_best <- as.numeric(which.max('
    at line 389 found ' # unlist(coarse_prev_df["q2_vals"])))'
    at line 390 found ' # threshold <- coarse_prev_df[idx_best, "q2"] -'
    at line 417 found ' ## go fine over interval (hi[kCGIdx]+1 , lo[kCGIdx])'
    at line 433 found ' ## best_K is == hi, go to next coarse-grained iteration'
    at line 449 found ' # parallelDo = parallelDo,'
    at line 458 found ' # minKInDF <- min(as.numeric(unlist(combined_df["k_vals"])))'
    at line 481 found ' # parallelDo = parallelDo,'
    at line 485 found ' # temp_best_K <- searchReturnFine$best_K'
    at line 488 found ' # combined_df <- rbind(combined_df, fine_prev_df)'
    at line 493 found ' # minKInDF <- min(as.numeric('
    at line 494 found ' # unlist(combined_df["k_vals"])))'
    at line 495 found ' # message("MIN_K_IN_DF: ", minKInDF)'
    at line 504 found ' # }'
    at line 599 found ' # check_par_conditions(nCores=nCores)'
    at line 773 found ' title = "Reconstruction accuracy, Q\U00B2 = f(#Factors)",'
    at line 774 found ' x = "#Factors (K)",'
In file R/NMF_model_selection_functionsII.R:
    at line 16 found ' # tol = 10^-3,'
In file R/plot_arch_for_clusters.R:
    at line 214 found ' # nPos <- length(pos_lab)'
    at line 215 found ' # xtick_cal <- seq(0, nPos, by = xt_freq)'
    at line 216 found ' # xtick_cal[1] <- 1'
    at line 217 found ' # xtick_cal[length(xtick_cal)] <- nPos'
In file R/plot_ggheatmap.R:
    at line 65 found ' mid = "white", high = "#012345") +'
    at line 156 found ' # ggplot2::theme(axis.text.x = element_text(size = rel(0.9),'
    at line 157 found ' # angle = 90, hjust = 1),'
    at line 158 found ' # axis.text.y = element_text(size = rel(0.9)))'
In file R/prepare_data_from_FASTA.R:
    at line 29 found ' # colnames(one_hot_encoded) <- paste(rep(dna_alphabet,'
    at line 30 found ' # each = seqlen), seq_len(seqlen), sep=".")'
    at line 234 found ' # length_vals <- unlist(lapply(seqs_split_as_list, length))'
In file R/seqArchR_assertions.R:
    at line 54 found ' # matElements <-'
    at line 55 found ' # if () {'
    at line 56 found ' # stop("")'
    at line 57 found ' # }'
    at line 95 found ' # if ((nrow(featuresMatrix) %% 4) != 0) {'
    at line 96 found ' # stop("#Rows in featuresMatrix not a multiple of 4")'
    at line 97 found ' # }'
    at line 456 found ' stop("In NMF result, #clusters != #factors")'
In file R/seqArchR_auxiliary_functionsI.R:
    at line 134 found ' # clust_list <- get_seqs_clust_list(res$seqsClustLabels[[iter]])'
    at line 140 found ' # clust_list <- get_seqs_clust_list(res$seqsClustLabels[[iter]])'
    at line 186 found ' # new_mem[which(old_mem == i)] <- 1'
    at line 279 found ' # out_clust_range <- NULL'
    at line 301 found ' # left_out <- setdiff(qual_cl_idx, union(out_clust_range, out_clust_iqr))'
    at line 302 found ' # if(length(left_out) > 0){'
    at line 303 found ' # out_clust_size <- .compare_size(clustwise_matlist, qual_cl_idx)'
    at line 304 found ' # }'
    at line 307 found ' # return(NULL)'
    at line 1084 found ' # clust_list <- .detect_just_for_sake_clust(cheight_idx, clust_list,'
    at line 1085 found ' # vrbs=verbose)'
    at line 1216 found ' # useMinClusters <- keepMinClusters(set_ocollation, temp_res,'
    at line 1217 found ' # totOuterChunksColl ='
    at line 1218 found ' # totOuterChunksColl, dbg = dbg,'
    at line 1219 found ' # nClustEachIC = nClustEachIC,'
    at line 1220 found ' # test_itr = test_itr -1,'
    at line 1221 found ' # stage="Final")'
In file R/seqArchR_auxiliary_functionsII.R:
    at line 20 found ' # .msg_pstr("-- Directory exists: -- ", o_dir,'
    at line 21 found ' # "-- Changing name to: -- ", flg=vrbs)'
    at line 230 found ' # tol = tol,'
    at line 364 found ' # factorsMatList_as2D <- lapply(seq_len(ncol(factorsMat)),'
    at line 365 found ' # function(x){matrix(factorsMat[,x],'
    at line 366 found ' # nrow = nrow(factorsMat)/nPositions,'
    at line 367 found ' # byrow = TRUE,'
    at line 368 found ' # dimnames = list(dim_names))'
    at line 369 found ' # })'
    at line 371 found ' # factorsMatList_asPFMs <- lapply(seq_len(length(factorsMatList_as2D)),'
    at line 372 found ' # function(x){'
    at line 373 found ' # sinucSparse <- collapse_into_sinuc_matrix('
    at line 374 found ' # given_feature_mat = as.matrix(factorsMat[,x]),'
    at line 375 found ' # dinuc_mat = factorsMatList_as2D[[x]],'
    at line 376 found ' # feature_names = dim_names)'
    at line 377 found ' # sinucSparseInt <- matrix(as.integer(round(sinucSparse)),'
    at line 378 found ' # nrow = 4, byrow = FALSE,'
    at line 379 found ' # dimnames = list(rownames(sinucSparse)))'
    at line 380 found ' # })'
    at line 405 found ' # relScoresMat[i,j] <- temp["relScore"]'
    at line 478 found ' # parallelDo = config$parallelize, nCores = config$nCoresUse,'
    at line 491 found ' # parallelDo = config$parallelize, nCores = config$nCoresUse,'
    at line 513 found ' # .msg_pstr("Fetching ", best_k, " clusters", flg=(vrbs || dbg))'
    at line 521 found ' # parallelDo = config$parallelize,'
    at line 522 found ' # nCores = config$nCoresUse,'
    at line 534 found ' ##A <- this_mat[, new_ord[[nR]]]'
    at line 546 found ' # .msg_pstr("Best Q2 giving run found: ", bestQ2, flg=dbg)'
    at line 547 found ' # cli::cli_alert_info("Fetched {best_k} clusters")'
    at line 557 found ' # .msg_pstr("Fetching ", best_k," cluster(s)", flg=dbg)'
    at line 842 found ' # if(parallelize){'
    at line 843 found ' # cl <- parallel::makeCluster(crs, type = "FORK")'
    at line 844 found ' # parallel::setDefaultCluster(cl)'
    at line 845 found ' # cli::cli_alert_info("Parallelization: {crs} cores")'
    at line 846 found ' # }else{'
    at line 847 found ' # cl <- NA'
    at line 848 found ' # cli::cli_alert_info("Parallelization: No")'
    at line 849 found ' # }'
    at line 873 found ' # globFactors <- vector("list", length(innerChunksColl))'
    at line 874 found ' # globClustAssignments <- vector("list", length(innerChunksColl))'
    at line 875 found ' # nClustEachIC <- rep(0, length(innerChunksColl))'
In file R/seqArchR_main.R:
    at line 433 found ' # if(parallelize) parallel::stopCluster(setup_ans$cl)'
In file R/viz_matrix_of_acgt_image.R:
    at line 131 found ' # xtick_cal <- seq(0, nPos, by = xt_freq)'
    at line 132 found ' # xtick_cal[1] <- 1'
In file R/zzz.R:
    at line 4 found ' # reticulate::source_python(system.file('
    at line 6 found ' # package = "seqArchR",'
    at line 7 found ' # mustWork = TRUE'
    at line 8 found ' # ))'
    at line 9 found ' # reticulate::configure_environment("seqArchR")'
    at line 10 found ' # sklearn <<- reticulate::import("sklearn", delay_load = TRUE)'

Functional programming: code repetition. -- DONE. All repetitions cleared. This has made the plotting functions, where the majority of these existed, much more succinct.

repetition in .assert_seqArchR_featuresMatrix and .assert_seqArchR_samplesMatrix
    in .assert_seqArchR_featuresMatrix
        line 1: function (featuresMatrix)
        line 2: {
        line 3: check_ncols <- 0
        line 4: if (is.null(featuresMatrix)) {
        line 5: stop("NULL value found, instead of a matrix")
        line 6: }
        line 7: if (!is.matrix(featuresMatrix)) {
        line 8: stop("Expected a matrix, found otherwise")
        line 9: }
        line 10: else {
        line 13: }
        line 14: if (ncol(featuresMatrix) < 1) {
        line 15: stop("0 columns (sequences) in samplesMatrix")
        line 16: }
        line 17: if (ncol(featuresMatrix) == check_ncols) {
        line 18: stop("Check matrix, 'ncols' is: ", check_ncols)
    in .assert_seqArchR_samplesMatrix
        line 1: function (samplesMatrix)
        line 2: {
        line 3: check_nrows <- 0
        line 4: if (is.null(samplesMatrix)) {
        line 5: stop("NULL value found, instead of a matrix")
        line 6: }
        line 7: if (!is.matrix(samplesMatrix)) {
        line 8: stop("Expected a matrix, found otherwise")
        line 9: }
        line 10: else {
        line 11: if (ncol(samplesMatrix) < 1) {
        line 12: stop("0 columns (sequences) in samplesMatrix")
        line 13: }
        line 14: if (nrow(samplesMatrix) == check_nrows) {
        line 15: stop("Check matrix, nrows == ", check_nrows)
repetition in .assert_seqArchR_kFolds_in_tandem and .assert_seqArchR_kFolds_independent and .assert_seqArchR_nRuns
    in .assert_seqArchR_kFolds_in_tandem
        line 2: {
        line 3: if (is.null(kFolds_var)) {
        line 4: stop("'kFolds' is NULL")
        line 5: }
        line 6: if (!is.numeric(kFolds_var)) {
        line 7: stop("'kFolds' should be numeric and > 0")
        line 8: }
        line 9: else {
        line 10: if (kFolds_var < 1) {
        line 11: stop("'kFolds' should be > 0")
        line 12: }
    in .assert_seqArchR_kFolds_independent
        line 2: {
        line 3: if (is.null(kFolds_var)) {
        line 4: stop("'kFolds' is NULL")
        line 5: }
        line 6: if (!is.numeric(kFolds_var)) {
        line 7: stop("'kFolds' should be numeric and > 0")
        line 8: }
        line 9: else {
        line 10: if (kFolds_var < 1) {
        line 11: stop("'kFolds' should be > 0")
        line 12: }
        line 13: }
    in .assert_seqArchR_nRuns
        line 3: stop("'n_runs' is NULL")
        line 4: }
        line 5: if (!is.numeric(nIter_var)) {
        line 6: stop("'n_runs' should be numeric and > 0")
        line 7: }
        line 8: else {
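
The three checks above differ only in the argument name used in the message, so one possible consolidation is a single parameterised helper. The names `.assert_positive_number`, `.assert_kFolds`, and `.assert_nRuns` are hypothetical and not necessarily what the package uses; the length and NA checks are additions for this sketch.

```r
# Hypothetical shared helper; 'name' is the label used in error messages.
.assert_positive_number <- function(value, name) {
    if (is.null(value)) {
        stop("'", name, "' is NULL")
    }
    if (!is.numeric(value) || length(value) != 1 || is.na(value) ||
        value < 1) {
        stop("'", name, "' should be numeric and > 0")
    }
    invisible(TRUE)
}

# Thin wrappers keep the original call sites readable.
.assert_kFolds <- function(kFolds_var) {
    .assert_positive_number(kFolds_var, "kFolds")
}
.assert_nRuns <- function(nIter_var) {
    .assert_positive_number(nIter_var, "n_runs")
}

.assert_kFolds(5)
.assert_nRuns(100)
```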
repetition in .compare_iqr and .compare_range
    in .compare_iqr
        line 1: qual_cl_idx, zscore_thresh = 5)
        line 2:{
        line 3: ncl <- ncol(clustwise_matlist[[1]])
        line 4: all_iqr <- lapply(clustwise_matlist, function(x) {
        line 9: iqr_zscore <- (all_iqr_vec - stats::median(all_iqr_vec))/all_iqr_mad
        line 10: out_idx <- which(iqr_zscore > zscore_thresh)
        line 11: if (length(out_idx) > 0) {
        line 12: clust_id <- ceiling(out_idx/ncl)
        line 13: return(intersect(clust_id, qual_cl_idx))
        line 14: }
        line 15: else {
        line 16: }
        line 17: return(NULL)
    in .compare_range
        line 1: qual_cl_idx, zscore_thresh = 5)
        line 2:{
        line 3: ncl <- ncol(clustwise_matlist[[1]])
        line 4: all_range <- lapply(clustwise_matlist, function(x) {
        line 10: range_zscore <- (all_range_vec - stats::median(all_range_vec))/all_range_mad
        line 11: out_idx <- which(range_zscore > zscore_thresh)
        line 12: if (length(out_idx) > 0) {
        line 13: clust_id <- ceiling(out_idx/ncl)
        line 14: return(intersect(clust_id, qual_cl_idx))
        line 15: }
        line 16: else {
        line 17: }
        line 18: return(NULL)
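
Since `.compare_iqr` and `.compare_range` share everything except the spread statistic, one way to remove the duplication is to pass that statistic in as a function. The sketch below is an assumption-laden reconstruction: the lines elided in the listing above are not known, so it assumes one spread value per column and a MAD-based z-score, and `.compare_spread` is a hypothetical name.

```r
# Hypothetical generalisation; 'spread_fun' maps a numeric vector to a
# single spread value (e.g. IQR or range width).
.compare_spread <- function(clustwise_matlist, qual_cl_idx, spread_fun,
                            zscore_thresh = 5) {
    ncl <- ncol(clustwise_matlist[[1]])
    # Assumption: one spread value per column of each cluster-wise matrix.
    spread_vec <- unlist(lapply(clustwise_matlist, function(x) {
        apply(x, 2, spread_fun)
    }))
    # Assumption: z-scores are computed against the median and the MAD.
    spread_mad <- stats::mad(spread_vec)
    zscore <- (spread_vec - stats::median(spread_vec)) / spread_mad
    out_idx <- which(zscore > zscore_thresh)
    if (length(out_idx) > 0) {
        clust_id <- ceiling(out_idx / ncl)
        return(intersect(clust_id, qual_cl_idx))
    }
    NULL
}

.compare_iqr <- function(...) {
    .compare_spread(..., spread_fun = stats::IQR)
}
.compare_range <- function(...) {
    .compare_spread(..., spread_fun = function(v) diff(range(v)))
}

# Toy data: the second column of the third matrix has a much wider spread.
mats <- list(
    cbind(c(0, 1.0), c(0, 1.1)),
    cbind(c(0, 0.9), c(0, 1.2)),
    cbind(c(0, 1.0), c(0, 100))
)
outlier_by_range <- .compare_range(mats, 1:3)
outlier_by_iqr <- .compare_iqr(mats, 1:3)
```
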
repetition in .get_q2_using_py and .get_q2_using_py_serial
    in .get_q2_using_py
        line 1: cvfolds, X)
        line 2:{
        line 3: this_k <- as.numeric(x["k_vals"])
        line 4: this_alpha <- as.numeric(x["alpha"])
        line 5: this_seed <- as.numeric(x["seed_val"])
        line 6: test_fold <- as.numeric(x["fold"])
        line 7: train_rows <- cvfolds$cvf_rows$subsets[cvfolds$cvf_rows$which !=
        line 8: test_fold]
        line 9: train_cols <- cvfolds$cvf_cols$subsets[cvfolds$cvf_cols$which !=
        line 10: test_fold]
        line 11: test_rows <- cvfolds$cvf_rows$subsets[cvfolds$cvf_rows$which ==
        line 12: test_fold]
        line 13: test_cols <- cvfolds$cvf_cols$subsets[cvfolds$cvf_cols$which ==
        line 14: test_fold]
        line 15: submatrixD <- X[train_rows, train_cols]
        line 16: submatrixA <- X[test_rows, test_cols]
        line 17: submatrixB <- X[test_rows, train_cols]
        line 18: submatrixC <- X[train_rows, test_cols]
        line 19: nmf_submatrixD_result <- .perform_single_NMF_run(Xmat = submatrixD,
        line 20: kVal = as.integer(this_k), alphaVal = this_alpha, seedVal = this_seed)
        line 21: D_W <- nmf_submatrixD_result$featuresMatrix
        line 22: D_H <- nmf_submatrixD_result$samplesMatrix
        line 23: reconstructed_submatrixA <- as.matrix(submatrixB) %*% MASS::ginv(D_H) %*%
        line 24: MASS::ginv(D_W) %*% as.matrix(submatrixC)
        line 25: q2 <- .compute_q2(as.matrix(submatrixA), reconstructed_submatrixA)
        line 26: return(q2)
    in .get_q2_using_py_serial
        line 1: function (x, X, cvfolds)
        line 2: {
        line 3: this_k <- as.numeric(x["k_vals"])
        line 4: this_alpha <- as.numeric(x["alpha"])
        line 5: this_seed <- as.numeric(x["seed_val"])
        line 6: test_fold <- as.numeric(x["fold"])
        line 7: train_rows <- cvfolds$cvf_rows$subsets[cvfolds$cvf_rows$which !=
        line 8: test_fold]
        line 9: train_cols <- cvfolds$cvf_cols$subsets[cvfolds$cvf_cols$which !=
        line 10: test_fold]
        line 11: test_rows <- cvfolds$cvf_rows$subsets[cvfolds$cvf_rows$which ==
        line 12: test_fold]
        line 13: test_cols <- cvfolds$cvf_cols$subsets[cvfolds$cvf_cols$which ==
        line 14: test_fold]
        line 15: submatrixD <- X[train_rows, train_cols]
        line 16: submatrixA <- X[test_rows, test_cols]
        line 17: submatrixB <- X[test_rows, train_cols]
        line 18: submatrixC <- X[train_rows, test_cols]
        line 19: nmf_submatrixD_result <- .perform_single_NMF_run(Xmat = submatrixD,
        line 20: kVal = as.integer(this_k), alphaVal = this_alpha,
        line 22: D_W <- nmf_submatrixD_result$featuresMatrix
        line 23: D_H <- nmf_submatrixD_result$samplesMatrix
        line 24: reconstructed_submatrixA <- as.matrix(submatrixB) %*%
        line 25: MASS::ginv(D_H) %*% MASS::ginv(D_W) %*% as.matrix(submatrixC)
        line 26: q2 <- .compute_q2(as.matrix(submatrixA), reconstructed_submatrixA)
        line 27: return(q2)
repetition in .one_hot_encode_dinuc and .one_hot_encode_sinuc
    in .one_hot_encode_dinuc
        line 9: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 2,
        line 10: seqlen = seqlen)
        line 11: if (seqlen > 0) {
        line 12: one_hot_encoded_dinuc_profile <- matrix(rep(0, length(dna_alphabet_dinuc) *
        line 13: seqlen), nrow = 1, byrow = TRUE)
        line 20: }
        line 21: else {
        line 22: stop("Empty or NULL found")
        line 23: }
    in .one_hot_encode_sinuc
        line 4: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 1,
        line 5: seqlen = seqlen)
        line 6: if (seqlen > 0) {
        line 7: one_hot_encoded <- matrix(rep(0, length(dna_alphabet) *
        line 8: seqlen), nrow = 1, byrow = TRUE)
        line 15: }
        line 16: else {
        line 17: stop("Empty or NULL found")
        line 18: }
repetition in .one_hot_encode_dinuc and .one_hot_encode_trinuc
    in .one_hot_encode_dinuc
        line 1:{
        line 2: dna_alphabet <- c("A", "C", "G", "T")
        line 3: dna_alphabet_dinuc <- do.call(paste0, expand.grid(dna_alphabet,
        line 4: dna_alphabet))
        line 5: seqlen <- length(givenSeq)
        line 6: givenSeq_dinuc <- unlist(lapply(seq_len(seqlen - 1), function(x) {
        line 7: paste0(givenSeq[x], givenSeq[x + 1])
        line 8: }))
        line 9: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 2,
        line 10: seqlen = seqlen)
        line 11: if (seqlen > 0) {
        line 12: one_hot_encoded_dinuc_profile <- matrix(rep(0, length(dna_alphabet_dinuc) *
        line 13: seqlen), nrow = 1, byrow = TRUE)
        line 14: for (i in seq_along(dna_alphabet_dinuc)) {
        line 15: one_hot_encoded_dinuc_profile[, (i - 1) * seqlen +
        line 16: which(givenSeq_dinuc == dna_alphabet_dinuc[i])] <- 1
        line 17: }
        line 18: colnames(one_hot_encoded_dinuc_profile) <- use_colnames
        line 19: return(one_hot_encoded_dinuc_profile)
        line 20: }
        line 21: else {
        line 22: stop("Empty or NULL found")
        line 23: }
    in .one_hot_encode_trinuc
        line 1:{
        line 2: dna_alphabet <- c("A", "C", "G", "T")
        line 3: dna_alphabet_trinuc <- do.call(paste0, expand.grid(dna_alphabet,
        line 4: dna_alphabet, dna_alphabet))
        line 5: seqlen <- length(givenSeq)
        line 6: givenSeq_trinuc <- unlist(lapply(seq_len(seqlen - 2), function(x) {
        line 7: paste0(givenSeq[x], givenSeq[x + 1], givenSeq[x + 2])
        line 8: }))
        line 9: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 3,
        line 10: seqlen = seqlen)
        line 11: if (seqlen > 0) {
        line 12: one_hot_encoded_trinuc_profile <- matrix(rep(0, length(dna_alphabet_trinuc) *
        line 13: seqlen), nrow = 1, byrow = TRUE)
        line 14: for (i in seq_along(dna_alphabet_trinuc)) {
        line 15: one_hot_encoded_trinuc_profile[, (i - 1) * seqlen +
        line 16: which(givenSeq_trinuc == dna_alphabet_trinuc[i])] <- 1
        line 17: }
        line 18: colnames(one_hot_encoded_trinuc_profile) <- use_colnames
        line 19: return(one_hot_encoded_trinuc_profile)
        line 20: }
        line 21: else {
        line 22: stop("Empty or NULL found")
        line 23: }
repetition in .one_hot_encode_sinuc and .one_hot_encode_trinuc
    in .one_hot_encode_sinuc
        line 4: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 1,
        line 5: seqlen = seqlen)
        line 6: if (seqlen > 0) {
        line 7: one_hot_encoded <- matrix(rep(0, length(dna_alphabet) *
        line 15: }
        line 16: else {
        line 17: stop("Empty or NULL found")
        line 18: }
    in .one_hot_encode_trinuc
        line 9: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 3,
        line 10: seqlen = seqlen)
        line 11: if (seqlen > 0) {
        line 12: one_hot_encoded_trinuc_profile <- matrix(rep(0, length(dna_alphabet_trinuc) *
        line 20: }
        line 21: else {
        line 22: stop("Empty or NULL found")
        line 23: }
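
The mono-, di-, and trinucleotide encoders above follow the same pattern with a different k, so a single k-mer encoder is one possible way to fold them together. This is a sketch rather than the package's implementation: `.one_hot_encode_kmer` is a hypothetical name, the column names are a simple stand-in for `.get_feat_names`, and `match()` replaces the per-k-mer for loop.

```r
# Hypothetical generalised encoder; givenSeq is a character vector with
# one nucleotide per element, k is the k-mer size.
.one_hot_encode_kmer <- function(givenSeq, k = 1) {
    dna_alphabet <- c("A", "C", "G", "T")
    # All k-mers, with the first position varying fastest (as expand.grid does).
    kmers <- do.call(paste0, expand.grid(rep(list(dna_alphabet), k)))
    seqlen <- length(givenSeq)
    if (seqlen < 1) {
        stop("Empty or NULL found")
    }
    # Overlapping k-mers starting at each position that can hold one.
    n_win <- max(seqlen - k + 1, 0)
    windows <- vapply(seq_len(n_win), function(i) {
        paste0(givenSeq[i:(i + k - 1)], collapse = "")
    }, character(1))
    encoded <- matrix(0, nrow = 1, ncol = length(kmers) * seqlen)
    # Vectorised: column index is (k-mer index - 1) * seqlen + position.
    idx <- (match(windows, kmers) - 1) * seqlen + seq_along(windows)
    encoded[1, idx[!is.na(idx)]] <- 1
    # Assumption: simple "<kmer>.<position>" names instead of .get_feat_names.
    colnames(encoded) <- paste(rep(kmers, each = seqlen),
                               seq_len(seqlen), sep = ".")
    encoded
}

enc1 <- .one_hot_encode_kmer(c("A", "C", "G", "T"), k = 1)
enc2 <- .one_hot_encode_kmer(c("A", "C", "G", "T"), k = 2)
```

With this shape, the existing per-k functions could become one-line wrappers that fix `k`.
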
repetition in .unfurl_nodeList and get_features_matrix and get_samples_matrix
    in .unfurl_nodeList
        line 2:{
        line 3: returnVal <- .assert_seqArchR_list_properties(nodeList)
        line 4: if (returnVal != "FOO")
        line 5: stop(returnVal)
    in get_features_matrix
        line 1:{
        line 2: returnVal <- .assert_seqArchR_list_properties(nmfResultObj)
        line 3: if (returnVal != "FOO")
        line 4: stop(returnVal)
    in get_samples_matrix
        line 1:{
        line 2: returnVal <- .assert_seqArchR_list_properties(nmfResultObj)
        line 3: if (returnVal != "FOO")
        line 4: stop(returnVal)
repetition in make_dinuc_PWMs and make_sinuc_PWMs
    in make_dinuc_PWMs
        line 5: if (add_pseudo_counts) {
        line 6: vec <- vec + 10^-5
        line 7: }
        line 8: this_mat <- t(matrix(vec, ncol = length(dinuc), byrow = FALSE))
        line 9: rownames(this_mat) <- dinuc
        line 10: if (scale) {
    in make_sinuc_PWMs
        line 4: if (add_pseudo_counts) {
        line 5: vec <- vec + 10^-5
        line 6: }
        line 7: this_mat <- t(matrix(vec, ncol = length(sinuc), byrow = FALSE))
        line 8: rownames(this_mat) <- sinuc
        line 9: if (scale) {
repetition in plot_arch_for_clusters and plot_ggseqlogo_of_seqs and viz_seqs_acgt_mat
    in plot_arch_for_clusters
        line 14: if (is.null(pos_lab)) {
        line 15: pos_lab <- seq_len(Biostrings::width(seqs[1]))
        line 16: }
    in plot_ggseqlogo_of_seqs
        line 4: if (is.null(pos_lab)) {
        line 5: pos_lab <- seq_len(Biostrings::width(seqs[1]))
        line 6: }
    in viz_seqs_acgt_mat
        line 5:{
        line 6: if (is.null(pos_lab)) {
        line 7: pos_lab <- seq_len(Biostrings::width(seqs[1]))
        line 8: }
repetition in plot_ggheatmap and plot_ggseqlogo
    in plot_ggheatmap
        line 2:{
        line 3: if (is.null(pos_lab))
        line 4: pos_lab <- set_default_pos_lab2(pwm_mat)
        line 5: check_vars(pwm_mat, pos_lab)
        line 20: p1 <- fix_coord(p1, nPos = length(pos_lab), method = "heatmap",
        line 21: fixed_coord = fixed_coord)
        line 22: if (!is.null(pdf_name)) {
        line 23: if (file.exists(pdf_name)) {
        line 24: warning("File exists, will overwrite", immediate. = TRUE)
        line 25: }
        line 26: ggplot2::ggsave(filename = pdf_name, plot = p1, device = "pdf",
        line 27: width = 20, height = 2.5)
        line 28: }
        line 29: return(p1)
    in plot_ggseqlogo
        line 3:{
        line 4: if (is.null(pos_lab))
        line 5: pos_lab <- set_default_pos_lab2(pwm_mat)
        line 6: check_vars(pwm_mat, pos_lab)
        line 15: p1 <- fix_coord(p1, nPos = length(pos_lab), method = method,
        line 16: fixed_coord = fixed_coord)
        line 17: if (!is.null(pdf_name)) {
        line 18: if (file.exists(pdf_name)) {
        line 19: warning("File exists, will overwrite", immediate. = TRUE)
        line 20: }
        line 21: ggsave(filename = pdf_name, plot = p1, device = "pdf",
        line 22: width = 25, height = 2.5)
        line 23: }
        line 24: return(p1)
repetition in viz_bas_vec_heatmap and viz_bas_vec_heatmap_seqlogo and viz_bas_vec_seqlogo
    in viz_bas_vec_heatmap
        line 3:{
        line 4: check_vars2(feat_mat)
        line 5: if (is.null(pos_lab)) {
        line 6: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
        line 7: }
        line 8: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
        line 9: if (sinuc_or_dinuc == "dinuc") {
        line 10: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 11: }
        line 12: else if (sinuc_or_dinuc == "sinuc") {
        line 13: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 14: }
        line 15: p1 <- plot_ggheatmap(pwm_mat = pwm, pos_lab = pos_lab,
    in viz_bas_vec_heatmap_seqlogo
        line 4: check_cowplot()
        line 5: check_vars2(feat_mat)
        line 6: if (is.null(pos_lab)) {
        line 7: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
        line 8: }
        line 9: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
        line 10: if (sinuc_or_dinuc == "dinuc") {
        line 11: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 12: }
        line 13: else if (sinuc_or_dinuc == "sinuc") {
        line 14: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 15: }
        line 16: p1 <- plot_ggheatmap(pwm_mat = pwm, pos_lab = pos_lab)
    in viz_bas_vec_seqlogo
        line 1: method = "bits", pos_lab = NULL, add_pseudo_counts = FALSE,
        line 2: pdf_name = NULL, sinuc_or_dinuc = "sinuc", fixed_coord = FALSE)
        line 3:{
        line 4: check_vars2(feat_mat)
        line 5: if (is.null(pos_lab)) {
        line 6: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
        line 7: }
        line 8: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
        line 9: if (sinuc_or_dinuc == "dinuc") {
        line 10: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 11: }
        line 12: else if (sinuc_or_dinuc == "sinuc") {
        line 13: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 14: }
        line 15: p1 <- plot_ggseqlogo(pwm_mat = pwm, method = method,
repetition in viz_bas_vec_heatmap_seqlogo and viz_bas_vec_seqlogo
    in viz_bas_vec_heatmap_seqlogo
        line 4: check_cowplot()
        line 5: check_vars2(feat_mat)
        line 6: if (is.null(pos_lab)) {
        line 7: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
        line 8: }
        line 9: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
        line 10: if (sinuc_or_dinuc == "dinuc") {
        line 11: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 12: }
        line 13: else if (sinuc_or_dinuc == "sinuc") {
        line 14: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 15: }
        line 16: p1 <- plot_ggheatmap(pwm_mat = pwm, pos_lab = pos_lab)
        line 24: final_p
        line 25: })
        line 26: if (!is.null(pdf_name)) {
        line 27: if (file.exists(pdf_name)) {
        line 28: warning("File exists, will overwrite", immediate. = TRUE)
        line 29: }
        line 30: grDevices::pdf(file = pdf_name, width = 20, height = 4)
        line 31: lapply(pl_list, print)
        line 32: dev.off()
        line 33: return(invisible(NULL))
        line 34: }
        line 35: pl_list
    in viz_bas_vec_seqlogo
        line 3:{
        line 4: check_vars2(feat_mat)
        line 5: if (is.null(pos_lab)) {
        line 6: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
        line 7: }
        line 8: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
        line 9: if (sinuc_or_dinuc == "dinuc") {
        line 10: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 11: }
        line 12: else if (sinuc_or_dinuc == "sinuc") {
        line 13: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
        line 14: }
        line 15: p1 <- plot_ggseqlogo(pwm_mat = pwm, method = method,
        line 17: p1
        line 18: })
        line 19: if (!is.null(pdf_name)) {
        line 20: if (file.exists(pdf_name)) {
        line 21: warning("File exists, will overwrite", immediate. = TRUE)
        line 22: }
        line 23: grDevices::pdf(file = pdf_name, width = 20, height = 4)
        line 24: lapply(pl_list, print)
        line 25: dev.off()
        line 26: return(invisible(NULL))
        line 27: }
        line 28: pl_list
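
One way to act on the repetitions reported above is to move the shared steps into a single internal helper and keep the public functions as thin wrappers. The sketch below does this for the `make_sinuc_PWMs`/`make_dinuc_PWMs` pair. It is only an illustration: `.make_PWM`, its `alphabet` argument, and `make_sinuc_PWMs_sketch` are hypothetical names, and the scaling step is an assumption because the report cuts off at `if (scale) {`.

```r
# Hypothetical shared helper; the name and signature are assumptions,
# not part of seqArchR.
.make_PWM <- function(vec, alphabet, add_pseudo_counts = TRUE, scale = TRUE) {
    if (add_pseudo_counts) {
        vec <- vec + 10^-5
    }
    # vec holds one block of positions per alphabet element, so after the
    # transpose rows are alphabet elements and columns are positions.
    this_mat <- t(matrix(vec, ncol = length(alphabet), byrow = FALSE))
    rownames(this_mat) <- alphabet
    if (scale) {
        # Assumed scaling: normalise each position (column) to sum to 1.
        this_mat <- sweep(this_mat, 2, colSums(this_mat), "/")
    }
    this_mat
}

# Thin wrapper: only the alphabet differs between the sinuc and dinuc cases.
make_sinuc_PWMs_sketch <- function(vec, ...) {
    .make_PWM(vec, alphabet = c("A", "C", "G", "T"), ...)
}

# Usage: a 2-position profile with A at position 1 and C at position 2.
vec <- c(1, 0, 0, 1, 0, 0, 0, 0)
pwm <- make_sinuc_PWMs_sketch(vec, add_pseudo_counts = FALSE, scale = FALSE)
print(pwm)
```

A dinucleotide wrapper would call the same helper with the 16-element dinucleotide alphabet, so the pseudo-count and reshaping logic lives in one place.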

Suggestion: expose the parameters of called functions such as ggsave by passing them through \dots
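
A minimal sketch of this suggestion: the wrapper accepts `...` and forwards it to the function that writes the file, so callers can override width, height, and similar settings without the wrapper naming each one. `save_plots_pdf` and `plot_fun` are hypothetical names, and `grDevices::pdf` stands in for `ggsave` here only to keep the example free of non-base dependencies.

```r
# Hypothetical wrapper; not part of seqArchR.
save_plots_pdf <- function(plot_fun, pdf_name, ...) {
    if (file.exists(pdf_name)) {
        warning("File exists, will overwrite", immediate. = TRUE)
    }
    # Defaults apply only where the caller did not supply a value via `...`.
    dev_args <- utils::modifyList(list(width = 20, height = 4), list(...))
    do.call(grDevices::pdf, c(list(file = pdf_name), dev_args))
    # Close the device even if plotting fails.
    on.exit(grDevices::dev.off(), add = TRUE)
    plot_fun()
    invisible(pdf_name)
}

# Usage: override the default page size through `...`.
out <- tempfile(fileext = ".pdf")
save_plots_pdf(function() plot(1:10), out, width = 8, height = 3)
```

Merging the defaults with `list(...)` avoids a "formal argument matched by multiple actual arguments" error when a caller passes `width` or `height` explicitly.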

Documentation

Vignette should use BiocStyle package for formatting. -- DONE

rmd file vignettes/seqArchR.Rmd

Please include Bioconductor installation instructions using BiocManager. -- DONE

rmd file vignettes/seqArchR.Rmd
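
For reference, the usual BiocManager installation idiom looks like the sketch below. It is wrapped in a function (`install_seqArchR`, a hypothetical name) so that nothing is installed until it is called; in a vignette the same lines would normally sit in a chunk with `eval=FALSE`.

```r
# Hypothetical wrapper around the standard BiocManager idiom.
install_seqArchR <- function() {
    # Install BiocManager from CRAN only if it is not already available.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
        install.packages("BiocManager")
    }
    BiocManager::install("seqArchR")
}
```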

Please remove TODO from vignettes or make it invisible. -- DONE

Note: please estimate the running time for the code at lines 213:218 in vignettes/seqArchR.Rmd. -- NOTE: It takes only about 1.5 to 2 minutes to process this chunk. I set it to eval=FALSE only when the Bioc builds were timing out. BTW, I do have tests to check that this processing works.
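
A running-time estimate like the one requested here can be obtained by wrapping the chunk in `system.time()`. The sketch below times a stand-in computation; `slow_chunk` is a hypothetical placeholder for the vignette code at lines 213:218, which is not reproduced here.

```r
# Hypothetical stand-in for the vignette chunk being timed.
slow_chunk <- function() {
    m <- matrix(stats::runif(250000), nrow = 500)
    invisible(m %*% m)
}

# system.time() returns user, system, and elapsed (wall-clock) seconds.
timing <- system.time(slow_chunk())
elapsed_sec <- unname(timing[["elapsed"]])
message(sprintf("Chunk took %.2f seconds", elapsed_sec))
```

The elapsed value is the one that matters for build timeouts, since it reflects wall-clock time rather than CPU time.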

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b42fb24563c8406396ae1b3c88087802bf879176

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/seqArchR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

snikumbh commented 2 years ago

Hi @jianhong ,

The push after my last message was for minor changes. This is now ready for you to have another look. See my detailed answers in the previous message.

Thanks, Sarvesh

jianhong commented 2 years ago

After removing the unused code, the package will be marked as acceptable.

R code

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 56860295fbd4e1e4f9a71dbd42d2f409611ace1c

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/seqArchR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

snikumbh commented 2 years ago

All done!

Thanks @jianhong

bioc-issue-bot commented 2 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 2 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/snikumbh.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2 pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("seqArchR"). The package 'landing page' will be created at

https://bioconductor.org/packages/seqArchR

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.