Closed snikumbh closed 2 years ago
Hi @snikumbh
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Type: Package
Package: seqArchR
Title: Identify Different Architectures of Sequence Elements
Version: 0.99.0
Authors@R:
    person("Sarvesh", "Nikumbh", email = "sarvesh.nikumbh@gmail.com",
           role = c("aut", "cre", "cph"),
           comment = c(ORCID = "0000-0003-3163-4447",
                       Twitter = "@sarveshnikumbh"))
Description: \code{seqArchR} enables unsupervised discovery of _de novo_ clusters
    with characteristic sequence architectures characterized by
    position-specific motifs or composition of stretches of nucleotides,
    e.g., CG-richness. \code{seqArchR} does _not_ require any specifications
    w.r.t. the number of clusters, the length of any individual motifs, or
    the distance between motifs if and when they occur in pairs/groups; it
    directly detects them from the data. \code{seqArchR} uses non-negative
    matrix factorization (NMF) as its backbone, and employs a chunking-based
    iterative procedure that enables processing of large sequence collections
    efficiently. Wrapper functions are provided for visualizing cluster
    architectures as sequence logos.
License: GPL-3 | file LICENSE
URL: https://snikumbh.github.io/seqArchR/,
    https://github.com/snikumbh/seqArchR
BugReports: https://github.com/snikumbh/seqArchR/issues
SystemRequirements: Python (>= 3.5), scikit-learn (>= 0.21.2)
Depends:
    R (>= 4.1.0)
Imports:
    utils,
    graphics,
    cvTools (>= 0.3.2),
    MASS,
    Matrix,
    methods,
    stats,
    cluster,
    matrixStats,
    fpc,
    cli,
    prettyunits,
    reshape2 (>= 1.4.3),
    reticulate (>= 1.22),
    parallel,
    Biostrings,
    grDevices,
    ggplot2 (>= 3.1.1),
    ggseqlogo (>= 0.1)
Suggests:
    TFBSTools,
    cowplot,
    hopach (>= 2.42.0),
    knitr (>= 1.22),
    rmarkdown (>= 1.12),
    testthat (>= 3.0.2),
    covr,
    vdiffr (>= 0.3.0)
VignetteBuilder:
    knitr
biocViews:
    MotifDiscovery,
    GeneRegulation,
    MathematicalBiology,
    SystemsBiology,
    Transcriptomics,
    Genetics,
    Clustering,
    DimensionReduction,
    FeatureExtraction,
    DNASeq
Encoding: UTF-8
LazyData: false
RoxygenNote: 7.1.2
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your GitHub SSH keys for git.bioconductor.org access. To manage keys and future access, you may want to activate your Bioconductor Git Credentials Account.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR, TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/seqArchR
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 05a2c04a2f70d216173f278be8c2235168b8a34e
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: 6d3015d2c58cb8d8427a076310ec37b8d0e3a3e5
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "UNSUPPORTED, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: af047d4a6762b26002d5d9a5be6c3d55cbdfaca2
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: ecaa00c9968d9d4e05146cc8418bfc9eaaa1de5b
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: a64b98d05e5e282087b0407b2248229f1f5d57e5
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Hi @jianhong
I think this is good to go. Look forward to your review. Thanks in advance.
The package passed check and build. It is in pretty good shape. However, several things need to be fixed.
Selective imports using importFrom instead of import all with import.
No direct slot access with @ or slot() - accessors implemented and used.
:: is not suggested in source code unless you can make sure all the packages are imported. Please note that you need to manually double-check the import items when you make any change in the DESCRIPTION file during development.
Vectorize: no unnecessary for loops present.
Use file.path to replace paste. I do not understand why you need paste / here.
Functional programming: code repetition.
repetition in .assert_seqArchR_featuresMatrix and .assert_seqArchR_samplesMatrix
repetition in .assert_seqArchR_kFolds_in_tandem and .assert_seqArchR_kFolds_independent and .assert_seqArchR_nRuns
repetition in .compare_iqr and .compare_range
repetition in .get_q2_using_py and .get_q2_using_py_serial
repetition in .one_hot_encode_dinuc and .one_hot_encode_sinuc
repetition in .one_hot_encode_dinuc and .one_hot_encode_trinuc
repetition in .one_hot_encode_sinuc and .one_hot_encode_trinuc
repetition in .unfurl_nodeList and get_features_matrix and get_samples_matrix
repetition in make_dinuc_PWMs and make_sinuc_PWMs
repetition in plot_arch_for_clusters and plot_ggseqlogo_of_seqs and viz_seqs_acgt_mat
repetition in plot_ggheatmap and plot_ggseqlogo
repetition in viz_bas_vec_heatmap and viz_bas_vec_heatmap_seqlogo and viz_bas_vec_seqlogo
repetition in viz_bas_vec_heatmap_seqlogo and viz_bas_vec_seqlogo
\dots for called functions such as ggsave
BiocStyle package for formatting.
Hi @jianhong,
Thanks for taking the time to review. I am addressing the points you have raised. I will ping again when I am done. Looking forward to meeting the deadline for the upcoming release.
Received a valid push on git.bioconductor.org; starting a build for commit id: b30ffd9a76151dc1bd43111bf426a2f54c3fab80
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Hi @jianhong
I have addressed all the points you raised. They are answered in detail below. Look forward to hearing from you again. Thanks again for taking the time to review.
Thanks and best, Sarvesh
The DESCRIPTION file -- DONE
R version should be no less than 4.2
The NAMESPACE file -- DONE
Selective imports using importFrom instead of import all with import.
in line 25 import(cli)
in line 26 import(ggplot2)
in line 27 import(ggseqlogo)
unacceptable files -- DONE (removed)
unacceptable files present (see [.gitignore](https://contributions.bioconductor.org/gitignore.html) for a listing).
./index.md
R code -- DONE
no direct slot access with @ or slot() - accessors implemented and used.
In file R/seqArchR_assertions.R:
at line 568 found ' tmp <- seqArchRresultObj$rawSeqs[1]@ranges@width'
In file R/seqArchR_auxiliary_functionsII.R:
at line 338 found ' stopifnot(hopachDistMat@Size == nrow(factorsMat))'
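As a minimal sketch of what replacing direct slot access can look like, the example below defines a toy S4 class with an accessor. The class `ToyRanges` and the generic `width_of` are hypothetical and only illustrate the pattern; for Biostrings objects, an existing accessor such as `Biostrings::width()` (already used elsewhere in the package, e.g. `Biostrings::width(seqs[1])`) plays the same role.

```r
library(methods)

# Hypothetical toy class standing in for an S4 object with a 'width' slot
setClass("ToyRanges", representation(width = "integer"))

# Accessor generic and method: callers use width_of(x) instead of x@width,
# so the internal slot layout can change without breaking calling code
setGeneric("width_of", function(x) standardGeneric("width_of"))
setMethod("width_of", "ToyRanges", function(x) x@width)

r <- new("ToyRanges", width = c(10L, 20L))
w <- width_of(r)
```

Only the accessor method touches the slot; all other code goes through `width_of()`.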
:: is not suggested in source code unless you can make sure all the packages are imported. Please note that you need to manually double-check the import items when you make any change in the DESCRIPTION file during development.
NOTE: I have retained the usage of :: because it adds to readability and saves time when debugging or writing additional code (personal experience). I have manually checked all import items.
Vectorize: no unnecessary for loops present. -- DONE. Where ideal, these have been addressed. See detailed responses for individual cases below.
Use file.path to replace paste. I do not understand why you need paste / here. -- DONE. This was indeed unnecessary.
In file R/seqArchR_auxiliary_functionsII.R:
at line 37 found ' retVal <- dir.create(paste0(o_dir, "/"), showWarnings = TRUE)'
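A short sketch of the simplified call, assuming a hypothetical output directory under `tempdir()` and a hypothetical file name: `dir.create()` does not need a trailing "/" on the path, and `file.path()` builds any nested paths.

```r
# Hypothetical output directory, used here only for illustration
o_dir <- file.path(tempdir(), "seqArchR_demo")

# No paste0(o_dir, "/") needed: dir.create() accepts the path as-is
ret_val <- dir.create(o_dir, showWarnings = TRUE)

# file.path() joins components using .Platform$file.sep
pdf_path <- file.path(o_dir, "architectures.pdf")
```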
Remove unused code. -- DONE. All removed.
In file R/NMF_model_selection_functions.R:
at line 198 found ' # parallelDo = FALSE,'
at line 208 found ' #aBase=param_ranges$alphaBase, aPow=param_ranges$alphaPow,'
at line 222 found ' # if(parallelDo){'
at line 223 found ' # q2_vals <- unlist(parallel::clusterApplyLB(cl = NULL,'
at line 224 found ' # seq_len(nrow(grid_search_params)), function(i) {'
at line 225 found ' # .get_q2_using_py( grid_search_params[i,] )'
at line 226 found ' # }))'
at line 227 found ' # }else{'
at line 228 found ' # q2_vals <- unlist('
at line 229 found ' # lapply(seq_len(nrow(grid_search_params)),'
at line 230 found ' # function(i) {'
at line 231 found ' # .get_q2_using_py_serial( grid_search_params[i,],'
at line 232 found ' # X = X, cvfolds = cvfolds)'
at line 233 found ' # }))'
at line 234 found ' # }'
at line 287 found ' # parallelDo = FALSE,'
at line 288 found ' # nCores = NA,'
at line 294 found ' # monolinear = FALSE,'
at line 322 found ' # if (parallelDo) {'
at line 326 found ' # if(parallelDo){'
at line 327 found ' # cl <- .setup_par_cluster(vlist='
at line 328 found ' # c(".get_q2_using_py", ".compute_q2", "X", "cvfolds"))'
at line 329 found ' # }'
at line 333 found ' # .msg_pstr("Coarse-fine grained binary search", flg=vrbs)'
at line 335 found ' #go_fine <- FALSE'
at line 338 found ' #eureka <- FALSE'
at line 341 found ' #coarse_step <- 10'
at line 359 found ' # parallelDo = parallelDo,'
at line 376 found ' ## fgIL to (mi-5)'
at line 380 found ' #go_fine <- TRUE'
at line 388 found ' # idx_best <- as.numeric(which.max('
at line 389 found ' # unlist(coarse_prev_df["q2_vals"])))'
at line 390 found ' # threshold <- coarse_prev_df[idx_best, "q2"] -'
at line 417 found ' ## go fine over interval (hi[kCGIdx]+1 , lo[kCGIdx])'
at line 433 found ' ## best_K is == hi, go to next coarse-grained iteration'
at line 449 found ' # parallelDo = parallelDo,'
at line 458 found ' # minKInDF <- min(as.numeric(unlist(combined_df["k_vals"])))'
at line 481 found ' # parallelDo = parallelDo,'
at line 485 found ' # temp_best_K <- searchReturnFine$best_K'
at line 488 found ' # combined_df <- rbind(combined_df, fine_prev_df)'
at line 493 found ' # minKInDF <- min(as.numeric('
at line 494 found ' # unlist(combined_df["k_vals"])))'
at line 495 found ' # message("MIN_K_IN_DF: ", minKInDF)'
at line 504 found ' # }'
at line 599 found ' # check_par_conditions(nCores=nCores)'
at line 773 found ' title = "Reconstruction accuracy, Q\U00B2 = f(#Factors)",'
at line 774 found ' x = "#Factors (K)",'
In file R/NMF_model_selection_functionsII.R:
at line 16 found ' # tol = 10^-3,'
In file R/plot_arch_for_clusters.R:
at line 214 found ' # nPos <- length(pos_lab)'
at line 215 found ' # xtick_cal <- seq(0, nPos, by = xt_freq)'
at line 216 found ' # xtick_cal[1] <- 1'
at line 217 found ' # xtick_cal[length(xtick_cal)] <- nPos'
In file R/plot_ggheatmap.R:
at line 65 found ' mid = "white", high = "#012345") +'
at line 156 found ' # ggplot2::theme(axis.text.x = element_text(size = rel(0.9),'
at line 157 found ' # angle = 90, hjust = 1),'
at line 158 found ' # axis.text.y = element_text(size = rel(0.9)))'
In file R/prepare_data_from_FASTA.R:
at line 29 found ' # colnames(one_hot_encoded) <- paste(rep(dna_alphabet,'
at line 30 found ' # each = seqlen), seq_len(seqlen), sep=".")'
at line 234 found ' # length_vals <- unlist(lapply(seqs_split_as_list, length))'
In file R/seqArchR_assertions.R:
at line 54 found ' # matElements <-'
at line 55 found ' # if () {'
at line 56 found ' # stop("")'
at line 57 found ' # }'
at line 95 found ' # if ((nrow(featuresMatrix) %% 4) != 0) {'
at line 96 found ' # stop("#Rows in featuresMatrix not a multiple of 4")'
at line 97 found ' # }'
at line 456 found ' stop("In NMF result, #clusters != #factors")'
In file R/seqArchR_auxiliary_functionsI.R:
at line 134 found ' # clust_list <- get_seqs_clust_list(res$seqsClustLabels[[iter]])'
at line 140 found ' # clust_list <- get_seqs_clust_list(res$seqsClustLabels[[iter]])'
at line 186 found ' # new_mem[which(old_mem == i)] <- 1'
at line 279 found ' # out_clust_range <- NULL'
at line 301 found ' # left_out <- setdiff(qual_cl_idx, union(out_clust_range, out_clust_iqr))'
at line 302 found ' # if(length(left_out) > 0){'
at line 303 found ' # out_clust_size <- .compare_size(clustwise_matlist, qual_cl_idx)'
at line 304 found ' # }'
at line 307 found ' # return(NULL)'
at line 1084 found ' # clust_list <- .detect_just_for_sake_clust(cheight_idx, clust_list,'
at line 1085 found ' # vrbs=verbose)'
at line 1216 found ' # useMinClusters <- keepMinClusters(set_ocollation, temp_res,'
at line 1217 found ' # totOuterChunksColl ='
at line 1218 found ' # totOuterChunksColl, dbg = dbg,'
at line 1219 found ' # nClustEachIC = nClustEachIC,'
at line 1220 found ' # test_itr = test_itr -1,'
at line 1221 found ' # stage="Final")'
In file R/seqArchR_auxiliary_functionsII.R:
at line 20 found ' # .msg_pstr("-- Directory exists: -- ", o_dir,'
at line 21 found ' # "-- Changing name to: -- ", flg=vrbs)'
at line 230 found ' # tol = tol,'
at line 364 found ' # factorsMatList_as2D <- lapply(seq_len(ncol(factorsMat)),'
at line 365 found ' # function(x){matrix(factorsMat[,x],'
at line 366 found ' # nrow = nrow(factorsMat)/nPositions,'
at line 367 found ' # byrow = TRUE,'
at line 368 found ' # dimnames = list(dim_names))'
at line 369 found ' # })'
at line 371 found ' # factorsMatList_asPFMs <- lapply(seq_len(length(factorsMatList_as2D)),'
at line 372 found ' # function(x){'
at line 373 found ' # sinucSparse <- collapse_into_sinuc_matrix('
at line 374 found ' # given_feature_mat = as.matrix(factorsMat[,x]),'
at line 375 found ' # dinuc_mat = factorsMatList_as2D[[x]],'
at line 376 found ' # feature_names = dim_names)'
at line 377 found ' # sinucSparseInt <- matrix(as.integer(round(sinucSparse)),'
at line 378 found ' # nrow = 4, byrow = FALSE,'
at line 379 found ' # dimnames = list(rownames(sinucSparse)))'
at line 380 found ' # })'
at line 405 found ' # relScoresMat[i,j] <- temp["relScore"]'
at line 478 found ' # parallelDo = config$parallelize, nCores = config$nCoresUse,'
at line 491 found ' # parallelDo = config$parallelize, nCores = config$nCoresUse,'
at line 513 found ' # .msg_pstr("Fetching ", best_k, " clusters", flg=(vrbs || dbg))'
at line 521 found ' # parallelDo = config$parallelize,'
at line 522 found ' # nCores = config$nCoresUse,'
at line 534 found ' ##A <- this_mat[, new_ord[[nR]]]'
at line 546 found ' # .msg_pstr("Best Q2 giving run found: ", bestQ2, flg=dbg)'
at line 547 found ' # cli::cli_alert_info("Fetched {best_k} clusters")'
at line 557 found ' # .msg_pstr("Fetching ", best_k," cluster(s)", flg=dbg)'
at line 842 found ' # if(parallelize){'
at line 843 found ' # cl <- parallel::makeCluster(crs, type = "FORK")'
at line 844 found ' # parallel::setDefaultCluster(cl)'
at line 845 found ' # cli::cli_alert_info("Parallelization: {crs} cores")'
at line 846 found ' # }else{'
at line 847 found ' # cl <- NA'
at line 848 found ' # cli::cli_alert_info("Parallelization: No")'
at line 849 found ' # }'
at line 873 found ' # globFactors <- vector("list", length(innerChunksColl))'
at line 874 found ' # globClustAssignments <- vector("list", length(innerChunksColl))'
at line 875 found ' # nClustEachIC <- rep(0, length(innerChunksColl))'
In file R/seqArchR_main.R:
at line 433 found ' # if(parallelize) parallel::stopCluster(setup_ans$cl)'
In file R/viz_matrix_of_acgt_image.R:
at line 131 found ' # xtick_cal <- seq(0, nPos, by = xt_freq)'
at line 132 found ' # xtick_cal[1] <- 1'
In file R/zzz.R:
at line 4 found ' # reticulate::source_python(system.file('
at line 6 found ' # package = "seqArchR",'
at line 7 found ' # mustWork = TRUE'
at line 8 found ' # ))'
at line 9 found ' # reticulate::configure_environment("seqArchR")'
at line 10 found ' # sklearn <<- reticulate::import("sklearn", delay_load = TRUE)'
Functional programming: code repetition. -- DONE. All repetitions cleared. This has made the plotting functions, where the majority of these existed, much more succinct.
repetition in .assert_seqArchR_featuresMatrix and .assert_seqArchR_samplesMatrix
in .assert_seqArchR_featuresMatrix
line 1: function (featuresMatrix)
line 2: {
line 3: check_ncols <- 0
line 4: if (is.null(featuresMatrix)) {
line 5: stop("NULL value found, instead of a matrix")
line 6: }
line 7: if (!is.matrix(featuresMatrix)) {
line 8: stop("Expected a matrix, found otherwise")
line 9: }
line 10: else {
line 13: }
line 14: if (ncol(featuresMatrix) < 1) {
line 15: stop("0 columns (sequences) in samplesMatrix")
line 16: }
line 17: if (ncol(featuresMatrix) == check_ncols) {
line 18: stop("Check matrix, 'ncols' is: ", check_ncols)
in .assert_seqArchR_samplesMatrix
line 1: function (samplesMatrix)
line 2: {
line 3: check_nrows <- 0
line 4: if (is.null(samplesMatrix)) {
line 5: stop("NULL value found, instead of a matrix")
line 6: }
line 7: if (!is.matrix(samplesMatrix)) {
line 8: stop("Expected a matrix, found otherwise")
line 9: }
line 10: else {
line 11: if (ncol(samplesMatrix) < 1) {
line 12: stop("0 columns (sequences) in samplesMatrix")
line 13: }
line 14: if (nrow(samplesMatrix) == check_nrows) {
line 15: stop("Check matrix, nrows == ", check_nrows)
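One way such repetition can be factored out is a single shared check that both assertions call. This is only a sketch: the helper name `.assert_nonempty_matrix`, its `what` argument, and the wrapper names are hypothetical, and the combined emptiness check simplifies the separate row and column checks in the listings above.

```r
# Hypothetical shared helper; the name and the 'what' argument are
# illustrative and not taken from the package
.assert_nonempty_matrix <- function(mat, what = "matrix") {
    if (is.null(mat)) {
        stop("NULL value found, instead of a matrix")
    }
    if (!is.matrix(mat)) {
        stop("Expected a matrix, found otherwise")
    }
    if (nrow(mat) < 1 || ncol(mat) < 1) {
        stop("0 rows or columns in ", what)
    }
    invisible(TRUE)
}

# The two original assertions then reduce to thin wrappers
assert_features <- function(featuresMatrix) {
    .assert_nonempty_matrix(featuresMatrix, what = "featuresMatrix")
}
assert_samples <- function(samplesMatrix) {
    .assert_nonempty_matrix(samplesMatrix, what = "samplesMatrix")
}
```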
repetition in .assert_seqArchR_kFolds_in_tandem and .assert_seqArchR_kFolds_independent and .assert_seqArchR_nRuns
in .assert_seqArchR_kFolds_in_tandem
line 2: {
line 3: if (is.null(kFolds_var)) {
line 4: stop("'kFolds' is NULL")
line 5: }
line 6: if (!is.numeric(kFolds_var)) {
line 7: stop("'kFolds' should be numeric and > 0")
line 8: }
line 9: else {
line 10: if (kFolds_var < 1) {
line 11: stop("'kFolds' should be > 0")
line 12: }
in .assert_seqArchR_kFolds_independent
line 2: {
line 3: if (is.null(kFolds_var)) {
line 4: stop("'kFolds' is NULL")
line 5: }
line 6: if (!is.numeric(kFolds_var)) {
line 7: stop("'kFolds' should be numeric and > 0")
line 8: }
line 9: else {
line 10: if (kFolds_var < 1) {
line 11: stop("'kFolds' should be > 0")
line 12: }
line 13: }
in .assert_seqArchR_nRuns
line 3: stop("'n_runs' is NULL")
line 4: }
line 5: if (!is.numeric(nIter_var)) {
line 6: stop("'n_runs' should be numeric and > 0")
line 7: }
line 8: else {
repetition in .compare_iqr and .compare_range
in .compare_iqr
line 1: qual_cl_idx, zscore_thresh = 5)
line 2:{
line 3: ncl <- ncol(clustwise_matlist[[1]])
line 4: all_iqr <- lapply(clustwise_matlist, function(x) {
line 9: iqr_zscore <- (all_iqr_vec - stats::median(all_iqr_vec))/all_iqr_mad
line 10: out_idx <- which(iqr_zscore > zscore_thresh)
line 11: if (length(out_idx) > 0) {
line 12: clust_id <- ceiling(out_idx/ncl)
line 13: return(intersect(clust_id, qual_cl_idx))
line 14: }
line 15: else {
line 16: }
line 17: return(NULL)
in .compare_range
line 1: qual_cl_idx, zscore_thresh = 5)
line 2:{
line 3: ncl <- ncol(clustwise_matlist[[1]])
line 4: all_range <- lapply(clustwise_matlist, function(x) {
line 10: range_zscore <- (all_range_vec - stats::median(all_range_vec))/all_range_mad
line 11: out_idx <- which(range_zscore > zscore_thresh)
line 12: if (length(out_idx) > 0) {
line 13: clust_id <- ceiling(out_idx/ncl)
line 14: return(intersect(clust_id, qual_cl_idx))
line 15: }
line 16: else {
line 17: }
line 18: return(NULL)
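The two functions differ only in the statistic they compute, so one possible refactoring passes that statistic in as a function. The sketch below rests on assumptions: the listings omit the lines that compute the statistic and the MAD, so the per-column `apply()` and the use of `stats::mad()` are guesses, and the name `.compare_stat` is hypothetical.

```r
# Hypothetical generic version; 'stat_fun' is the only part that differs
# between the IQR-based and range-based variants
.compare_stat <- function(clustwise_matlist, qual_cl_idx, stat_fun,
                          zscore_thresh = 5) {
    ncl <- ncol(clustwise_matlist[[1]])
    # Assumption: one statistic per column of each cluster's matrix
    all_vec <- unlist(lapply(clustwise_matlist, function(x) {
        apply(x, 2, stat_fun)
    }))
    # Assumption: robust z-score uses the median absolute deviation
    zscore <- (all_vec - stats::median(all_vec)) / stats::mad(all_vec)
    out_idx <- which(zscore > zscore_thresh)
    if (length(out_idx) > 0) {
        clust_id <- ceiling(out_idx / ncl)
        return(intersect(clust_id, qual_cl_idx))
    }
    NULL
}

compare_iqr <- function(...) .compare_stat(..., stat_fun = stats::IQR)
compare_range <- function(...) {
    .compare_stat(..., stat_fun = function(v) diff(range(v)))
}

# Toy input: three clusters, two columns each; cluster 3 has one
# column with a much larger spread than all the others
toy <- list(cbind(c(0, 1), c(0, 2)),
            cbind(c(0, 3), c(0, 4)),
            cbind(c(0, 100), c(0, 5)))
flagged_range <- compare_range(toy, qual_cl_idx = 1:3)
flagged_iqr <- compare_iqr(toy, qual_cl_idx = 1:3)
```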
repetition in .get_q2_using_py and .get_q2_using_py_serial
in .get_q2_using_py
line 1: cvfolds, X)
line 2:{
line 3: this_k <- as.numeric(x["k_vals"])
line 4: this_alpha <- as.numeric(x["alpha"])
line 5: this_seed <- as.numeric(x["seed_val"])
line 6: test_fold <- as.numeric(x["fold"])
line 7: train_rows <- cvfolds$cvf_rows$subsets[cvfolds$cvf_rows$which !=
line 8: test_fold]
line 9: train_cols <- cvfolds$cvf_cols$subsets[cvfolds$cvf_cols$which !=
line 10: test_fold]
line 11: test_rows <- cvfolds$cvf_rows$subsets[cvfolds$cvf_rows$which ==
line 12: test_fold]
line 13: test_cols <- cvfolds$cvf_cols$subsets[cvfolds$cvf_cols$which ==
line 14: test_fold]
line 15: submatrixD <- X[train_rows, train_cols]
line 16: submatrixA <- X[test_rows, test_cols]
line 17: submatrixB <- X[test_rows, train_cols]
line 18: submatrixC <- X[train_rows, test_cols]
line 19: nmf_submatrixD_result <- .perform_single_NMF_run(Xmat = submatrixD,
line 20: kVal = as.integer(this_k), alphaVal = this_alpha, seedVal = this_seed)
line 21: D_W <- nmf_submatrixD_result$featuresMatrix
line 22: D_H <- nmf_submatrixD_result$samplesMatrix
line 23: reconstructed_submatrixA <- as.matrix(submatrixB) %*% MASS::ginv(D_H) %*%
line 24: MASS::ginv(D_W) %*% as.matrix(submatrixC)
line 25: q2 <- .compute_q2(as.matrix(submatrixA), reconstructed_submatrixA)
line 26: return(q2)
in .get_q2_using_py_serial
line 1: function (x, X, cvfolds)
line 2: {
line 3: this_k <- as.numeric(x["k_vals"])
line 4: this_alpha <- as.numeric(x["alpha"])
line 5: this_seed <- as.numeric(x["seed_val"])
line 6: test_fold <- as.numeric(x["fold"])
line 7: train_rows <- cvfolds$cvf_rows$subsets[cvfolds$cvf_rows$which !=
line 8: test_fold]
line 9: train_cols <- cvfolds$cvf_cols$subsets[cvfolds$cvf_cols$which !=
line 10: test_fold]
line 11: test_rows <- cvfolds$cvf_rows$subsets[cvfolds$cvf_rows$which ==
line 12: test_fold]
line 13: test_cols <- cvfolds$cvf_cols$subsets[cvfolds$cvf_cols$which ==
line 14: test_fold]
line 15: submatrixD <- X[train_rows, train_cols]
line 16: submatrixA <- X[test_rows, test_cols]
line 17: submatrixB <- X[test_rows, train_cols]
line 18: submatrixC <- X[train_rows, test_cols]
line 19: nmf_submatrixD_result <- .perform_single_NMF_run(Xmat = submatrixD,
line 20: kVal = as.integer(this_k), alphaVal = this_alpha,
line 22: D_W <- nmf_submatrixD_result$featuresMatrix
line 23: D_H <- nmf_submatrixD_result$samplesMatrix
line 24: reconstructed_submatrixA <- as.matrix(submatrixB) %*%
line 25: MASS::ginv(D_H) %*% MASS::ginv(D_W) %*% as.matrix(submatrixC)
line 26: q2 <- .compute_q2(as.matrix(submatrixA), reconstructed_submatrixA)
line 27: return(q2)
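For readers unfamiliar with the reconstruction step in these listings, the toy example below shows why multiplying submatrix B, the pseudo-inverses of the two factors, and submatrix C recovers the held-out submatrix A. It is a sketch under a strong assumption: instead of running NMF on submatrix D, it uses the exact factors of a synthetic rank-2 matrix, so the reconstruction is exact up to floating-point error. The matrix sizes and index sets are made up.

```r
set.seed(1)
# Synthetic rank-2 data: X = W %*% H (6 features x 8 samples)
W <- matrix(runif(12), nrow = 6)
H <- matrix(runif(16), nrow = 2)
X <- W %*% H

# Hypothetical row/column folds
test_rows <- 1:2
train_rows <- 3:6
test_cols <- 1:3
train_cols <- 4:8

submatrixA <- X[test_rows, test_cols]    # held out
submatrixB <- X[test_rows, train_cols]
submatrixC <- X[train_rows, test_cols]

# Stand-ins for the NMF factors of submatrixD = X[train_rows, train_cols]
# (assumption: exact factors rather than a fitted NMF result)
D_W <- W[train_rows, , drop = FALSE]
D_H <- H[, train_cols, drop = FALSE]

# Same product as in the listing: B %*% ginv(H) %*% ginv(W) %*% C
reconstructed_submatrixA <- submatrixB %*% MASS::ginv(D_H) %*%
    MASS::ginv(D_W) %*% submatrixC

max_abs_err <- max(abs(submatrixA - reconstructed_submatrixA))
```

With fitted NMF factors the reconstruction is only approximate, which is what the Q2 value computed from `submatrixA` and `reconstructed_submatrixA` measures.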
repetition in .one_hot_encode_dinuc and .one_hot_encode_sinuc
in .one_hot_encode_dinuc
line 9: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 2,
line 10: seqlen = seqlen)
line 11: if (seqlen > 0) {
line 12: one_hot_encoded_dinuc_profile <- matrix(rep(0, length(dna_alphabet_dinuc) *
line 13: seqlen), nrow = 1, byrow = TRUE)
line 20: }
line 21: else {
line 22: stop("Empty or NULL found")
line 23: }
in .one_hot_encode_sinuc
line 4: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 1,
line 5: seqlen = seqlen)
line 6: if (seqlen > 0) {
line 7: one_hot_encoded <- matrix(rep(0, length(dna_alphabet) *
line 8: seqlen), nrow = 1, byrow = TRUE)
line 15: }
line 16: else {
line 17: stop("Empty or NULL found")
line 18: }
repetition in .one_hot_encode_dinuc and .one_hot_encode_trinuc
in .one_hot_encode_dinuc
line 1:{
line 2: dna_alphabet <- c("A", "C", "G", "T")
line 3: dna_alphabet_dinuc <- do.call(paste0, expand.grid(dna_alphabet,
line 4: dna_alphabet))
line 5: seqlen <- length(givenSeq)
line 6: givenSeq_dinuc <- unlist(lapply(seq_len(seqlen - 1), function(x) {
line 7: paste0(givenSeq[x], givenSeq[x + 1])
line 8: }))
line 9: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 2,
line 10: seqlen = seqlen)
line 11: if (seqlen > 0) {
line 12: one_hot_encoded_dinuc_profile <- matrix(rep(0, length(dna_alphabet_dinuc) *
line 13: seqlen), nrow = 1, byrow = TRUE)
line 14: for (i in seq_along(dna_alphabet_dinuc)) {
line 15: one_hot_encoded_dinuc_profile[, (i - 1) * seqlen +
line 16: which(givenSeq_dinuc == dna_alphabet_dinuc[i])] <- 1
line 17: }
line 18: colnames(one_hot_encoded_dinuc_profile) <- use_colnames
line 19: return(one_hot_encoded_dinuc_profile)
line 20: }
line 21: else {
line 22: stop("Empty or NULL found")
line 23: }
in .one_hot_encode_trinuc
line 1:{
line 2: dna_alphabet <- c("A", "C", "G", "T")
line 3: dna_alphabet_trinuc <- do.call(paste0, expand.grid(dna_alphabet,
line 4: dna_alphabet, dna_alphabet))
line 5: seqlen <- length(givenSeq)
line 6: givenSeq_trinuc <- unlist(lapply(seq_len(seqlen - 2), function(x) {
line 7: paste0(givenSeq[x], givenSeq[x + 1], givenSeq[x + 2])
line 8: }))
line 9: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 3,
line 10: seqlen = seqlen)
line 11: if (seqlen > 0) {
line 12: one_hot_encoded_trinuc_profile <- matrix(rep(0, length(dna_alphabet_trinuc) *
line 13: seqlen), nrow = 1, byrow = TRUE)
line 14: for (i in seq_along(dna_alphabet_trinuc)) {
line 15: one_hot_encoded_trinuc_profile[, (i - 1) * seqlen +
line 16: which(givenSeq_trinuc == dna_alphabet_trinuc[i])] <- 1
line 17: }
line 18: colnames(one_hot_encoded_trinuc_profile) <- use_colnames
line 19: return(one_hot_encoded_trinuc_profile)
line 20: }
line 21: else {
line 22: stop("Empty or NULL found")
line 23: }
repetition in .one_hot_encode_sinuc and .one_hot_encode_trinuc
in .one_hot_encode_sinuc
line 4: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 1,
line 5: seqlen = seqlen)
line 6: if (seqlen > 0) {
line 7: one_hot_encoded <- matrix(rep(0, length(dna_alphabet) *
line 15: }
line 16: else {
line 17: stop("Empty or NULL found")
line 18: }
in .one_hot_encode_trinuc
line 9: use_colnames <- .get_feat_names(alph = dna_alphabet, k = 3,
line 10: seqlen = seqlen)
line 11: if (seqlen > 0) {
line 12: one_hot_encoded_trinuc_profile <- matrix(rep(0, length(dna_alphabet_trinuc) *
line 20: }
line 21: else {
line 22: stop("Empty or NULL found")
line 23: }
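The three encoders differ only in the k-mer length, so they could be collapsed into one function parameterized by `k`, which also removes the explicit `for` loop. The sketch below is not the package's implementation: the name `one_hot_encode_kmer` is hypothetical, column names are omitted because `.get_feat_names` is internal to the package, and the sequence is assumed to be a character vector with one nucleotide per element.

```r
# Hypothetical generic encoder covering k = 1 (sinuc), 2 (dinuc), 3 (trinuc)
one_hot_encode_kmer <- function(givenSeq, k = 1L) {
    dna_alphabet <- c("A", "C", "G", "T")
    seqlen <- length(givenSeq)
    if (seqlen < k) {
        stop("Empty or NULL found")
    }
    # All k-mers, in the same order as the original expand.grid() calls
    kmers <- do.call(paste0, expand.grid(rep(list(dna_alphabet), k)))
    # k-mer starting at each position 1..(seqlen - k + 1)
    starts <- seq_len(seqlen - k + 1L)
    seq_kmers <- vapply(starts, function(i) {
        paste0(givenSeq[i:(i + k - 1L)], collapse = "")
    }, character(1))
    encoded <- matrix(0, nrow = 1, ncol = length(kmers) * seqlen)
    # Vectorized replacement for the per-k-mer for loop:
    # column = (k-mer block - 1) * seqlen + position
    hit <- match(seq_kmers, kmers)
    ok <- !is.na(hit)   # skip k-mers with symbols outside A/C/G/T
    encoded[1, ((hit - 1L) * seqlen + starts)[ok]] <- 1
    encoded
}

toy_seq <- c("A", "C", "G", "T")
enc_sinuc <- one_hot_encode_kmer(toy_seq, k = 1L)
enc_dinuc <- one_hot_encode_kmer(toy_seq, k = 2L)
```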
repetition in .unfurl_nodeList and get_features_matrix and get_samples_matrix
in .unfurl_nodeList
line 2:{
line 3: returnVal <- .assert_seqArchR_list_properties(nodeList)
line 4: if (returnVal != "FOO")
line 5: stop(returnVal)
in get_features_matrix
line 1:{
line 2: returnVal <- .assert_seqArchR_list_properties(nmfResultObj)
line 3: if (returnVal != "FOO")
line 4: stop(returnVal)
in get_samples_matrix
line 1:{
line 2: returnVal <- .assert_seqArchR_list_properties(nmfResultObj)
line 3: if (returnVal != "FOO")
line 4: stop(returnVal)
repetition in make_dinuc_PWMs and make_sinuc_PWMs
in make_dinuc_PWMs
line 5: if (add_pseudo_counts) {
line 6: vec <- vec + 10^-5
line 7: }
line 8: this_mat <- t(matrix(vec, ncol = length(dinuc), byrow = FALSE))
line 9: rownames(this_mat) <- dinuc
line 10: if (scale) {
in make_sinuc_PWMs
line 4: if (add_pseudo_counts) {
line 5: vec <- vec + 10^-5
line 6: }
line 7: this_mat <- t(matrix(vec, ncol = length(sinuc), byrow = FALSE))
line 8: rownames(this_mat) <- sinuc
line 9: if (scale) {
repetition in plot_arch_for_clusters and plot_ggseqlogo_of_seqs and viz_seqs_acgt_mat
in plot_arch_for_clusters
line 14: if (is.null(pos_lab)) {
line 15: pos_lab <- seq_len(Biostrings::width(seqs[1]))
line 16: }
in plot_ggseqlogo_of_seqs
line 4: if (is.null(pos_lab)) {
line 5: pos_lab <- seq_len(Biostrings::width(seqs[1]))
line 6: }
in viz_seqs_acgt_mat
line 5:{
line 6: if (is.null(pos_lab)) {
line 7: pos_lab <- seq_len(Biostrings::width(seqs[1]))
line 8: }
repetition in plot_ggheatmap and plot_ggseqlogo
in plot_ggheatmap
line 2:{
line 3: if (is.null(pos_lab))
line 4: pos_lab <- set_default_pos_lab2(pwm_mat)
line 5: check_vars(pwm_mat, pos_lab)
line 20: p1 <- fix_coord(p1, nPos = length(pos_lab), method = "heatmap",
line 21: fixed_coord = fixed_coord)
line 22: if (!is.null(pdf_name)) {
line 23: if (file.exists(pdf_name)) {
line 24: warning("File exists, will overwrite", immediate. = TRUE)
line 25: }
line 26: ggplot2::ggsave(filename = pdf_name, plot = p1, device = "pdf",
line 27: width = 20, height = 2.5)
line 28: }
line 29: return(p1)
in plot_ggseqlogo
line 3:{
line 4: if (is.null(pos_lab))
line 5: pos_lab <- set_default_pos_lab2(pwm_mat)
line 6: check_vars(pwm_mat, pos_lab)
line 15: p1 <- fix_coord(p1, nPos = length(pos_lab), method = method,
line 16: fixed_coord = fixed_coord)
line 17: if (!is.null(pdf_name)) {
line 18: if (file.exists(pdf_name)) {
line 19: warning("File exists, will overwrite", immediate. = TRUE)
line 20: }
line 21: ggsave(filename = pdf_name, plot = p1, device = "pdf",
line 22: width = 25, height = 2.5)
line 23: }
line 24: return(p1)
repetition in viz_bas_vec_heatmap and viz_bas_vec_heatmap_seqlogo and viz_bas_vec_seqlogo
in viz_bas_vec_heatmap
line 3:{
line 4: check_vars2(feat_mat)
line 5: if (is.null(pos_lab)) {
line 6: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
line 7: }
line 8: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
line 9: if (sinuc_or_dinuc == "dinuc") {
line 10: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
line 11: }
line 12: else if (sinuc_or_dinuc == "sinuc") {
line 13: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
line 14: }
line 15: p1 <- plot_ggheatmap(pwm_mat = pwm, pos_lab = pos_lab,
in viz_bas_vec_heatmap_seqlogo
line 4: check_cowplot()
line 5: check_vars2(feat_mat)
line 6: if (is.null(pos_lab)) {
line 7: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
line 8: }
line 9: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
line 10: if (sinuc_or_dinuc == "dinuc") {
line 11: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
line 12: }
line 13: else if (sinuc_or_dinuc == "sinuc") {
line 14: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
line 15: }
line 16: p1 <- plot_ggheatmap(pwm_mat = pwm, pos_lab = pos_lab)
in viz_bas_vec_seqlogo
line 1: method = "bits", pos_lab = NULL, add_pseudo_counts = FALSE,
line 2: pdf_name = NULL, sinuc_or_dinuc = "sinuc", fixed_coord = FALSE)
line 3:{
line 4: check_vars2(feat_mat)
line 5: if (is.null(pos_lab)) {
line 6: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
line 7: }
line 8: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
line 9: if (sinuc_or_dinuc == "dinuc") {
line 10: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
line 11: }
line 12: else if (sinuc_or_dinuc == "sinuc") {
line 13: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
line 14: }
line 15: p1 <- plot_ggseqlogo(pwm_mat = pwm, method = method,
repetition in viz_bas_vec_heatmap_seqlogo and viz_bas_vec_seqlogo
in viz_bas_vec_heatmap_seqlogo
line 4: check_cowplot()
line 5: check_vars2(feat_mat)
line 6: if (is.null(pos_lab)) {
line 7: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
line 8: }
line 9: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
line 10: if (sinuc_or_dinuc == "dinuc") {
line 11: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
line 12: }
line 13: else if (sinuc_or_dinuc == "sinuc") {
line 14: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
line 15: }
line 16: p1 <- plot_ggheatmap(pwm_mat = pwm, pos_lab = pos_lab)
line 24: final_p
line 25: })
line 26: if (!is.null(pdf_name)) {
line 27: if (file.exists(pdf_name)) {
line 28: warning("File exists, will overwrite", immediate. = TRUE)
line 29: }
line 30: grDevices::pdf(file = pdf_name, width = 20, height = 4)
line 31: lapply(pl_list, print)
line 32: dev.off()
line 33: return(invisible(NULL))
line 34: }
line 35: pl_list
in viz_bas_vec_seqlogo
line 3:{
line 4: check_vars2(feat_mat)
line 5: if (is.null(pos_lab)) {
line 6: pos_lab <- set_default_pos_lab(feat_mat, sinuc_or_dinuc)
line 7: }
line 8: pl_list <- apply(feat_mat, MARGIN = 2, function(x) {
line 9: if (sinuc_or_dinuc == "dinuc") {
line 10: pwm <- make_dinuc_PWMs(x, add_pseudo_counts = FALSE)
line 11: }
line 12: else if (sinuc_or_dinuc == "sinuc") {
line 13: pwm <- make_sinuc_PWMs(x, add_pseudo_counts = FALSE)
line 14: }
line 15: p1 <- plot_ggseqlogo(pwm_mat = pwm, method = method,
line 17: p1
line 18: })
line 19: if (!is.null(pdf_name)) {
line 20: if (file.exists(pdf_name)) {
line 21: warning("File exists, will overwrite", immediate. = TRUE)
line 22: }
line 23: grDevices::pdf(file = pdf_name, width = 20, height = 4)
line 24: lapply(pl_list, print)
line 25: dev.off()
line 26: return(invisible(NULL))
line 27: }
line 28: pl_list
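The PDF-writing tail shared by viz_bas_vec_heatmap_seqlogo and viz_bas_vec_seqlogo could be moved into one internal helper. The following is only a minimal sketch: the helper name write_plots_pdf and its width/height arguments are hypothetical and not part of seqArchR; the defaults are copied from the excerpts above.

```r
# Hypothetical internal helper (the name is an assumption, not seqArchR API).
# Prints every plot in pl_list to a single PDF, or returns the list unchanged
# when no file name is given.
write_plots_pdf <- function(pl_list, pdf_name = NULL, width = 20, height = 4) {
    if (is.null(pdf_name)) {
        return(pl_list)
    }
    if (file.exists(pdf_name)) {
        warning("File exists, will overwrite", immediate. = TRUE)
    }
    grDevices::pdf(file = pdf_name, width = width, height = height)
    # Close the device even if printing one of the plots fails.
    on.exit(grDevices::dev.off(), add = TRUE)
    lapply(pl_list, print)
    invisible(NULL)
}
```

With such a helper, each viz_bas_vec_* function could end with a single call like write_plots_pdf(pl_list, pdf_name) instead of repeating the block.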
Suggestion: expose \dots (...) in the plotting functions so that users can pass extra arguments through to called functions such as ggsave.
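One way to follow this suggestion is sketched below. The helper name save_plot_pdf is hypothetical; the default width and height are taken from the plot_ggheatmap excerpt above. Defaults are merged with the user's arguments first, so that passing, say, width through ... does not collide with the width set inside the function.

```r
# Hypothetical helper (not seqArchR API): saves a ggplot to PDF and forwards
# any extra arguments in ... to ggplot2::ggsave().
save_plot_pdf <- function(p1, pdf_name, ...) {
    if (file.exists(pdf_name)) {
        warning("File exists, will overwrite", immediate. = TRUE)
    }
    # Package defaults; anything supplied via ... overrides them.
    args <- utils::modifyList(
        list(device = "pdf", width = 20, height = 2.5),
        list(...)
    )
    do.call(ggplot2::ggsave, c(list(filename = pdf_name, plot = p1), args))
    invisible(p1)
}

# Example call (assumes ggplot2 is installed and p1 is a ggplot object):
# save_plot_pdf(p1, "logo.pdf", width = 10, dpi = 150)
```

Calling ggsave directly with both a hard-coded width and ... would fail with a "matched by multiple actual arguments" error whenever the user also supplies width, which is why the sketch merges the argument lists before the call.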
Documentation
Vignette should use BiocStyle package for formatting. -- DONE
rmd file vignettes/seqArchR.Rmd
Please include Bioconductor installation instructions using BiocManager. -- DONE
rmd file vignettes/seqArchR.Rmd
Please remove the TODO from the vignettes or make it invisible. -- DONE
Note: please estimate the running time for the code at lines 213:218 in vignettes/seqArchR.Rmd. -- NOTE: This chunk takes only about 1.5 to 2 minutes to run. I set it to eval=FALSE
only because the Bioconductor builds were timing out. BTW, I do have tests that check that this processing works.
Received a valid push on git.bioconductor.org; starting a build for commit id: b42fb24563c8406396ae1b3c88087802bf879176
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/seqArchR
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi @jianhong,
The push after my last message was for minor changes. This is now ready for you to have another look. See my detailed answers in the previous message.
Thanks, Sarvesh
After removing the unused code, the package will be marked as acceptable.
Received a valid push on git.bioconductor.org; starting a build for commit id: 56860295fbd4e1e4f9a71dbd42d2f409611ace1c
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/seqArchR
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
All done!
Thanks @jianhong
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/snikumbh.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2 pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("seqArchR").
The package 'landing page' will be created at
https://bioconductor.org/packages/seqArchR
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.