Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

GRaNIE #2556

Closed chrarnold closed 2 years ago

chrarnold commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @chrarnold

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: GRaNIE
Title: GRaNIE: Reconstruction cell type specific gene regulatory networks including enhancers using chromatin accessibility and RNA-seq data
Version: 0.99.0
Encoding: UTF-8
Authors@R: c(person("Christian", "Arnold", email =
        "chrarnold@web.de", role = c("cre","aut")),
        person("Judith", "Zaugg", email =
        "judith.zaugg@embl.de", role = c("aut")),
        person("Rim", "Moussa", email =
        "rim.moussa01@gmail.com", role = "aut"),
        person("Armando", "Reyes-Palomares", email =
        "armandorp@gmail.com", role = "ctb"),
        person("Giovanni", "Palla", email =
        "giov.pll@gmail.com", role = "ctb"),
        person("Maksim", "Kholmatov", email =
        "maksim.kholmatov@embl.de", role = "ctb"))
Description: Genetic variants associated with diseases often affect non-coding regions, thus likely having a regulatory role. To understand the effects of genetic variants in these regulatory regions, identifying genes that are modulated by specific regulatory elements (REs) is crucial. The effect of gene regulatory elements, such as enhancers, is often cell-type specific, likely because the combinations of transcription factors (TFs) that are regulating a given enhancer have celltype specific activity. This TF activity can be quantified with existing tools such as diffTF and captures differences in binding of a TF in open chromatin regions. Collectively, this forms a gene regulatory network (GRN) with cell-type and data-specific TF-RE and RE-gene links. Here, we reconstruct such a GRN using bulk RNAseq and open chromatin (e.g., using ATACseq or ChIPseq for open chromatin marks) and optionally TF activity data. Our network contains different types of links, connecting TFs to regulatory elements, the latter of which is connected to genes in the vicinity or within the same chromatin domain (TAD). We use a statistical framework to assign empirical FDRs and weights to all links using a permutation-based approach.
Imports:
    futile.logger,
    checkmate,
    patchwork,
    reshape2,
    data.table,
    matrixStats,
    Matrix,
    GenomicRanges,
    RColorBrewer,
    colorspace,
    ComplexHeatmap,
    DESeq2,
    csaw,
    circlize,
    robust,
    progress,
    utils,
    methods,
    stringr,
    scales,
    BiocManager,
    BiocParallel,
    igraph,
    S4Vectors,
    ggplot2,
    rlang,
    Biostrings,
    GenomeInfoDb,
    IRanges,
    SummarizedExperiment,
    forcats,
    gridExtra,
    limma,
    purrr,
    tidyselect,
    readr,
    grid,
    tidyr,
    dplyr,
    stats,
    grDevices,
    graphics,
    magrittr,
    tibble,
    viridis,
    BiocFileCache
Depends:
    R (>= 4.1.0),
    tidyverse,
    topGO
Suggests:
    knitr,
    BSgenome.Hsapiens.UCSC.hg19,
    BSgenome.Hsapiens.UCSC.hg38,
    BSgenome.Mmusculus.UCSC.mm10,
    BSgenome.Mmusculus.UCSC.mm9,
    TxDb.Hsapiens.UCSC.hg19.knownGene,
    TxDb.Hsapiens.UCSC.hg38.knownGene,
    TxDb.Mmusculus.UCSC.mm10.knownGene,
    TxDb.Mmusculus.UCSC.mm9.knownGene,
    org.Hs.eg.db,
    org.Mm.eg.db,
    IHW,
    biomaRt,
    clusterProfiler,
    ReactomePA,
    DOSE,
    ChIPseeker,
    testthat (>= 3.0.0),
    BiocStyle
VignetteBuilder: knitr
biocViews: Software, GeneExpression, GeneRegulation, NetworkInference, GeneSetEnrichment, BiomedicalInformatics, Genetics, Transcriptomics, ATACSeq, RNASeq, GraphAndNetwork, Regression, Transcription, ChIPSeq
License: Artistic-2.0
LazyData: false
URL: https://grp-zaugg.embl-community.io/GRaNIE
BugReports: https://git.embl.de/grp-zaugg/GRaNIE/issues
RoxygenNote: 7.1.2
Config/testthat/edition: 3
vjcitn commented 2 years ago
Quitting from lines 212-213 (workflow.Rmd) 
Error: processing vignette 'workflow.Rmd' failed with diagnostics:
cannot open file '/g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_RNA.raw.pdf'
--- failed re-building ‘workflow.Rmd’

I am going to pass this package into single package build so that you can fix it. thanks for your submission.

chrarnold commented 2 years ago

Hi there, in the vignette, there is no explicit path being mentioned (I just checked again), I am a bit confused where the absolute path you posted here comes from?

vjcitn commented 2 years ago

we'll find out in testing locally i know it is not in vignette it has something to do with templating

lshep commented 2 years ago

FWIW I can also reproduce the error locally on my machine

Quitting from lines 212-213 (workflow.Rmd) 
Error: processing vignette 'workflow.Rmd' failed with diagnostics:
cannot open file '/g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_RNA.raw.pdf'
--- failed re-building ‘workflow.Rmd’

SUMMARY: processing the following file failed:
  ‘workflow.Rmd’

Error: Vignette re-building failed.
Execution halted
chrarnold commented 2 years ago

Ok I will try this today on my local machine and see where it comes from! I am on it, thanks.

chrarnold commented 2 years ago

I know the reason now. In order to fix it, I'd like to query here what the maximum time is the vignette code is allowed to run.

lshep commented 2 years ago

I believe the time limit is 10 min for software packages but ideally far less than that. Remember it is often sufficient to use a small dummy dataset to show functionality and proof of principle.

chrarnold commented 2 years ago

I pushed a new version 0.99.1 to the Git today that results in 0 errors or warnings on an external machine with the newest R devel and Biocondutor 3.15. I hope the same is true for the build machines :). I optimized a few things and can summarize if needed. Thanks @lshep and @vjcitn for your input so far!

The issue with BiocFileCache and making sure it redownloads the example object only when needed is still ongoing, but not affecting the package review.

bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GRaNIE to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

chrarnold commented 2 years ago

Thank you for assigning a reviewer quickly here! I already checked the build report and uploaded a new version 0.99.2 to Github (not Bioconductor yet) which reduces the repo size a lot, I hope I am below the limit now - not sure I can decrease it further with little effort, but let's see what the next build report says. I am not sure but I have an idea where the Windows error comes from (I pushed a first attempt to fix it but not sure about it), I do not have a Windows machine currently available to test it unfortunately. I am hoping this can be solved after the first comments have been laid out, hopefully.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: d0b1deb67edb0f2cddf050c3491bd6dd78141b4b

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GRaNIE to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

chrarnold commented 2 years ago

I checked the new build report, and I noticed 2 things:

  1. Timeout for R CMD check after 15 minutes (I'd like to see what takes up the most time, but this is unfortunately not shown)
  2. The Package Source tarball exceeds Bioconductor size of 5 Mb (13 MB)

I do not know how to satisfy these with a reasonable effort.

  1. My package is quite large, has many functions and each of them, as required and recommended, with @examples to show the usage o the functions. Even with the small example dataset that I use, this takes 30 seconds for a function to run, and collectively adds up running time for R CMD build. Similarly for using testthat. I cannot use a much smaller test dataset because then, the resulting gene-regulatory network becomes meaningless and so tiny that no useful enrichment etc can be run on it, it would look bad in a vignette.
  2. Similarly, because I intended to show in the workflow and package details vignette all main output plots along with explanations, the images I am including in the HTMLs are just above 5 Mb and I cant really do much about it... It is literally over 50 or so png images, each with varying size. The "inst" folder in the tarball alone is therefore exceeding the 5 MB. I can completely remove the workflow vignette and put it only on our dedicated package website with a stable URL, but isnt this against the whole philosophy a bit?

Please advise what to do, I cant revamp the whole idea and package just because of these quite stringent criteria :(.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 197f7c94eb0d68284743301fa87a02299dc16631

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GRaNIE to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

vjcitn commented 2 years ago

Hi, thanks for the submission, we really hope it is possible to review it. But you'll have to do some work. The 5MB limit is met by over 1000 packages submitted since the standard was imposed. You have 5+MB of png and that is quite uncommon. You might want to serialize a pdf with the detailed information. But the size limit is firm. As for timeout, again all packages have to meet that standard. There is a longtests option (https://bioconductor.org/developers/how-to/long-tests/) that may be useful to you. Consider applying to the developer mentoring program https://bioconductor.org/developers/new-developer-program/ ...

chrarnold commented 2 years ago

Thanks, I will try my best - but I am going on vacation for 3.5 weeks on Saturday, only returning mid April, so not sure acceptance of the package can be finalized by April 20th, lets see.

Maybe I can explain to you how we currently do it with the PNGs: The functions that produce plots produce a PDF file with multiple (often many) pages. In the detailed Workflow vignette, we run all main functions, and only show selected plots by including a PNG of the page of the output PDF. This contributes mainly to the large size. While it is possible to just plot to the currently active graphics device(and not to a PDF), if I did so in the Vignette, then the vignette would contain a lot of plots I dont want to show. Alternatively, I could not show any example plots, but this is again not really ideal and defeats the purpose a bit. I could decrease the resolution of the images much more, but then readability suffers. Do you have any other ideas how to go about it?

For the time limit: I managed to reduce it a bit, and now, a timeout occurs on only one of the machines, I presume no timeout can occur on any of the machines? Is the testing via testthat etc also part of the time limit? Thanks for the longtest option - I quickly checked it but not sure how much it helps: The running time comes mainly from the examples, while the "tests" contribute only a little. The description from the link you sent says: "The maximum amount of time that R CMD check is allowed to spend on each package is 40 min.". Is this really 40 minutes, it seems it is 15 according to the build reports?

vjcitn commented 2 years ago

The graphics problem you describe needs to be solved in another way. I don't have a specific solution at the moment, you might ask in slack on developer forum channel.

the longtests should behave as advertised; if you are doing too many examples, cut it down and provide code for users to run additional examples as needed.

chrarnold commented 2 years ago

Thanks, for the examples that is fine - I recall that 80% of the functions must have a running example (and I tried to follow), otherwise a Bioc warning is raised - but if this is more negotiable and the more time-consuming ones can be skipped that would be a start.

Is it possible, if ok with Bioc, to only include a "package details" Vignette (see hre: https://grp-zaugg.embl-community.io/GRaNIE/articles/packageDetails.html) as part o the repo and do the full Workflow vignette outside of the Bioc package (and provide a link for users so they can go to the website and see it there)? That would solve many problems and users would be happy too because I can include more of the output as part of the vignette:)

Sorry for the many discussions and troubles, just trying to find a good way that works for all involved ;)

vjcitn commented 2 years ago

Bioconductor submissions should achieve a certain amount of conventionality. This makes it easier for users to explore and understand what they are getting involved with. Many developers have made efforts to introduce function and data combinations that demonstrate package capabilities conveniently in the vignette, which typically includes extensive coverage of key functions in the package, that run in the available time. Please take the advice from the bioc-devel dialogue and try to meet Bioconductor standards for your package. I would say the "package details" vignette, while very nice and informative, does not meet the standard, and that meeting it would be good for both the software and the intended user base.

lshep commented 2 years ago

I think trying to split up the package or having a more minimal dataset in the package but then submitting an accompanied workflow package with the more extensive, detailed workflow vignette are two very good options mentioned on the bioc-devel conversation.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 8c502bd1c5fccbbcd136bade51c2e10189196c12

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/GRaNIE to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

chrarnold commented 2 years ago

Hi all, I finally had time to work on the issues, and I think they are resolved now. All plot functions can now plot to the currently active graphics device, page-specifically, eliminating the need to store any images in the repo. I also removed the other vignettes as suggested. The dataset used in the workflow is more minimal now, and it seems to complete within the allowed time frame. I am on vacation, and available on the 12th again for further work if needed. Thanks for the input so far, and sorry for the initial troubles @lshep .

vjcitn commented 2 years ago

I am going to accept this package. But please revise the vignette so that the abstract tells the reader WHAT IS ACTUALLY GOING TO BE DONE, SCIENTIFICALLY, FROM THE POINT OF VIEW OF SOMEONE WHO HAS NO IDEA WHAT "GRaNIE" IS. The introductory paragraph could benefit from similar elaborations.

bioc-issue-bot commented 2 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

vjcitn commented 2 years ago

and do not name your vignette "workflow.Rmd". Use a less generic name.

lshep commented 2 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/chrarnold.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("GRaNIE"). The package 'landing page' will be created at

https://bioconductor.org/packages/GRaNIE

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

chrarnold commented 2 years ago

Thanks a lot, we are happy about this! I will revise the vignette today before the freeze.

About the renaming: Would GRaNIE_workflow.RMd ? Not sure which other name reflects better of what the vignettes shows. Some packages such as DESeq just name the vignette as the package name, would that be more appropriate yes? Thanks!

vjcitn commented 2 years ago

the name you suggest is fine

On Mon, Apr 25, 2022 at 10:38 AM chrarnold @.***> wrote:

Thanks a lot, we are happy about this! I will revise the vignette today before the freeze.

About the renaming: Would GRaNIE_workflow.RMd ? Not sure which other name reflects better of what the vignettes shows. Some packages such as DESeq just name the vignette as the package name, would that be more appropriate yes? Thanks!

— Reply to this email directly, view it on GitHub https://github.com/Bioconductor/Contributions/issues/2556#issuecomment-1108662772, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDI5QQLKWE4EC3JARTKNBLVG2U5LANCNFSM5QDO5VUQ . You are receiving this because you were assigned.Message ID: @.***>

-- The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance  HelpLine at http://www.partners.org/complianceline http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.

chrarnold commented 2 years ago

OK thanks! Another quick question: The original Git repository, what do I do with it? In the future, I commit changes directly to the Bioconductor remotes and there is no need for the Git repository, correct? I'd like to avoid and remove redundancy and complexity so I am thinking about retiring the Git repo then. What are best practices here?

chrarnold commented 2 years ago

I pushed a new commit 5 minutes ago, but I do not see the automated "Received a valid push on git.bioconductor.org" from the bot. Is this because the issue was closed or because of another reason?

lshep commented 2 years ago

Yes once it's accepted and closed it doesn't build on demand. It will be reflected when it shows in the daily build report

chrarnold commented 2 years ago

OK thanks for the quick reply, let's hope there is no error. Locally all looked good and I changed only few things, there shouldnt be anything new :). I will wait for the report then thank you :)

lshep commented 2 years ago

No problem. Here is the main build check page for convenience https://www.bioconductor.org/checkResults/

chrarnold commented 2 years ago

Hi, I noticed that GRaNIE appears only in Bioconductor 3.16 (devel) in the reports but not in Bioconductor 3.15 (release). From my understanding, shouldnt it be in the release branch also, given it built without errors before the deadline on April 20th or so?

lshep commented 2 years ago

we investigated the issue and identified the cause. The recently accepted new packages should be available after the next build report posting. Sorry for the inconvenience.

chrarnold commented 2 years ago

I see perfect thanks for the quick reply!