Bioconductor / Contributions

Contribute Packages to Bioconductor
133 stars 33 forks source link

TimeSeriesAnalysis #2883

Closed Ylefol closed 1 year ago

Ylefol commented 1 year ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 year ago

Hi @Ylefol

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: TimeSeriesAnalysis
Type: Package
Title: Transcriptomic TimeSeries (longitudinal) analysis
Version: 0.99.0
Authors@R: person("Yohan", "Lefol", email = "yohan.lefol@gmail.com",
        role = c("aut", "cre"))
Description: An pipeline to analyse and interpret longitudinal transcriptomic data from both RNAseq and microarray orgigins. The pipeline performs differential gene expression, clustering, and plots usefull figures to aid in interpretation of identified clusters of differentially expressed genes.
License: MIT
VignetteBuilder: knitr
Imports:
    stringr,
    DESeq2,
    limma,
    grid,
    ggrepel,
    ggplot2,
    ComplexHeatmap,
    reshape2,
    tibble,
    dplyr,
    gprofiler2,
    tictoc,
    GOSemSim,
    plotly,
    GO.db,
    data.table,
    htmltools,
    dynamicTreeCut,
    stringi,
    rstudioapi,
    AnnotationDbi,
    BiocGenerics,
    GenomicRanges,
    RColorBrewer,
    grDevices,
    htmlwidgets,
    stats,
    utils
Depends: 
    R (>= 4.1)
Suggests: knitr,
    rmarkdown,
URL: https://github.com/Ylefol/TimeSeriesAnalysis
LazyData: true
biocViews: Sequencing, RNASeq, Microarray, GeneExpression, Transcription,
    Normalization, DifferentialExpression, Clustering, Regression,
    PrincipalComponent
RoxygenNote: 7.2.3
Encoding: UTF-8
vjcitn commented 1 year ago

Thank you very much for this submission. I am going to ask you to consider some basic modifications before more review is undertaken. I find these in man/data.R:

#' \describe{
#'   \item{sample_dta}{File indicating which count files belong to which group, timepoint, and replicate}
#'   \item{counts}{A list of all the count files}
#' }
#' @source <https://www.github.com/Ylefol/TimeSeriesAnalysis>
"AID_TS_data"

#' @title RNAseq counts for AID activation cocktail in a murine timeseries exeriment time series experiment
#'
#' @description A data set with RNAseq results for a AID activation cocktail composed of LPS, CD40L, and TGFb-1.
#' Two groups exist: Control (no cocktail) and Treated (with activation cocktail).
#' Initial experiment harvested timepoints at 0, 15min, 30min, 1h, 2h, 3h, 6h, 12h, 24h, and 48h.
#' Only one replicate exists per group per timepoint, so subsequent timepoint of the same groups have been merged
#' as a singular timepoint. For example, timepoints 0 and 15m became timepoint 7.5 minutes.
#'
#' #' @format A list containing a dataframe and another list of count dataframes
#' \describe{
#'   \item{sample_dta}{File indicating which count files belong to which group, timepoint, and replicate}
#'   \item{counts}{A list of all the count files}
#' }

If I am not mistaken, this data would be well-suited to a SummarizedExperiment format. Have you considered this? The purpose is to bind quantitative assay data to metadata about experiments in a coherent way that is used by many Bioconductor packages. If you don't want to use SummarizedExperiment, that's fine, but then the package should probably be distributed in another way.

Ylefol commented 1 year ago

Thank you very much for this submission. I am going to ask you to consider some basic modifications before more review is undertaken. I find these in man/data.R:

#' \describe{
#'   \item{sample_dta}{File indicating which count files belong to which group, timepoint, and replicate}
#'   \item{counts}{A list of all the count files}
#' }
#' @source <https://www.github.com/Ylefol/TimeSeriesAnalysis>
"AID_TS_data"

#' @title RNAseq counts for AID activation cocktail in a murine timeseries exeriment time series experiment
#'
#' @description A data set with RNAseq results for a AID activation cocktail composed of LPS, CD40L, and TGFb-1.
#' Two groups exist: Control (no cocktail) and Treated (with activation cocktail).
#' Initial experiment harvested timepoints at 0, 15min, 30min, 1h, 2h, 3h, 6h, 12h, 24h, and 48h.
#' Only one replicate exists per group per timepoint, so subsequent timepoint of the same groups have been merged
#' as a singular timepoint. For example, timepoints 0 and 15m became timepoint 7.5 minutes.
#'
#' #' @format A list containing a dataframe and another list of count dataframes
#' \describe{
#'   \item{sample_dta}{File indicating which count files belong to which group, timepoint, and replicate}
#'   \item{counts}{A list of all the count files}
#' }

If I am not mistaken, this data would be well-suited to a SummarizedExperiment format. Have you considered this? The purpose is to bind quantitative assay data to metadata about experiments in a coherent way that is used by many Bioconductor packages. If you don't want to use SummarizedExperiment, that's fine, but then the package should probably be distributed in another way.

Hello, thank you for your feedback. This data could indeed benefit from being in a SummarizedExperiment format. I'll get to making those modifications and will update this issue once I've finalized and pushed them.

Best, Yohan

Ylefol commented 1 year ago

Hello @vjcitn, Thank you again for your suggestion. They have now been implemented in the following way: Each example dataset (3 of them) are now stored as SummarizedExperiments. In addition, the principal object of the pipeline (TimeSeries_Object) now takes in the either the example data as SummarizedExperiments or takes user submitted data and converts it to a SummarizedExperiment before implementing it within the TimeSeries_Object.

These changes have been pushed to the master branch.

Best regards, Yohan

Note: I accidentally closed the issue, and therefore re-opened it immediately after, my apologies

vjcitn commented 1 year ago

Please check your main branch.

Error: 'murine_TS_data' is not an exported object from 'namespace:TimeSeriesAnalysis'
recover called non-interactively; frames dumped, use debugger() to view
1/11 packages newly attached/loaded, see sessionInfo() for details.
* checking for file 'TimeSeriesAnalysis/DESCRIPTION' ... OK
* preparing 'TimeSeriesAnalysis':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘TS_analysis_PBMC.Rmd’ using rmarkdown
Quitting from lines 197-202 (TS_analysis_PBMC.Rmd) 
Error: processing vignette 'TS_analysis_PBMC.Rmd' failed with diagnostics:
Could not resolve host: biit.cs.ut.ee
--- failed re-building ‘TS_analysis_PBMC.Rmd’

--- re-building ‘TS_analysis_celegans.Rmd’ using rmarkdown
Quitting from lines 36-106 (TS_analysis_celegans.Rmd) 
Error: processing vignette 'TS_analysis_celegans.Rmd' failed with diagnostics:
there is no package called 'org.Ce.eg.db'
--- failed re-building ‘TS_analysis_celegans.Rmd’

--- re-building ‘TS_analysis_murine.Rmd’ using rmarkdown
Quitting from lines 114-125 (TS_analysis_murine.Rmd) 
Error: processing vignette 'TS_analysis_murine.Rmd' failed with diagnostics:
object 'murine_TS_data' not found
--- failed re-building ‘TS_analysis_murine.Rmd’

SUMMARY: processing the following files failed:
  ‘TS_analysis_PBMC.Rmd’ ‘TS_analysis_celegans.Rmd’
  ‘TS_analysis_murine.Rmd’
vjcitn commented 1 year ago

I am trying to help move this through, but when I use purl to get the source of your vignette and run, I have

> ## ----create_base_TS, message=FALSE--------------------------------------------
> if(obj_name %in% list.files(name_result_folder)==FALSE){
+   TS_o .... [TRUNCATED] 
Error in TS_load_example_data(TS_object, "MURINE") : 
  could not find function "TS_load_example_data"

The vignettes should be independently runnable after the package has been installed.

Ylefol commented 1 year ago

Hello,

Apologies for the late response. I believe I can get most of these fixed by the end of the week (I'm a bit tied up in other work elements at the moment). My only concern would be the one pertaining to the host resolution. (Could not resolve host: biit.cs.ut.ee). This part of the code allows for the connection to gprofiler (https://biit.cs.ut.ee/gprofiler/gost) which is a necessity for the over-representation analysis within this pipeline. Usually the connection fails if the internet connection is not stable or if a firewall prevents the access to gprofiler. Could this have been the case for you? Perhaps this is a question that can be explored once I've resolved the other elements?

Thanks again for your continued help with this package. Best regards, Yohan Lefol

vjcitn commented 1 year ago

If it is going to fail intermittently it will fail for the build system and it will cause us to check the source of the failure. Please robustify the code -- if the connection succeeds, use it; if it fails, do something else, like use a "mock" result from the server. Let me know when this has been done.

vjcitn commented 1 year ago

Basically your package has to pass check for the build system, starting with the single package builder. Packages that I can't check myself get delayed. So you should focus on getting R CMD check to run on a clean system and not produce errors or bad warnings.

Ylefol commented 1 year ago

Hello @vjcitn

Thanks again for your continued help. I have now updated the master branch (details below) and it has passed the rcmdcheck function with no errors or concerning warnings. This check was performed on a separate labtop with a fresh R installation. On this new labtop, R was installed, followed by devtools, and then the required bioconductor packages (per the readme on the TimeSeriesAnalysis page). The TimeSeriesAnalysis package was then downloaded using devtools and it's repository cloned to a local file. From here I ran the rcmdcheck function.

In regards to the original issues, the source was that some of the data files had yet to be removed from the .gitignore file and therefore were not available on the git repo (this has been rectified).

As for the gprofiler, and therefore connection issues, I have done as recommended and created a catch where is vignettes or example data are being used, pre-created data will be loaded in the event of a connection failure. Four data files have been created to this end, one per vignette/example files (three in total) and a fourth for the examples contained in the Roxygen comments.

Several examples were also modified as these were causing errors due to my failing to update some of them following the addition of SummarizeExperiments.

Best regards, Yohan Lefol

vjcitn commented 1 year ago

I tried to build again on current source, on a very large machine. It did not complete. Your PBMC vignette opens a folder where it is being run and drops data in there. Writes should be made by default to a subfolder of tempdir(); the user can choose alternate destinations. I will pass the package so it can be reviewed. I think there is a risk of timeouts. You may want to precompute results for certain phases to reduce run time and use the long running test protocol to verify that the precomputes have not gotten stale. There are two spelling errors in the Description field of DESCRIPTION.

vjcitn commented 1 year ago

See http://contributions.bioconductor.org/r-code.html?q=tempdir#web-querying-and-file-caching -- It is not allowed to download or write any files to a users home directory, working directory, or installed package directory.

Ylefol commented 1 year ago

Hello,

Thank you for the comments. Indeed the processing time of some of the vignettes takes a long time and was a concern of mine. I'll come up with a work around for that. I'll also remove the writing to folder components from the Vignettes and utilize the tempdir() where necessary.

I'll follow-up on this issue once I've made these adjustments and pushed them to the master branch.

Best regards, Yohan

lshep commented 1 year ago

If we process this, it will try to write to the home directory on the builders which would not be ideal as then we would have to manually clean up. Please let us know when this is resolved and then we will proceed with processing on our builders and assigning a reviewer.

Ylefol commented 1 year ago

Hello @vjcitn and @lshep ,

I have now made several modifications which in essence remove all 'save to directory' elements. A few of the functions within the package exist to save elements to a directory. As such, the examples for these have simply been removed as to not create problems when building examples. Other sets of functions which write to directory have been adjusted to use BiocCache within their examples. The Vignettes do not save any files to directory. I've also re-run R CMD Check on a separate system. It takes roughly 20 minutes, no errors were detected.

Please let me know if there is anything else that should be done.

Best regards, Yohan Lefol

vjcitn commented 1 year ago

Thanks for your efforts. The package builds to about ~19MB and it needs to be 5MB at most. The three vignettes are nice but each compiles to around 5MB. I suggest you move two of the vignettes to inst/extras and tell the user about them in the one vignette that you retain in vignettes/. You will have to reduce the image sizes to get down to 5MB. Use BiocCheck::BiocCheck to confirm that your package can be reviewed.

Ylefol commented 1 year ago

Hello,

Sounds good, I had a concern that it was going to be a bit too heavy for bioconductor. My understanding had been that no individual file can go beyond 5MB but not the build as a whole. That's good to know.

I have a few other things to attend to but I hope to getting these modifications done by sometime next week. Thanks again for your continued help.

Best regards, Yohan Lefol

vjcitn commented 1 year ago

With the latest R 4.3 on linux I am getting

* creating vignettes ... ERROR
--- re-building ‘TS_analysis_PBMC.Rmd’ using rmarkdown
Warning in grid.Call.graphics(C_lines, x$x, x$y, index, x$arrow) :
  semi-transparency is not supported on this device: reported only once per page
--- finished re-building ‘TS_analysis_PBMC.Rmd’

--- re-building ‘TS_analysis_celegans.Rmd’ using rmarkdown
Quitting from lines 230-247 (TS_analysis_celegans.Rmd) 
Error: processing vignette 'TS_analysis_celegans.Rmd' failed with diagnostics:
The size of the temporary image for rasterization is too huge (2329 x
198 px) that it is cannot be handled by the device function
`Cairo:CairoPNG()`. Please reduce the maximal size of temporary image
by setting proper values for `ht_opt$raster_temp_image_max_width` and
`ht_opt$raster_temp_image_max_height`.
--- failed re-building ‘TS_analysis_celegans.Rmd’

--- re-building ‘TS_analysis_murine.Rmd’ using rmarkdown
Quitting from lines 330-349 (TS_analysis_murine.Rmd) 
Error: processing vignette 'TS_analysis_murine.Rmd' failed with diagnostics:
The size of the temporary image for rasterization is too huge (91 x
2851 px) that it is cannot be handled by the device function
`Cairo:CairoPNG()`. Please reduce the maximal size of temporary image
by setting proper values for `ht_opt$raster_temp_image_max_width` and
`ht_opt$raster_temp_image_max_height`.
--- failed re-building ‘TS_analysis_murine.Rmd’

SUMMARY: processing the following files failed:
  ‘TS_analysis_celegans.Rmd’ ‘TS_analysis_murine.Rmd’

Error: Vignette re-building failed.
Execution halted
Ylefol commented 1 year ago

Hello @vjcitn and @lshep ,

I've now managed to get the build down to 5MB, which also has solved the issue just posted.

I removed two of the three vignettes (murine and celegans). All three vignettes are quite similar so not much is lost by removing two of them. For the remaining Vignette, interactive plots were shifted to static plots. This was done as having a single plotly based plot cause a 3.7Mb folder to be created within the build of that Vignette. The loss of interactivity is a shame, but it could not be kept all while remaining below the 5MB limit. In addition, two plots were removed in order to save on build size. These various changes caused a bit of a change in the code as well, hence all examples/references were double checked. The build size is 4.9MB, I also ran Rcmdcheck and BiocCheck on a separate computer. Both passed without errors or concerning warnings. The computer was linux based.

All of these changes have now been pushed to the master branch. Please let me know if any additional changes are needed.

Best regards, Yohan Lefol

vjcitn commented 1 year ago

Well what I see at 0ac6803e91c5e7c53cea7a36941ccd84d019c99f is a 9+MB build with version 01.01.01 ... Please fix the version number and I will clear the package. The version number should be 0.99.0 or some other value for the final digit

Ylefol commented 1 year ago

Apologies, I had mistakenly updated the version. This has now been corrected (set back to 0.99.0. The reason for which the build order was ~9MB is that I mistakenly added an archive folder to the repo. This folder contained everything that I had removed to reduce the size of the build. I removed this with the latest push. It should be 4.9MB now.

Best regards, Yohan Lefol

lshep commented 1 year ago

We are not able to clone onto our system do do merge conflicts in some of your files

ocs/reference/TS_load_example_data.html:2:<<<<<<< HEAD
docs/reference/TS_load_example_data.html:30:<<<<<<< HEAD
docs/reference/TS_load_example_data.html:46:<<<<<<< HEAD
docs/reference/exp_matrix.html:2:<<<<<<< HEAD
docs/reference/exp_matrix.html:30:<<<<<<< HEAD
docs/reference/exp_matrix.html:46:<<<<<<< HEAD
docs/reference/add_experiment_data.html:2:<<<<<<< HEAD
docs/reference/add_experiment_data.html:31:<<<<<<< HEAD
docs/reference/add_experiment_data.html:47:<<<<<<< HEAD
docs/reference/add_experiment_data.html:141:<<<<<<< HEAD
docs/reference/exp_sample_data.html:2:<<<<<<< HEAD
docs/reference/exp_sample_data.html:31:<<<<<<< HEAD
docs/reference/exp_sample_data.html:47:<<<<<<< HEAD
docs/reference/exp_sample_data.html:92:<<<<<<< HEAD

These must be resolved before we can proceed

Ylefol commented 1 year ago

Hello @lshep ,

My apologies, I didn't initially catch those files as having a merger error. I have now resolved this problem and pushed the fix to the master branch.

Best regards, Yohan Lefol

bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/TimeSeriesAnalysis to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: d7af13e17ef4e7569b47d7be669a7a5dc1b38193

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/TimeSeriesAnalysis to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 57c13576bfc46202be0ddb109cea4c392335016f

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/TimeSeriesAnalysis to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

DarioS commented 1 year ago

Your package has warnings. Could you please resolve those before the formal review begins? Only NOTEs are permitted. If stuck, please consider asking on the bioc-devel@r-project.org mailing list or r-package-devel@r-project.org if non-specific.

Ylefol commented 1 year ago

Hello,

Through the attempts of implementing this on bioconductor, I've come to see that I may not be able to do the necessary modifications while preserving the essence of this package. This was initially designed as a simple to use and simple to understand package. As such, it requires some detailed tutorials on the various aspects of the pipeline while showing the different figures that it may produce (some of which are interactive). After attempting several things, I cannot seem to satisfy both the limited size for bioconductor packages as well as the build-time limitations all while preserving the simplicity of the package.

All of this is to say that I will not be pursuing this submission, as such, I will be closing this issue with this comment.

None the less, I appreciate the help throughout the submission process.

Best regards, Yohan Lefol

vjcitn commented 1 year ago

We can probably help achieve your aims if you would like. There are ways of reducing package tarball size and managing check times. Let us know if you would like more intensive advice. We will leave the issue closed unless we hear from you; there is a conference going on this week, so additional commentary will need to wait a bit if desired.