(inactive) ChromENVEE - GitHub issues

ManonCoulee commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/ManonCoulee/ChromENVEE

Confirm the following by editing each check box to '[x]'

[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Git (optionally via GitHub).

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @ManonCoulee

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: ChromENVEE
Title: Chromatin Environment and Enhancer-dependent Expression
Version: 0.99.8
Authors@R:
    c(
      person(given = "Manon", family = "Coulee", role = c("aut", "cre"), email = "manoncoulee@hotmail.com"),
      person(given = "Guillaume", family = "Meurice", role = "aut", email = "guillaume.meurice@aphp.fr"),
      person(given = "Mitra", family = "Barzine", role = "ctb", email = "mitra.barzine@inserm.fr"),
      person(given = "Laila", family = "El Khattabi", role = "aut", email = "laila.el-khattabi@aphp.fr"),
      person(given = "Julie", family = "Cocquet", role = "aut", email = "julie.cocquet@inserm.fr")
    )
Description: ChromENVEE is a package developed to study chromatin states.
  This package implements functions to associate all the neighbouring genes to a list of enhancers
  and to define the chromatin environment of genes using chromatin states informations
  (e.g., ChromHMM output). Several visualization functions are available to summarize the
  distribution of chromatin states, characterize genes associated with enhancers and also assign
  chromatin environment to genes.
License: GPL-3
Encoding: UTF-8
LazyData: true
LazyDataCompression:xz
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.1
Depends:
    R (>= 3.6.0)
Suggests:
    rmarkdown,
    knitr,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
VignetteBuilder: knitr
Imports:
    ggplot2,
    GenomicRanges,
    parallel,
    stringr,
    umap,
    stats,
    methods
biocViews: Annotation

vjcitn commented 1 year ago

Some of your GRanges in the vignette have "unspecified genome" but you should set that, especially as you are using mouse. Should the gene ranges and hmm state data given at the start of the vignette be GRanges?
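
As a minimal sketch of the suggestion above, the genome can be recorded with the genome() setter, and tabular range data can be converted with makeGRangesFromDataFrame(). The column names, the toy coordinates and the "mm10" assembly are assumptions made for illustration; the assembly actually used in the vignette should be substituted.

```r
library(GenomicRanges)

# Toy chromatin-state table; column names and values are illustrative assumptions.
stateTable <- data.frame(
    chr = c("chr1", "chr1", "chr2"),
    start = c(3000001, 3000801, 4000001),
    end = c(3000800, 3001600, 4000600),
    state = c("TSS", "Enhancer", "Heterochromatin"),
    stringsAsFactors = FALSE
)

# Convert to GRanges, keeping the state as a metadata column.
stateRanges <- makeGRangesFromDataFrame(stateTable, keep.extra.columns = TRUE)

# No genome is recorded yet, which is what is reported as "unspecified genome".
genomeBefore <- unique(genome(stateRanges))

# Record the assembly ("mm10" is an assumption, not taken from the vignette).
genome(stateRanges) <- "mm10"
genomeAfter <- unique(genome(stateRanges))
```

Once the genome is set, operations that combine two GRanges objects can detect a mismatch between assemblies instead of silently mixing coordinates.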

ManonCoulee commented 1 year ago

Dear Vince,

Thank you for your comment.

Noted for the GRanges data; I will add the genome information.

For the other range data, using GRanges would be more efficient and would avoid possible errors caused by incorrectly named columns.

I'll let you know when the modifications have been made.

Manon

ManonCoulee commented 1 year ago

Dear Vince,

We have updated our vignette to specify the genome studied in the GRanges objects. The functions now also accept GRanges objects as input. https://github.com/ManonCoulee/ChromENVEE

Manon

bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your GitHub SSH keys for git.bioconductor.org access. To manage keys and future access you may want to activate your Bioconductor Git Credentials Account.

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/ChromENVEE to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: f54ed8be1a8ce7ed895ad3856168f2d9610baab5

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/ChromENVEE to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 4e869d3ebebbdc80c41ec8e9b91d874e20eba68d

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/ChromENVEE to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

DarioS commented 1 year ago

ChromENVEE is an R package for the exploration of ChromHMM epigenomic segmentation results. The issues which I found are:

Much of the package's functionality simply implements what is already available in Bioconductor in an ad-hoc and inefficient way. A couple of examples of this are:
1. One of the main functionalities is associating enhancers to nearby genes. This is done in a clunky way by manually calculating overlaps using a plain data frame containing starts and ends:
```
enhancerTable$start_500kb = GenomicRanges::start(enhancerTable) - interval
enhancerTable$end_500kb = GenomicRanges::end(enhancerTable) + interval
...        ...
genome$TSS = GenomicRanges::start(genome)
genome[GenomicRanges::strand(genome)@values == "-", ]$TSS = end(genome[GenomicRanges::strand(genome)@values == "-", ])
...        ...
tt = genome[genome$TSS < start,]
sub_genome = tt[tt$TSS > start_500kb,]
...        ...
```
  This can be done in a couple of lines of code if existing Bioconductor infrastructure is used. Making a large window around enhancers is as simple as resize(enhancersRanges, width(enhancersRanges) + 2 * flankWidth, fix = "center"), and finding overlapping enhancers and genes is findOverlaps(enhancersRanges, genesRanges). Please carefully read the GenomicRanges vignette to familiarise yourself with one of the most widely used Bioconductor packages.
2. genomeFile is a database of mouse transcripts in the data folder. However, this could simply be replaced by an annotation offered by Bioconductor. The gene database can easily be imported as:
```
library(AnnotationHub)
hub <- AnnotationHub()
query(hub, c("ensdb","musculus")) # Shows list of mouse annotations. Choose one.
transcriptomeMouse <- hub[["AH109367"]]
# Then, extract the gene coordinates.
```
  AnnotationHub is another core package developed by Bioconductor employees and another example of ChromENVEE's poor interoperability with existing packages.
Similarly, the ChromHMM segmentation from GENCODE should simply be obtained from the annotatr Bioconductor package, further reducing the number and size of RData files within ChromENVEE.
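
The window-and-overlap approach described in point 1 could look roughly like the sketch below. The toy coordinates, the identifiers and the "mm10" assembly are invented for illustration, and reducing each gene to its strand-aware TSS with resize(..., width = 1, fix = "start") is an assumption about what the package intends, not its actual implementation. In practice, genesRanges could come from an annotation resource rather than being built by hand.

```r
library(GenomicRanges)

# Toy data, invented for illustration only.
enhancersRanges <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(1000000, 5000000), width = 500),
    enhancerID = c("enh1", "enh2")
)
genesRanges <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(1200000, 1800000, 5100000), width = 2000),
    strand = c("+", "-", "+"),
    geneID = c("geneA", "geneB", "geneC")
)
genome(enhancersRanges) <- "mm10"
genome(genesRanges) <- "mm10"

flankWidth <- 500000

# Extend each enhancer by flankWidth on both sides, keeping its centre fixed.
enhancerWindows <- resize(enhancersRanges,
                          width(enhancersRanges) + 2 * flankWidth,
                          fix = "center")

# Strand-aware TSS: start of "+" genes, end of "-" genes.
tssRanges <- resize(genesRanges, width = 1, fix = "start")

# Pair every enhancer window with the TSSs falling inside it.
hits <- findOverlaps(enhancerWindows, tssRanges)
enhancerGenePairs <- data.frame(
    enhancerID = enhancersRanges$enhancerID[queryHits(hits)],
    geneID = genesRanges$geneID[subjectHits(hits)],
    stringsAsFactors = FALSE
)
```

Here geneB is on the minus strand, so its TSS is the end of the range and it falls outside both windows; only geneA and geneC are paired with an enhancer.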
The vignette should explain the mandatory column names of input data and required variable data types rather than just loading pre-made data without any explanation of its structure.
Some key analyses offered are not meaningful. For example,

To determine which genes are associated to which enhancers, we assign to each enhancer all the genes located within an interval.

This is not how enhancers work, and it is why 5C and Hi-C assays were developed: to understand which interactions actually occur in real biology. The vignette refers to Ferrari, F. et al. and Godfrey, L. et al. I read both and neither used such a biologically unrealistic gene-enhancer linking strategy. So, why is it coded in the package? Also,

predominantState() estimates the predominant chromatin state at gene promoter, which corresponds to the state with the largest overlap with the gene promoter environment. Genes are then clustered according to their chromatin state using UMAP.

So, are you proposing to use UMAP on categorical data (i.e. the predominant state at each promoter)? The developer of UMAP says that the functionality for non-numeric data is still under development. Are the functions in ChromENVEE mathematically correct? Have any of the functions already been used in a peer-reviewed journal article? I am not confident about the validity of a few of the analyses presented in the vignette.
The package appears to only support results output from ChromHMM. ChromHMM is just one algorithm from a family of Segmentation And Genome Annotation (SAGA) algorithms, which are used to understand genome activity and gene regulation. There are a few other SAGA software packages (e.g. HMMSeg, Segway). Hence, the package provides limited data import for SAGA algorithms and lacks the ability to compare between the..
The vignette mentions

ChromHMM R package allows to go further by predicting chromatin states using ChIPSeq datasets for several histone marks.

However, ChromHMM is not an R package but a Java JAR file. The vignette is not clearly written for people unfamiliar with epigenomics, and incorrect information such as this adds to the reader's confusion.

There are also other issues, such as variable naming, spacing and the assignment of calculated values to variables not conforming to the coding style guide in the Contributions document.

Use <- instead of = for assigning to variables, except in function arguments. Variable names: Use camelCase: initial lowercase, then alternate case between words. Always use space after a comma. This: a, b, c.

Also, please don't use variable names such as tt, p, E, S, which are very short and lack meaning to people other than the writer of the code. Longer variable names comprised of meaningful words improve maintainability of code. See self-documenting code for more information.
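
To make the style points concrete, here is a small sketch that restates a line like tt = genome[genome$TSS < start,] from the excerpt above using <-, camelCase names and a space after each comma. The table contents and all variable names are hypothetical and chosen only for illustration.

```r
# Toy gene table; values and column names are illustrative assumptions.
geneTable <- data.frame(
    geneID = c("geneA", "geneB", "geneC"),
    TSS = c(100, 250, 900),
    stringsAsFactors = FALSE
)

# Descriptive camelCase names instead of short ones such as tt or p.
windowStart <- 200
windowEnd <- 1000

# Assignment with <-, and a space after the comma in the subsetting call.
isInWindow <- geneTable$TSS > windowStart & geneTable$TSS < windowEnd
genesInWindow <- geneTable[isInWindow, ]
```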

The Description section of each function is only one short sentence and there is no Details section. Please see examples of function documentation in packages such as GenomicRanges and edgeR for the expected level of detail in documentation.
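
As a rough illustration of the requested level of detail, a roxygen2 block with separate description and details sections might look like the sketch below. The function countStates() is hypothetical and is not part of ChromENVEE; it only serves to show the documentation layout.

```r
#' Count chromatin states
#'
#' @description
#' Tabulates how often each chromatin state label occurs in a vector of
#' per-region state assignments.
#'
#' @details
#' Missing labels (NA) are ignored. States are returned in decreasing order
#' of frequency, with ties broken alphabetically by state name, so the first
#' element is the most frequent state.
#'
#' @param stateLabels A character vector with one chromatin state label per
#'   region.
#'
#' @return A named integer vector of counts, one element per distinct state.
#'
#' @examples
#' countStates(c("TSS", "Enhancer", "TSS"))
#'
#' @export
countStates <- function(stateLabels) {
    stopifnot(is.character(stateLabels))
    stateTable <- table(stateLabels)  # table() drops NA by default
    stateCounts <- as.integer(stateTable)
    names(stateCounts) <- names(stateTable)
    stateCounts[order(-stateCounts, names(stateCounts))]
}
```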

In summary, the assumptions of ChromENVEE are biologically unrealistic and it often redundantly and inefficiently reimplements functionality of existing Bioconductor core infrastructure packages, as well as having a coding style which does not conform to the Bioconductor contributor's guide.

bioc-issue-bot commented 1 year ago

This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.

Thank you for your interest in Bioconductor.

Bioconductor / Contributions

(inactive) ChromENVEE #2864