Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

HiContactsData #2757

Closed js2264 closed 2 years ago

js2264 commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @js2264

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: HiContactsData
Title: HiContacts companion data package
Version: 0.99.0
Date: 2022-08-16
Authors@R: 
    person(given = "Jacques",
 family = "Serizay",
 role = c("aut", "cre"),
 email = "jacquesserizay@gmail.com")
Description: 
    Provides a collection of Hi-C files (pairs and (m)cool). These datasets can 
    be read into R and further investigated and visualized with the HiContacts 
    package. Data includes yeast Hi-C data generated by the Koszul lab from 
    the Pasteur Institute.
License: MIT + file LICENSE
URL: https://github.com/js2264/HiContactsData
BugReports: https://github.com/js2264/HiContactsData/issues
Depends:
    ExperimentHub
Imports: 
    BiocFileCache, 
    AnnotationHub
Suggests:
    HiContacts,
    testthat, 
    methods,
    knitr, 
    rmarkdown
biocViews: 
    ExperimentHub,
    ExperimentData, 
    SequencingData
Encoding: UTF-8
VignetteBuilder: knitr
LazyData: false
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.1
js2264 commented 2 years ago

FYI, I am planning on submitting HiContacts (https://github.com/js2264/HiContacts) as an AdditionalPackage, as soon as the data package has been assigned a reviewer (according to following guidelines: https://github.com/Bioconductor/Contributions/blob/master/CONTRIBUTING.md#submitting-related-packages).

vjcitn commented 2 years ago

Sorry for the delay

js2264 commented 2 years ago

Hi @vjcitn no worries for the delay. I guess we're now waiting for a reviewer to be assigned? Is there anything else I can do to make things go smoothly?

vjcitn commented 2 years ago

@lshep ... @js2264 already noted

Do not submit an AdditionalPackage with the line shown in step 3 until a review in progress tag has been added to your package and your first package receives a build report

Submit additional packages to the same issue. Do this by posting a comment containing a line like:

 AdditionalPackage: https://github.com/username/repositoryname
vjcitn commented 2 years ago

so you'll have to wait a bit until the ingestion occurs, often at week's end.

bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContactsData to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

js2264 commented 2 years ago

AdditionalPackage: https://github.com/js2264/HiContacts

bioc-issue-bot commented 2 years ago

Hi @js2264, Thanks for submitting your additional package: https://github.com/js2264/HiContacts. We are taking a quick look at it and you will hear back from us soon.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: dfd940262c106e30b7177b95aaeeaf2073b2c402

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContactsData to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 1c5b038a3989a64aade2cd962bc4a8abdf6b02be

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContactsData to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 2 years ago

Hi Jaques @js2264

Thank you for your submission. Please see the review below for the data package. Feel free to respond via the comments.

Best, Marcel


HiContactsData #2757

DESCRIPTION

NAMESPACE

R

vignettes

bioc-issue-bot commented 2 years ago

Additional Package has been approved for building.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

js2264 commented 2 years ago

Hi Marcel @LiNk-NY,

Thank you for your review. I have addressed your comments in the coming series of commits. The only thing I do not quite understand is your reference to using BiocStyle for DESCRIPTION. Could you please clarify what changes you'd want me to do ? I have originally generated the DESCRIPTION from biocthis::use_bioc_description()and added useful entries to it.

DESCRIPTION

  • Looks good.
  • Consider using BiocStyle.

Thanks again, J

LiNk-NY commented 2 years ago

Hi Jaques, @js2264 I mean using BiocStyle for the vignette and adding it to the Suggests field in the DESCRIPTION. See here for more details. Best, Marcel

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 6a72314b73c7bd9bebe9a1dea548e262db6a5f2e

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: ca988e900ba08d4584c81939436216352b3d80be

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContactsData to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: bf75c79e0ed75b7d9fd9c3e8c10856cd41e8d50c

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 7b2c4ab6b6be3b5b056fe782ed70ef8abf7f3f2e

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 2 years ago

Hi Jacques, @js2264

Thank you for your submission. Please see the review of the software package below.

Best regards, Marcel


HiContacts #2757

You've put a lot of work into the package. It may be best to divide the package into two based on the functionality provided. One package could host all the visualization functionality and the other the computational work. This would allow users to choose the needed functionality and avoid the high number of dependencies. The design of the class is good but may need some pruning of slots and methods.

DESCRIPTION

NAMESPACE

vignettes

R

tests

It's good that you have tests. Consider adding tests for those with 0%.

covr::package_coverage()
HiContacts Coverage: 38.11%
R/APA.R: 0.00%
R/scalogram.R: 0.00%
R/parse.R: 4.62%
R/utils.R: 7.75%
R/contacts.R: 21.83%
R/checks.R: 40.00%
R/plotMatrix.R: 48.26%
R/ggthemes.R: 50.00%
R/Ps.R: 78.57%
R/arithmetics.R: 78.79%
R/cistrans.R: 95.65%
R/4C.R: 100.00%
js2264 commented 2 years ago

Hi Marcel @LiNk-NY

Thank you for your thorough review and useful comments. I have turned your comments into a check list. The items that are still unticked have further explanations below. Let me know if anything is not clear and what I should do (including additional required changes ?).

## DESCRIPTION

I used pkgndep and dstr to identify as many packages as possible to remove. I got rid of purrr, dropped BiocParallel (since it was only used in APA function which is not ready yet and requires significant work to finish it, so I'm currently developing it in another branch). I also rewrote a bit my code to remove zeallot. I believe all the other dependencies are definitely required.

## NAMESPACE

## Vignettes

I've corrected most of it, except for the internal check functions. I usually have check function names as snake_make to differentiate them from other functions. These check functions systematically return boolean values and are mostly used internally, at the beginning of functions. Is it ok to keep them in snake_make? Otherwise I can change them to camelCase, just let me know.

## R

I have made some changes when possible (e.g. in anchors(x)[[1]] -> anchors(x)[['first']]). But regarding contacts_list[[1]], I am not sure how to change that to something else. I've added a check to make sure that at least 2 elements are contained in the contacts_list argument. Do you think that is enough, or would you want additional changes?

## Minor:

I thought about using score name. However, AFAIK, score in BiocGenerics generally returns a vector. In HiContacts, scores returns a GInteractions object with, for each interaction, an associated score. All the functions in the package are built on scores() and it would be a lot of work to modify it to behave in a similar way to score(). I can make that change, of course, if you think it is a requirement, but since this was in your minor comments I thought I'd ask before starting all of this.

I don't think changing features to rowData/rowRanges would be appropriate. In this package, features really refers to genomic features (could be chromatin loops, TSSs, compartment borders, whatever the user wants). A user might be interested in analysing a contact matrix in the light of these features and the features slot is a way to "enrich" the contacts object with these. These genomic features are not necessarily related to the interactions themselves, and there is no direct correspondance between "rows" (i.e. interactions, here) and features, as it is the case in SummarizedExperiments and co. Finally, AFAIK, I can only see 2 contexts in which "features" is frequently used in Bioconductor: either GenomicFeatures package or "feature selection" (typically in scRNAseq); these are very specific contexts and I believe people studying chromatin architecture would not be mistaken. With that said, do let me know if you think "features" still needs to be renamed and I'll happily find something else!

The data.R documentation describes contacts objects shipped with HiContacts (see data folder). I thought I would put these objects in HiContacts rather than HiContactsData since they are contacts objects and thus require HiContacts. Should I move them to HiContactsData?

js2264 commented 2 years ago

Hi Marcel @LiNk-NY Just to let you know I have made some comments in the post above, there are few points that would need an input from you when you have a minute. Many thanks again!

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: c2d82ab89ad4464c64a9bd4a68861a14aa961e4c

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

js2264 commented 2 years ago

Another point to mention is that generating the vignette with BiocStyle leads to the package tarball exceeding the size requirements. BiocCheck has been running on my GHAs: in commit 702819c the vignette was generated with BiocStyle (BiocCheck results: here) and in the next commit (a8390e5), I only changed the vignette style back to default rmarkdown::html_output (BiocCheck results: here). I have tried with BiocStyle::pdf_document and tarball is still too big. I have tried reducing the number and resolution of the images displayed in the vignette, but still not small enough. How do people usually deal with such limitations? Thanks for your advices!

LiNk-NY commented 2 years ago
  • [ ] There is a mix of camelCase and snakecase use. Is there a difference between uses? Otherwise, it's best to keep the function names consistently named.

I've corrected most of it, except for the internal check functions. I usually have check function names as snake_make to differentiate them from other functions. These check functions systematically return boolean values and are mostly used internally, at the beginning of functions. Is it ok to keep them in snake_make? Otherwise I can change them to camelCase, just let me know.

Yes, that's okay to keep them as snake_case. Usually non-exported functions start with a . by convention.

  • [ ] _Where possible avoid using numeric indices, e.g., contacts_list[[1]] and use more robust named indices._

I have made some changes when possible (e.g. in anchors(x)[[1]] -> anchors(x)[['first']]). But regarding contacts_list[[1]], I am not sure how to change that to something else. I've added a check to make sure that at least 2 elements are contained in the contacts_list argument. Do you think that is enough, or would you want additional changes?

That would be good. Ideally, these elements would be named but it is not always possible.

  • [ ] scores is too similar to score in BiocGenerics, consider using the latter.

I thought about using score name. However, AFAIK, score in BiocGenerics generally returns a vector. In HiContacts, scores returns a GInteractions object with, for each interaction, an associated score. All the functions in the package are built on scores() and it would be a lot of work to modify it to behave in a similar way to score(). I can make that change, of course, if you think it is a requirement, but since this was in your minor comments I thought I'd ask before starting all of this.

It may be better to use a different name. One would expect the scores function to return a vector of values.

  • [ ] features usually refers to row wise annotations, to avoid confusion either use a different name or consider using something like rowData/rowRanges. The extractor returns a SimpleList but the replacement method accepts GRangesOrGInteractions ? This is not consistent.

I don't think changing features to rowData/rowRanges would be appropriate. In this package, features really refers to genomic features (could be chromatin loops, TSSs, compartment borders, whatever the user wants). A user might be interested in analysing a contact matrix in the light of these features and the features slot is a way to "enrich" the contacts object with these. These genomic features are not necessarily related to the interactions themselves, and there is no direct correspondance between "rows" (i.e. interactions, here) and features, as it is the case in SummarizedExperiments and co. Finally, AFAIK, I can only see 2 contexts in which "features" is frequently used in Bioconductor: either GenomicFeatures package or "feature selection" (typically in scRNAseq); these are very specific contexts and I believe people studying chromatin architecture would not be mistaken. With that said, do let me know if you think "features" still needs to be renamed and I'll happily find something else!

features is quite a vague term to begin with. It would be good to think of another name that more accurately represents the data.

  • [ ] The data documentation belongs in the HiContactsData package.

The data.R documentation describes contacts objects shipped with HiContacts (see data folder). I thought I would put these objects in HiContacts rather than HiContactsData since they are contacts objects and thus require HiContacts. Should I move them to HiContactsData?

Perhaps I misunderstood this part of the package because I saw the @source HiContactsData tag. It seems to be duplicated data and the only difference is that contacts is run on the data. I would opt for only providing the code to run the contacts function on the data in the documentation somewhere rather than re-saving a different version of the original data, if I understand this correctly.

Best, Marcel

LiNk-NY commented 2 years ago

Another point to mention is that generating the vignette with BiocStyle leads to the package tarball exceeding the size requirements. BiocCheck has been running on my GHAs: in commit 702819c the vignette was generated with BiocStyle (BiocCheck results: here) and in the next commit (a8390e5), I only changed the vignette style back to default rmarkdown::html_output (BiocCheck results: here). I have tried with BiocStyle::pdf_document and tarball is still too big. I have tried reducing the number and resolution of the images displayed in the vignette, but still not small enough. How do people usually deal with such limitations? Thanks for your advices!

Hi Jacques, @js2264

This is likely due to the re-saving data step above. Remove the data duplicated from HiContactsData and only provide the code to run the contacts dataset (possibly showing this in the vignette is sufficient). This should reduce the size of the package tarball.

js2264 commented 2 years ago

Hi Marcel @LiNk-NY ,

Thanks for your input. I'll try to solve as many of the unticked boxes. For now:

LiNk-NY commented 2 years ago

I think topologicalFeatures sounds good. Note that functions / names that start with capital letters usually denote classes.

The method should follow the contract set forth by the generic. The score generic is an extractor for the "scores" column in the object, wherever that may be, e.g., in mcols or in a slot.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 01a199ab14efb02f9d1935e60eee797fca1955bc

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 39bd3c463c727226f59ef78c0f1016b84cc8d289

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

js2264 commented 2 years ago

Hi Marcel @LiNk-NY

I have pushed several commits to HiContacts package. To address your comments, I have:

Let me know if something remains to be re-worked.

LiNk-NY commented 2 years ago

Hi Jacques, @js2264

HiContactsData

HiContacts

Consider separating the plotting functionality from the computational one. It would lower the number of dependencies for those that only want to use the non-plotting functions.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 8e9d8c220d04c9e8654338a1961c48e83a76a9c9

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContactsData to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: af177ad1469a6d22703855f34454cf859451d660

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 9aeb92a351e1f14a7bcd13db2739ad5795aa8029

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

js2264 commented 2 years ago

Hi Marcel @LiNk-NY

Regarding splitting package in two,GenomicInteractions already imports ggplot2, so this is not really an extra dependency, as it will have to be installed anyway. For this reason, I believe that it is ok having HiContacts including plotting functions. Also, in my research experience, operations such as computing distance laws, 4C profiles, cis-trans ratios and other arithmetics of Hi-C experiments are never done without displaying corresponding Hi-C contact maps , which is why I strongly think this functionality should remain a part of HiContacts package.

What do you think? Thanks for your advices!