Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

Pedixplorer #3203

Closed LouisLeNezet closed 9 months ago

LouisLeNezet commented 1 year ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 year ago

Hi @LouisLeNezet

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: Pedixplorer
Version: 0.99.0
Date: 2023-09-19
Title: Pedigree Functions
Authors@R: c(
    person("Louis", "Le Nézet", email="louislenezet@gmail.com",
        role=c("aut","cre"), comment = c(ORCID = "0009-0000-0202-2703")),
    person("Jason", "Sinnwell", email="sinnwell.jason@mayo.edu", role="aut"),
    person("Terry", "Therneau", role="aut"),
    person("Daniel", "Schaid", role="ctb"),
    person("Elizabeth", "Atkinson", role="ctb"),
    person("Louis", "Le Nezet", role='ctb'))
Depends: 
    R (>= 4.3.0)
Imports: 
    graphics,
    stats,
    methods,
    ggplot2,
    utils,
    grDevices,
    stringr,
    plyr,
    dplyr,
    tidyr,
    quadprog,
    Matrix
Description: Routines to handle family data with a pedigree object. The initial purpose
    was to create correlation structures that describe family relationships such 
    as kinship and identity-by-descent, which can be used to model family data 
    in mixed effects models, such as in the coxme function. Also includes a tool
    for pedigree drawing which is focused on producing compact layouts without 
    intervention. Recent additions include utilities to trim the pedigree object
    with various criteria, and kinship for the X chromosome.
License: Artistic-2.0
Encoding: UTF-8
RoxygenNote: 7.2.3
Roxygen: list(markdown = TRUE)
VignetteBuilder: knitr
Suggests: 
    testthat (>= 3.0.0),
    diffviewer,
    vdiffr,
    rmarkdown,
    BiocStyle,
    knitr,
    withr
Config/testthat/edition: 3
biocViews: Software, DataRepresentation, Genetics, Alignment
BugReports: https://github.com/LouisLeLezet/Pedixplorer/issues
url: https://github.com/LouisLeNezet/Pedixplorer
BiocType: Software
Collate: 
    'Pedixplorer-package.R'
    'alignped4.R'
    'alignped3.R'
    'alignped2.R'
    'alignped1.R'
    'pedigree.R'
    'validity.R'
    'pedigreeClass.R'
    'check_hints.R'
    'kindepth.R'
    'auto_hint.R'
    'align.R'
    'best_hint.R'
    'bit_size.R'
    'data.R'
    'descendants.R'
    'make_famid.R'
    'family_check.R'
    'kinship.R'
    'utils.R'
    'find_unavailable.R'
    'find_avail_affected.R'
    'find_avail_noninform.R'
    'fix_parents.R'
    'generate_aff_inds.R'
    'generate_colors.R'
    'ibd_matrix.R'
    'is_informative.R'
    'min_dist_inf.R'
    'norm_data.R'
    'num_child.R'
    'ped_to_legdf.R'
    'ped_to_plotdf.R'
    'plot_fct.R'
    'plot_fromdf.R'
    'plot.R'
    'shrink.R'
    'trim.R'
    'unrelated.R'
    'useful_inds.R'
bioc-issue-bot commented 1 year ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: macOS 12.6.5 Monterey: Pedixplorer_0.99.0.tar.gz Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Pedixplorer to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.

hpages commented 1 year ago

Thanks Louis for the submission.

As mentioned in the community Slack, the [[ and $ methods for Pedigree objects are just aliases for slot(), which means that using them is not really different from using direct slot access. This is not the Bioconductor way for S4 object manipulation. As I explained in the community Slack, Bioconductor has a strong preference for dedicated accessor functions. This approach provides many advantages over direct slot access, which is why there's a strong consensus about it among the Bioconductor community. As a consequence, this is also what the Bioconductor users are used to, and what they are confortable with.

About subsetting: why does this work

library(Pedixplorer)
data("sampleped")
ped <- pedigree(sampleped)
ped[ped$ped$family == "1"]

but not this?

ped[ped$ped$gender == 1]  # error!

(BTW why is the vignette using the 2D subsetting syntax for this (e.g. ped[ped$ped$family == "1", ])?)

Even subsetting based on family appartenance doesn't seem to work reliably:

ped$ped$family[1:5] <- "3"
ped[ped$ped$family == "3"]  # error!

Also the way that subsetting works strongly suggests that Pedigree objects have a vector semantic, where the vector elements are the individuals. So a natural thing to do is to define a length() method that returns the length of the vector (i.e. the number of individuals in it). Right now length() returns 1 on these objects, which is confusing.

These are major issues with the current implementation of the Pedigree class. I understand that adressing them might require some substantial refactoring, and that the next iteration of the class and its API that will result from this refactoring might be quite different. So I'll wait before digging more deeply into this and provide more extensive feedback.

Just a couple of other (minor) things that caught my attention:

Thanks, H.

LouisLeNezet commented 1 year ago

Hi @hpages,

Thanks a lot for the explanation.

I will try to answer to each point:

  1. Bioconductor has a strong preference for dedicated accessor functions
  1. Subsetting gender
    ped[ped$ped$gender == 1]  # error!

It doesn't work because if you select only the male in the pedigree object all the mother would not be present while still being in the momid column. What could be done would be do subselect the ped_df and then use fix_parents() ?

  1. Subsetting family
    ped$ped$family[1:5] <- "3"
    ped[ped$ped$family == "3"]  # error!

It's the same problem as above, when you subset like that the individuals 1_135 & 1_136 are parents of the 3rd individuals but are not present in the subset. What would work would be:

 ped$ped$family[c(1:5, 35, 36)] <- "3"
 ped[ped$ped$family == "3"]  # no error
  1. Length()
  1. Pedigree and pedigree()
  1. Vignettes
  1. Table of contents
LouisLeNezet commented 1 year ago

Concerning the setter and getter should I do something like the following:

#' @title Pedigree ped accessors
#' @param object A Pedigree object.
#' @return The slot `ped` present in the Pedigree object.
#' @rdname extract-methods
#' @aliases ped,Pedigree-method
#' @export
setGeneric("ped", function(object){
    standardGeneric("ped")
})

setMethod("ped", signature(object = "Pedigree"), function(object) {
    object@ped
})

setGeneric("ped<-", function(object, value) {
    standardGeneric("ped<-")
})

setMethod("ped<-", signature(object = "Pedigree", value = "ANY"), function(object, value) {
    object@ped <- value
    validObject(object)
    object
})

By doing so, the following works:

testthat("Pedigree setters", {
    data("sampleped")
    ped <- Pedigree(sampleped)
    peddf <- ped(ped)
    peddf[1, "family"] <- "2"
    ped(ped) <- peddf

    expect_equal(ped(ped)[1, "family"], "2")
})
bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 460e819c47cac935a395c0818f37700bc36e79ae

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: ERROR before build products produced.

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Pedixplorer to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

hpages commented 1 year ago
ped$ped$family[1:5] <- "3"
ped[ped$ped$family == "3"]  # error!

It's the same problem as above, when you subset like that the individuals 1_135 & 1_136 are parents of the 3rd individuals but are not present in the subset.

Sure, I broke the family trees when I did ped$ped$family[1:5] <- "3". I did it on purpose. My point is that operations that break the object should be blocked and display a clear error message.

A major difficulty with the current design of the class and its API is that it's very hard for the user to know what modifications of the object are allowed. Providing a well-defined set of getters/setters should help with this.

For example, right now all columns in the data.frame stored in the ped slot have the same level of "importance", at least in appearance. However, it seems that there are at least 5 groups:

This means all these columns cannot be treated the same way. Does it make sense to expose all of them? What columns is the user allowed to drop or modify? For example it seems that letting the user touch anything in the essential columns should not be allowed. Also if group 2 & 3 are automatically handled internally by high-level operations during the typical Pedigree workflow, then maybe the user shouldn't be allowed to directly modify them either. Maybe the user should only be allowed to modify the metadata columns?

Given these fundamental differences between the 4 groups of columns, wouldn't it make more sense to store them in separate data.frames? At least separate the matadata columns (user-controlled) from the other columns. Note that an established convention in Bioconductor is to return the medata columns in a DataFrame object.

Also note that the show() method doesn't display much so the user needs to explicitely access the data.frame in the ped slot in order to "see" the data. However this approach doesn't make any distinction between the different groups of columns. This means that the user needs to remember about what column belongs to what group in order to not mess up the object. This puts an unreasonable cognitive burden on the user. Plus the initial order of the columns (supplied at construction time) is not respected (the columns seem to have been shuffled), which doesn't help either.

About subsetting: it seems that only subsetting that is guaranteed to not break the family trees should be allowed. This means that providing general subsetting via [ is not really appropriate. Maybe it would make more sense to provide a specialized subsetting operation like subsetByFamily() that takes the object and a vector of family ids. Or are there other forms of subsetting that should be supported?

So to summarize you want to think hard about the API of your objects. A typical API consists of length() + a set of getters and setters + maybe some kind of subsetting operation + maybe c() to combine various objects. The API should only allow operations that make sense, and block illegal operations with clear error messages.

Hope this helps,

H.

LouisLeNezet commented 1 year ago

Hi, thanks for the review.

What do you think ?

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 1aa7197074c0f2ff73e7a6f758ade534c3a512c4

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.2.tar.gz macOS 12.6.5 Monterey: Pedixplorer_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Pedixplorer to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

hpages commented 1 year ago

Hi @LouisLeNezet,

There's a lot going on with object and API design. I'm well aware that this is not an easy topic. In particular, a major challenge is to come up with an API for your objects that supports the operations encountered in the typical workflows, in an intuitive way, that is agnostic of internal representation, that adheres to the principle of least surprise, that handles user errors in a helpful and user-friendly way etc... For example, as mentioned previously, this operation should fail with a useful error message:

data("sampleped")
ped <- Pedigree(sampleped)
ped$ped$family[1:5] <- "3"

and this one too:

descendants(c("1_101", "2_208", "3_111"), ped)

but they don't.

On top of this, when choosing S4 to implement objects and their API, there's the additonal burden of learning S4 itself. There's also the overhead of getting familiar with some other widely used objects in Bioconductor, like SummarizedExperiment, SingleCellExperiment, GInteractions etc... to get an idea of how things are usually done, and to fit in the ecosystem.

To make things even more challenging, your Pedigree objects are quite complicated, and I'm not familiar with the topic so can only provide limited guidance in the form of general statements. Furthermore, I don't know the history of the old pedigree objects from the kinship2 package, or what their shortcomings are that motivated the decision to replace the old class with this new class in the first place. Providing more concrete guidance would require that I invest a significant amount of time. Ultimately the package review is not the right venue for this level of involvement.

All this to say that I will now stop commenting on the design of Pedigree objects and their API. Hopefully they are already an improvement over the old pedigree objects. And getting them in Bioconductor will hopefully give them the kind of exposure that will help gathering feedback and refine them in the future (even though incremental refinements won't be easy if too much of their internal representation is exposed to the user, which is exactly why separating the API from internal representation is so important).

Here is some additional feedback for the rest of the package, mostly cosmetic:

That will be all.

Thanks again for your work on this, H.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: d4709098ce684dfef269877b592755aec7949862

lshep commented 1 year ago

Hello, we are in the middle of updating the SPB for the current release. when I am done with the updates I will manually kick off a new build report for you. sorry for the inconvenience

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.3.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Pedixplorer to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 9dfad8318b170f9457ca02a8808f1c9efe2c70d2

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.4.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Pedixplorer to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: a3c80e34fd4d9c5e15947f2523fba449ce17820f

LouisLeNezet commented 1 year ago

Hi @hpages,

Thanks a lot for your feedback, it helped me a lot to redesign the API. It took me some time, but I should have now something more close to the bioconductor standard, I hope.

The biggest change is on the structure of the S4 object. The Pedigree object contains now the following object:

I've created the a lot of dedicated accessors but I'm not sure that is what I should do when there is so many slots. Maybe a wrapper like ped(x, slot) should be better ?

Also for the Pedigree object the famid() accessor is a wrapper of the famid(ped(x)) and this is the same for the mcols as the usage of this two accessors should reflect the individuals of the Pedigree, from my point of view.

I've added more use case in the vignettes, but that might be to little (when you want to go in details), and maybe too much for the main vignette. Should I separate it more ?

Thanks for your review !

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.5.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Pedixplorer to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lshep commented 11 months ago

I'm very sorry for the delay; it should not be taking this long. I will check in with @hpages if he plans to revisit this review soon or reassign the reviewer if necessary

hpages commented 11 months ago

Hi @LouisLeNezet, sorry for the delay. I'll take another look at this in the next few days. Thanks for your patience.

vjcitn commented 10 months ago

I am looking at it now.

vjcitn commented 10 months ago

I think most of the concerns have been addressed well. I would say that the current appearance of the vignettes is suboptimal because none is explicitly identified as the 'tutorial' or 'introduction' that should be visited first by a newcomer. image

Please use the internal title field to help visitors navigate. Once this is done I will accept the package.

bioc-issue-bot commented 10 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: d40e20aaafcc7524261e8f903680fa1cdcba22a6

bioc-issue-bot commented 10 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.6.tar.gz macOS 12.7.1 Monterey: Pedixplorer_0.99.6.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Pedixplorer to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

vjcitn commented 10 months ago

@LouisLeNezet will you be able to deal with the suggestion above?

LouisLeNezet commented 10 months ago

Hi, It should have been done in the last push. Is it not the case ? The new title for the "pedigree.rmd" is "Pedigree tutorial" as asked.

bioc-issue-bot commented 10 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: f6225502a70f7da1b9a04d73df32db77d43c0d7c

LouisLeNezet commented 10 months ago

Sorry I forgot to modify the VignetteIndexEntry{} This is done now.

bioc-issue-bot commented 10 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): Pedixplorer_0.99.7.tar.gz macOS 12.7.1 Monterey: Pedixplorer_0.99.7.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Pedixplorer to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LouisLeNezet commented 9 months ago

@vjcitn Is it good to go ?

bioc-issue-bot commented 9 months ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 9 months ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/LouisLeNezet.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("Pedixplorer"). The package 'landing page' will be created at

https://bioconductor.org/packages/Pedixplorer

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.