Closed LouisLeNezet closed 9 months ago
Hi @LouisLeNezet
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: Pedixplorer
Version: 0.99.0
Date: 2023-09-19
Title: Pedigree Functions
Authors@R: c(
person("Louis", "Le Nézet", email="louislenezet@gmail.com",
role=c("aut","cre"), comment = c(ORCID = "0009-0000-0202-2703")),
person("Jason", "Sinnwell", email="sinnwell.jason@mayo.edu", role="aut"),
person("Terry", "Therneau", role="aut"),
person("Daniel", "Schaid", role="ctb"),
person("Elizabeth", "Atkinson", role="ctb"),
person("Louis", "Le Nezet", role='ctb'))
Depends:
R (>= 4.3.0)
Imports:
graphics,
stats,
methods,
ggplot2,
utils,
grDevices,
stringr,
plyr,
dplyr,
tidyr,
quadprog,
Matrix
Description: Routines to handle family data with a pedigree object. The initial purpose
was to create correlation structures that describe family relationships such
as kinship and identity-by-descent, which can be used to model family data
in mixed effects models, such as in the coxme function. Also includes a tool
for pedigree drawing which is focused on producing compact layouts without
intervention. Recent additions include utilities to trim the pedigree object
with various criteria, and kinship for the X chromosome.
License: Artistic-2.0
Encoding: UTF-8
RoxygenNote: 7.2.3
Roxygen: list(markdown = TRUE)
VignetteBuilder: knitr
Suggests:
testthat (>= 3.0.0),
diffviewer,
vdiffr,
rmarkdown,
BiocStyle,
knitr,
withr
Config/testthat/edition: 3
biocViews: Software, DataRepresentation, Genetics, Alignment
BugReports: https://github.com/LouisLeLezet/Pedixplorer/issues
url: https://github.com/LouisLeNezet/Pedixplorer
BiocType: Software
Collate:
'Pedixplorer-package.R'
'alignped4.R'
'alignped3.R'
'alignped2.R'
'alignped1.R'
'pedigree.R'
'validity.R'
'pedigreeClass.R'
'check_hints.R'
'kindepth.R'
'auto_hint.R'
'align.R'
'best_hint.R'
'bit_size.R'
'data.R'
'descendants.R'
'make_famid.R'
'family_check.R'
'kinship.R'
'utils.R'
'find_unavailable.R'
'find_avail_affected.R'
'find_avail_noninform.R'
'fix_parents.R'
'generate_aff_inds.R'
'generate_colors.R'
'ibd_matrix.R'
'is_informative.R'
'min_dist_inf.R'
'norm_data.R'
'num_child.R'
'ped_to_legdf.R'
'ped_to_plotdf.R'
'plot_fct.R'
'plot_fromdf.R'
'plot.R'
'shrink.R'
'trim.R'
'unrelated.R'
'useful_inds.R'
Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Build System.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Bioconductor Build System: macOS 12.6.5 Monterey: Pedixplorer_0.99.0.tar.gz Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.0.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Pedixplorer
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.
Thanks Louis for the submission.
As mentioned in the community Slack, the [[
and $
methods for Pedigree objects are just aliases for slot()
, which means that using them is not really different from using direct slot access. This is not the Bioconductor way for S4 object manipulation. As I explained in the community Slack, Bioconductor has a strong preference for dedicated accessor functions. This approach provides many advantages over direct slot access, which is why there's a strong consensus about it among the Bioconductor community. As a consequence, this is also what the Bioconductor users are used to, and what they are confortable with.
About subsetting: why does this work
library(Pedixplorer)
data("sampleped")
ped <- pedigree(sampleped)
ped[ped$ped$family == "1"]
but not this?
ped[ped$ped$gender == 1] # error!
(BTW why is the vignette using the 2D subsetting syntax for this (e.g. ped[ped$ped$family == "1", ]
)?)
Even subsetting based on family appartenance doesn't seem to work reliably:
ped$ped$family[1:5] <- "3"
ped[ped$ped$family == "3"] # error!
Also the way that subsetting works strongly suggests that Pedigree objects have a vector semantic, where the vector elements are the individuals. So a natural thing to do is to define a length()
method that returns the length of the vector (i.e. the number of individuals in it). Right now length()
returns 1 on these objects, which is confusing.
These are major issues with the current implementation of the Pedigree class. I understand that adressing them might require some substantial refactoring, and that the next iteration of the class and its API that will result from this refactoring might be quite different. So I'll wait before digging more deeply into this and provide more extensive feedback.
Just a couple of other (minor) things that caught my attention:
The name of the class is Pedigree (with P upper case) so any reference to the class should be Pedigree and not pedigree (R is case-sensitive). For example the vignettes and man pages have many occurences of "pedigree object" and "pedigree objects". This should be replaced with "Pedigree object" and "Pedigree objects".
The Bioconductor convention is to name the constructor function like the class itself so it should be Pedigree()
instead of pedigree()
.
The package contains 5 vignettes but it's not clear which one the user should start with:
Also, unless there's a really good reason for not doing it, title
and \VignetteIndexEntry
should preferrably be the same. For example it's hard to guess that the Pedigree_Examples
vignette listed above is actually mapped to the vignette with title "Pedixplorer Pedigree object". I would also recommend against using the underscore (_
) in the title
or \VignetteIndexEntry
fields.
So to summarize: one of the 5 vignettes should be made the entry point to the documentation, and its title should reflect that. Also it would be nice if this entry point vignette could give a quick overview of what the other vignettes are about, and maybe suggest an order of reading if there's one.
Table of Contents section in pedigree.Rmd
is incomplete but is unnecessary anyway because the Contents section is already automatically rendered in the HTML version.
Thanks, H.
Hi @hpages,
Thanks a lot for the explanation.
I will try to answer to each point:
ped[ped$ped$gender == 1] # error!
It doesn't work because if you select only the male in the pedigree object all the mother would not be present while still being in the momid column. What could be done would be do subselect the ped_df and then use fix_parents() ?
ped$ped$family[1:5] <- "3"
ped[ped$ped$family == "3"] # error!
It's the same problem as above, when you subset like that the individuals 1_135 & 1_136 are parents of the 3rd individuals but are not present in the subset. What would work would be:
ped$ped$family[c(1:5, 35, 36)] <- "3"
ped[ped$ped$family == "3"] # no error
Concerning the setter and getter should I do something like the following:
#' @title Pedigree ped accessors
#' @param object A Pedigree object.
#' @return The slot `ped` present in the Pedigree object.
#' @rdname extract-methods
#' @aliases ped,Pedigree-method
#' @export
setGeneric("ped", function(object){
standardGeneric("ped")
})
setMethod("ped", signature(object = "Pedigree"), function(object) {
object@ped
})
setGeneric("ped<-", function(object, value) {
standardGeneric("ped<-")
})
setMethod("ped<-", signature(object = "Pedigree", value = "ANY"), function(object, value) {
object@ped <- value
validObject(object)
object
})
By doing so, the following works:
testthat("Pedigree setters", {
data("sampleped")
ped <- Pedigree(sampleped)
peddf <- ped(ped)
peddf[1, "family"] <- "2"
ped(ped) <- peddf
expect_equal(ped(ped)[1, "family"], "2")
})
Received a valid push on git.bioconductor.org; starting a build for commit id: 460e819c47cac935a395c0818f37700bc36e79ae
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Build System.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Bioconductor Build System: ERROR before build products produced.
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Pedixplorer
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
ped$ped$family[1:5] <- "3" ped[ped$ped$family == "3"] # error!
It's the same problem as above, when you subset like that the individuals 1_135 & 1_136 are parents of the 3rd individuals but are not present in the subset.
Sure, I broke the family trees when I did ped$ped$family[1:5] <- "3"
. I did it on purpose. My point is that operations that break the object should be blocked and display a clear error message.
A major difficulty with the current design of the class and its API is that it's very hard for the user to know what modifications of the object are allowed. Providing a well-defined set of getters/setters should help with this.
For example, right now all columns in the data.frame stored in the ped
slot have the same level of "importance", at least in appearance. However, it seems that there are at least 5 groups:
id
, dadid
, momid
, sex
. Let's call them essential columns.indId
, fatherId
, motherId
, gender
. Not clear why that is needed.NA
s: steril
, avail
, error
, affected
, status
, sterilisation
, available
, family
, vitalStatus
, affection
, and affection_aff
.
However it seems that the user can supply some of them (e.g. family
and available
) but not all of them (e.g. avail
). What happens to these columns when the user supplies them to the pedigree()
constructor seems to vary greatly: some are taken as-is, some are silently modified (e.g. status
), and some (e.g. sterilisation
) seem to confuse the pedigree()
constructor so much that it will return a NULL
with an obscure warning message!mcols()
getter and setter.This means all these columns cannot be treated the same way. Does it make sense to expose all of them? What columns is the user allowed to drop or modify? For example it seems that letting the user touch anything in the essential columns should not be allowed. Also if group 2 & 3 are automatically handled internally by high-level operations during the typical Pedigree workflow, then maybe the user shouldn't be allowed to directly modify them either. Maybe the user should only be allowed to modify the metadata columns?
Given these fundamental differences between the 4 groups of columns, wouldn't it make more sense to store them in separate data.frames? At least separate the matadata columns (user-controlled) from the other columns. Note that an established convention in Bioconductor is to return the medata columns in a DataFrame object.
Also note that the show()
method doesn't display much so the user needs to explicitely access the data.frame
in the ped
slot in order to "see" the data. However this approach doesn't make any distinction between the different groups of columns. This means that the user needs to remember about what column belongs to what group in order to not mess up the object. This puts an unreasonable cognitive burden on the user. Plus the initial order of the columns (supplied at construction time) is not respected (the columns seem to have been shuffled), which doesn't help either.
About subsetting: it seems that only subsetting that is guaranteed to not break the family trees should be allowed. This means that providing general subsetting via [
is not really appropriate. Maybe it would make more sense to provide a specialized subsetting operation like subsetByFamily()
that takes the object and a vector of family ids. Or are there other forms of subsetting that should be supported?
So to summarize you want to think hard about the API of your objects. A typical API consists of length()
+ a set of getters and setters + maybe some kind of subsetting operation + maybe c()
to combine various objects. The API should only allow operations that make sense, and block illegal operations with clear error messages.
Hope this helps,
H.
Hi, thanks for the review.
Subsetting
I wanted to enable the user to subset based on a boolean or a list of id, that's why I used [
. But I should authorise only on 1D.
The user is already prompted if the subsetting doesn't work, but I could add verbose to that:
ped$ped$family[1:5] <- "3"
ped[ped$ped$family == "3"] # invalid class "Pedigree" object: 1: Values 1_135 should be in 1_101, 1_102, 1_103, 1_104, 1_105 ....
User access
The user should be able to drop columns only from group 4 (meta) (only if not use in the scales
slot and should be able to modify columns from group 2 (input) and group 4 (meta). The group 2 store the value used to generate the one in group 1 (identity) before the normalisation process. If a modification is made on group 2, it should rerun the normalisation and update the whole object.
Separate columns Should I set the different data.frame into a list of data.frame or into separated slots ? I could also keep them all in one data.frame and create specific getters and setters.
setClass(
"Pedigree",
slots = c(
ped = list(
"id" = "data.frame",
"input" = "data.frame"
"derived" = "data.frame"
"meta" = "data.frame"
),
rel = "data.frame",
scales = "list",
hints = "list"
)
)
Group 3 (derived) are also meta columns but derived from the input columns.
Setters and getters Would it be possible to create a global getters and setters depending on the category of variable like
col_id <- function(ped, col = NULL) {
col_id <- c("id", "dadid", "momid", "sex")
if (is.null(col)) {
col <- col_id
} else if (!all(col %in% col_id)) {
stop(
"Col selected: ",
col[!col %in% col_id],
" is not an identity column"
)
}
ped(ped)[col]
}
show()
I didn't want the show()
method to be to heavy.
The summary()
function was more complete.
I could separately the printed information by the different type of columns for both show()
and summary()
.
What do you think ?
Received a valid push on git.bioconductor.org; starting a build for commit id: 1aa7197074c0f2ff73e7a6f758ade534c3a512c4
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Build System.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.2.tar.gz macOS 12.6.5 Monterey: Pedixplorer_0.99.2.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Pedixplorer
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi @LouisLeNezet,
There's a lot going on with object and API design. I'm well aware that this is not an easy topic. In particular, a major challenge is to come up with an API for your objects that supports the operations encountered in the typical workflows, in an intuitive way, that is agnostic of internal representation, that adheres to the principle of least surprise, that handles user errors in a helpful and user-friendly way etc... For example, as mentioned previously, this operation should fail with a useful error message:
data("sampleped")
ped <- Pedigree(sampleped)
ped$ped$family[1:5] <- "3"
and this one too:
descendants(c("1_101", "2_208", "3_111"), ped)
but they don't.
On top of this, when choosing S4 to implement objects and their API, there's the additonal burden of learning S4 itself. There's also the overhead of getting familiar with some other widely used objects in Bioconductor, like SummarizedExperiment, SingleCellExperiment, GInteractions etc... to get an idea of how things are usually done, and to fit in the ecosystem.
To make things even more challenging, your Pedigree objects are quite complicated, and I'm not familiar with the topic so can only provide limited guidance in the form of general statements. Furthermore, I don't know the history of the old pedigree objects from the kinship2 package, or what their shortcomings are that motivated the decision to replace the old class with this new class in the first place. Providing more concrete guidance would require that I invest a significant amount of time. Ultimately the package review is not the right venue for this level of involvement.
All this to say that I will now stop commenting on the design of Pedigree objects and their API. Hopefully they are already an improvement over the old pedigree objects. And getting them in Bioconductor will hopefully give them the kind of exposure that will help gathering feedback and refine them in the future (even though incremental refinements won't be easy if too much of their internal representation is exposed to the user, which is exactly why separating the API from internal representation is so important).
Here is some additional feedback for the rest of the package, mostly cosmetic:
vignette()
commands provided in the Introduction of the tutorial don't workvignette()
commands don't work either{}
in the tutorial: "The {} object can .." and "We use {} to calculate" What is {}
?That will be all.
Thanks again for your work on this, H.
Received a valid push on git.bioconductor.org; starting a build for commit id: d4709098ce684dfef269877b592755aec7949862
Hello, we are in the middle of updating the SPB for the current release. when I am done with the updates I will manually kick off a new build report for you. sorry for the inconvenience
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Build System.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.3.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Pedixplorer
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 9dfad8318b170f9457ca02a8808f1c9efe2c70d2
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.4.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Pedixplorer
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: a3c80e34fd4d9c5e15947f2523fba449ce17820f
Hi @hpages,
Thanks a lot for your feedback, it helped me a lot to redesign the API. It took me some time, but I should have now something more close to the bioconductor standard, I hope.
The biggest change is on the structure of the S4 object. The Pedigree object contains now the following object:
I've created the a lot of dedicated accessors but I'm not sure that is what I should do when there is so many slots.
Maybe a wrapper like ped(x, slot)
should be better ?
Also for the Pedigree object the famid() accessor is a wrapper of the famid(ped(x))
and this is the same for the mcols
as the usage of this two accessors should reflect the individuals of the Pedigree, from my point of view.
I've added more use case in the vignettes, but that might be to little (when you want to go in details), and maybe too much for the main vignette. Should I separate it more ?
Thanks for your review !
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.5.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Pedixplorer
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
I'm very sorry for the delay; it should not be taking this long. I will check in with @hpages if he plans to revisit this review soon or reassign the reviewer if necessary
Hi @LouisLeNezet, sorry for the delay. I'll take another look at this in the next few days. Thanks for your patience.
I am looking at it now.
I think most of the concerns have been addressed well. I would say that the current appearance of the vignettes is suboptimal because none is explicitly identified as the 'tutorial' or 'introduction' that should be visited first by a newcomer.
Please use the internal title field to help visitors navigate. Once this is done I will accept the package.
Received a valid push on git.bioconductor.org; starting a build for commit id: d40e20aaafcc7524261e8f903680fa1cdcba22a6
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): Pedixplorer_0.99.6.tar.gz macOS 12.7.1 Monterey: Pedixplorer_0.99.6.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Pedixplorer
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
@LouisLeNezet will you be able to deal with the suggestion above?
Hi, It should have been done in the last push. Is it not the case ? The new title for the "pedigree.rmd" is "Pedigree tutorial" as asked.
Received a valid push on git.bioconductor.org; starting a build for commit id: f6225502a70f7da1b9a04d73df32db77d43c0d7c
Sorry I forgot to modify the VignetteIndexEntry{}
This is done now.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): Pedixplorer_0.99.7.tar.gz macOS 12.7.1 Monterey: Pedixplorer_0.99.7.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Pedixplorer
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
@vjcitn Is it good to go ?
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.
The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/LouisLeNezet.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("Pedixplorer")
. The package 'landing page' will be created at
https://bioconductor.org/packages/Pedixplorer
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.