Closed jeffreypullin closed 2 years ago
Hi @jeffreypullin
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Type: Package
Package: cardelino
Title: Clone Identification from Single Cell Data
Version: 0.99.0
Authors@R:
c(person(given = "Jeffrey",
family = "Pullin",
role = c("aut", "cre"),
email = "jpullin@svi.edu.au"),
person(given = "Yuanhua",
family = "Huang",
role = "aut",
email = "yuanhua@hku.hk"),
person(given = "Davis",
family = "McCarthy",
role = "aut",
email = "dmccarthy@svi.edu.au"))
Description: Methods to infer clonal tree configuration for a population of
cells using single-cell RNA-seq data (scRNA-seq), and possibly other data
modalities. Methods are also provided to assign cells to inferred clones
and explore differences in gene expression between clones. These methods
can flexibly integrate information from imperfect clonal trees inferred
based on bulk exome-seq data, and sparse variant alleles expressed in
scRNA-seq data. A flexible beta-binomial error model that accounts for
stochastic dropout events as well as systematic allelic imbalance is used.
License: GPL (>= 3)
URL: https://github.com/single-cell-genetics/cardelino
BugReports: https://github.com/single-cell-genetics/cardelino/issues
Depends:
R (>= 4.2),
stats
Imports:
GenomeInfoDb,
GenomicRanges,
ggplot2,
ggtree,
Matrix,
matrixStats,
methods,
pheatmap,
snpStats,
S4Vectors,
utils,
VariantAnnotation,
vcfR
Suggests:
BiocStyle,
combinat,
foreach,
knitr,
pcaMethods,
rmarkdown,
testthat,
VGAM
Enhances:
doMC
VignetteBuilder:
knitr
biocViews: SingleCell, RNASeq, Visualization, Transcriptomics,
GeneExpression, Sequencing, Software, ExomeSeq
Encoding: UTF-8
NeedsCompilation: yes
RoxygenNote: 7.2.0
Please use BiocCheck on your package. I got
* Checking License: for restrictive use...
* NOTE: License 'GPL (>= 3)' unknown; licenses cannot restrict use
among other concerns; see information on the developer page concerning exact notation accepted for license fields.
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/cardelino
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: f6b53f7912f74933556cfaee08d347fdf4b0e6db
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/cardelino
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi @jeffreypullin,
Thanks for submitting cardelino
, it'll take me roughly 10 days to look through the package or so. It may be worth looking at the build report to see if you can tackle any of those outstanding issues. Let me know if you have any immenent deadlines, as I may be able to turn the review around faster in some cases.
Best wishes, Olly
Hi @ococrook,
No worries, there are no imminent deadlines. Indeed, I will be on leave for most of August, so probably won't be able to make changes until early September.
Best, Jeffrey
This looks really good, thank you for the contribution. I found it well documented and easy to follow. The R code has a consistent style and is easy to read and follow. I have some feedback that it would be useful to incorporate before I can recommend accepting the package. Please respond with a clarification that you completed the follow or a description of why you didn't incoperate the feedback.
[ ] The versions of the package do not match in the Description
and NEWS.md
, could this be fixed otherwise it’s a bit confusing?
[ ] Can you fix the license? It’s unclear what you mean by a variable (>= 3) license.
[ ] Could you add a contribution and issue reporting guidelines? We actively encourage users of Bioconductor packages to contact the authors if they have issues or suggested improvements. A welcoming message in the Readme can go a long way.
[ ] If I understand correctly, there is a fairly complex Bayesian model underlying the software. Could the authors point towards the reference for the details? I assume you also want users to cite your work ;)
[ ] It’s not clear what the input data is going to be before installing the package. It would be good to add a short section about the sort of experimental inputs that are expected.
There are some spelling issues, though minor, it would be good to clear these up e.g.: “gorvened”, “whcih”, there are some packages that can check spelling for you.
[ ] There are some sapply()
, could these be replace with vapply()
[ ] There a lots of for loops, I know it’s not a always possible but if any can be converted to the appropriate *apply that would be great.
clone_id
. Checks can be added to avoid cryptic errors.[ ] I would have expected clone_id
to take an S4 object as input that checks that it is of the right style. This isn’t my area but I would expect relevant objects to already exist in Bioconductor, otherwise you can make your own.
[ ] clone_id
should also return an S4 object, returning a list from your main function is likely to lead to issues very quickly
[ ] You can then write methods that only take this object for clarity
[ ] You suggest a higher number of interactions in practice, maybe it's good to add a warning to function if the number of input iterations are too small?
[ ] How in general do you check the method has converged - I didn’t see any guidance here.
Received a valid push on git.bioconductor.org; starting a build for commit id: d2196b4870d8e291164794fbc569ff4002b6a1ac
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/cardelino
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Cardelino
This looks really good, thank you for the contribution. I found it well documented and easy to follow. The R code has a consistent style and is easy to read and follow. I have some feedback that it would be useful to incorporate before I can recommend accepting the package. Please respond with a clarification that you completed the follow or a description of why you didn't incoperate the feedback.
Description File
* [X] The versions of the package do not match in the `Description` and `NEWS.md`, could this be fixed otherwise it’s a bit confusing?
Fixed, thanks
* [X] Can you fix the license? It’s unclear what you mean by a variable (>= 3) license.
Fixed, changes to GPL-3.
Readme File
* [ ] Could you add a contribution and issue reporting guidelines? We actively encourage users of Bioconductor packages to contact the authors if they have issues or suggested improvements. A welcoming message in the Readme can go a long way.
Done.
* [X] If I understand correctly, there is a fairly complex Bayesian model underlying the software. Could the authors point towards the reference for the details? I assume you also want users to cite your work ;)
Done.
* [X] It’s not clear what the input data is going to be before installing the package. It would be good to add a short section about the sort of experimental inputs that are expected.
Done.
Inst directory
* [X] From experience these directories can get quite big and easy to forget what’s in them after sometime. I suggest adding a readme to this directory that details what is in this folder and to keep this update if anything is added here.
Done, thanks for the idea.
Data
* [ ] Is simulation_input documented?
No, simulation_input
is really just a convenience wrapper for use in package documentation so I don't think it needs to be documented.
Spelling
There are some spelling issues, though minor, it would be good to clear these up e.g.: “gorvened”, “whcih”, there are some packages that can check spelling for you.
Done
R code
* [X] There are some `sapply()`, could these be replace with `vapply()`
Done
* [ ] There a lots of for loops, I know it’s not a always possible but if any can be converted to the appropriate *apply that would be great.
I worked through the loops in the package and feel that most if not all of them are appropriate in context.
* [ ] In general, the function arguments need to be checked more carefully. What happens if I just pass a matrix to `clone_id`. Checks can be added to avoid cryptic errors.
I cannot reproduce this behavior, can you be more specific? I feel that the current argument validation is adequate.
S4
* [ ] I would have expected `clone_id` to take an S4 object as input that checks that it is of the right style. This isn’t my area but I would expect relevant objects to already exist in Bioconductor, otherwise you can make your own. * [ ] `clone_id` should also return an S4 object, returning a list from your main function is likely to lead to issues very quickly * [ ] You can then write methods that only take this object for clarity
I am hesitant to convert the package to use more S4. To give some context: cardelino was built by Dr Davis McCarthy and Dr Yuanhua Huang from 2018-2020 when the paper describing the software was published. I am a (pre-PhD) research assistant currently working for Davis McCarthy tasked with maintaining the package and submitting it to Bioconductor. We wanted to submit cardelino to Bioconductor now to follow best practices in the community and to make life easier for anyone currently using or benchmarking it. The McCarthy lab is currently working on new methodology that we expect will extend and supersede, which will be published in a separate package. Practically, we are not aware of any Bioconductor data structures that support the type of data the package requires as input.
Vignette
* [X] You suggest a higher number of interactions in practice, maybe it's good to add a warning to function if the number of input iterations are too small? * [X] How in general do you check the method has converged - I didn’t see any guidance here.
To address both of these comments I have added extra convergence validation to the clone_id()
function and added a section mentioning that this validation occurs in the vignette.
Hi @ococrook, just wanted to follow up on this :).
Hi @jeffreypullin, just got back from annual leave, I'm going to start going through my reviews today
Thanks @jeffreypullin for the revisions and looks like the package is almost there. There are a couple of points I'm still unsure about
-[ ] Checking function inputs more carefully.
For clone_id
, your input is a sparse matrix but then in the documentation you ask for an integer matrix. It seems the sparsness might not be that important but the integer entries are. If I add 0.01 to your input my R session just crashes. It's quite an easy mistake for a user to make and it would be quite annoying for them when a simple checks can be implemented.
The package would really benefit from the data being stored in a biocondutor object that is carefully checked for validity. See here: https://contributions.bioconductor.org/important-bioconductor-package-development-features.html, or you may want to look at the SingleCellExperiment
object. We don't usually accept packages that don't make use of these dedicated structures. If there really is no appropriate class structure this is something that will need implementing or further dicussion.
CardelinoRes
object so that everything is stored nicely (quantitative results, convergence checks etc).Hi @ococrook,
Thanks for your response :)
-[X] Checking function inputs more carefully.
I have added a new check that the sparse input A and D matrices have integer values. I could not, however, reproduce the crashing the behavior you report when the entries were not integers.
I agree that ideally the package would store data in existing Bioconductor objects, indeed the consistent use of data objects is one of the best aspects of Bioconductor! However, having reviewed existing data structures I am not sure that any are appropriate.
For storing the required input matrices the most relevant object appears to be the SingleCellExperiment
object. However, the input matrices in {cardelino} encode allele read counts not RNA-seq counts, as is typically stored in the SCE object. I am therefore concerned that using the SingleCellExperiment
object would both give the false impression that {cardelino} requires expression data and confuse users as the usual counts()
etc. functions would not be applicable.
While implementing a custom class to hold the matrices would be possible, I believe it may just complicate the package without greatly increasing its utility.
Do you have any suggestions for other S4 classes we could use?
Thanks for your time!
Best, Jeffrey
Hi @jeffreypullin
Just discussing with the core team on this point and I'll get back to you soon hopefully!
Sorry to get back to you slowly, coul you have a look at the: https://bioconductor.org/packages/release/bioc/html/VariantExperiment.html object to see if that would work?
If not we can avoid using a dedicated class in this case. If you are actively planning an updated model/package, I would strongly consider thinking about generating a suitable bioconductor class for these data.
Received a valid push on git.bioconductor.org; starting a build for commit id: db38a8af8dab8415ad7a9a4a5ae75b30f99d48a4
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/cardelino
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi @ococrook,
Thanks for your continued help and sorry for the complications!
After careful thought we have decided to not to use the VariantExperiment
object in the package, for three main reasons:
VariantExperiment
to successfully load the VCF files included in cardelino
due to the files not meeting implicit size consistency assumptions embedded in VariantExperiment
;cardelino
and MQuad
, a tool designed to provide output to cardelino, challenging; andVariantExperiment
object is not designed for single-cell data. As discussed our group is actively planning a model and package which should supersede cardelino
, which will definitely include S4 input and output objects.
The most recent push to the Biconductor servers above just includes some final tweaks to cardelino
.
Thanks again for your help!
Best, Jeffrey
Hi @jeffreypullin
Thanks for having a look - that makes sense to me and thanks for investigating. I'm happy to mark for acceptance. Let me know when submit the new model and you can tag me as a reviewer because familiarity will move things forward more quickly.
Best wishes, Olly
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/jeffreypullin.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("cardelino")
. The package 'landing page' will be created at
https://bioconductor.org/packages/cardelino
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.