Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

cardelino #2710

Closed jeffreypullin closed 2 years ago

jeffreypullin commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @jeffreypullin

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Type: Package
Package: cardelino
Title: Clone Identification from Single Cell Data
Version: 0.99.0
Authors@R: 
    c(person(given = "Jeffrey", 
   family = "Pullin",
   role = c("aut", "cre"),
   email = "jpullin@svi.edu.au"),
      person(given = "Yuanhua",
   family = "Huang",
   role = "aut",
   email = "yuanhua@hku.hk"),
      person(given = "Davis",
   family = "McCarthy",
   role = "aut",
   email = "dmccarthy@svi.edu.au"))
Description: Methods to infer clonal tree configuration for a population of
    cells using single-cell RNA-seq data (scRNA-seq), and possibly other data
    modalities. Methods are also provided to assign cells to inferred clones
    and explore differences in gene expression between clones. These methods
    can flexibly integrate information from imperfect clonal trees inferred
    based on bulk exome-seq data, and sparse variant alleles expressed in
    scRNA-seq data. A flexible beta-binomial error model that accounts for
    stochastic dropout events as well as systematic allelic imbalance is used.
License: GPL (>= 3)
URL: https://github.com/single-cell-genetics/cardelino
BugReports: https://github.com/single-cell-genetics/cardelino/issues
Depends: 
    R (>= 4.2),
    stats
Imports: 
    GenomeInfoDb,
    GenomicRanges,
    ggplot2,
    ggtree,
    Matrix,
    matrixStats,
    methods,
    pheatmap,
    snpStats,
    S4Vectors,
    utils,
    VariantAnnotation,
    vcfR
Suggests: 
    BiocStyle,
    combinat,
    foreach,
    knitr,
    pcaMethods,
    rmarkdown,
    testthat,
    VGAM
Enhances:
    doMC
VignetteBuilder: 
    knitr
biocViews: SingleCell, RNASeq, Visualization, Transcriptomics,
    GeneExpression, Sequencing, Software, ExomeSeq
Encoding: UTF-8
NeedsCompilation: yes
RoxygenNote: 7.2.0
vjcitn commented 2 years ago

Please use BiocCheck on your package. I got

* Checking License: for restrictive use...
    * NOTE: License 'GPL (>= 3)' unknown; licenses cannot restrict use

among other concerns; see information on the developer page concerning exact notation accepted for license fields.

bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/cardelino to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: f6b53f7912f74933556cfaee08d347fdf4b0e6db

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/cardelino to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

ococrook commented 2 years ago

Hi @jeffreypullin,

Thanks for submitting cardelino, it'll take me roughly 10 days to look through the package or so. It may be worth looking at the build report to see if you can tackle any of those outstanding issues. Let me know if you have any immenent deadlines, as I may be able to turn the review around faster in some cases.

Best wishes, Olly

jeffreypullin commented 2 years ago

Hi @ococrook,

No worries, there are no imminent deadlines. Indeed, I will be on leave for most of August, so probably won't be able to make changes until early September.

Best, Jeffrey

ococrook commented 2 years ago

Cardelino

This looks really good, thank you for the contribution. I found it well documented and easy to follow. The R code has a consistent style and is easy to read and follow. I have some feedback that it would be useful to incorporate before I can recommend accepting the package. Please respond with a clarification that you completed the follow or a description of why you didn't incoperate the feedback.

Description File

Readme File

Inst directory

Data

Spelling

There are some spelling issues, though minor, it would be good to clear these up e.g.: “gorvened”, “whcih”, there are some packages that can check spelling for you.

R code

S4

Vignette

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: d2196b4870d8e291164794fbc569ff4002b6a1ac

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/cardelino to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

jeffreypullin commented 2 years ago

Cardelino

This looks really good, thank you for the contribution. I found it well documented and easy to follow. The R code has a consistent style and is easy to read and follow. I have some feedback that it would be useful to incorporate before I can recommend accepting the package. Please respond with a clarification that you completed the follow or a description of why you didn't incoperate the feedback.

Description File

* [X]  The versions of the package do not match in the `Description` and `NEWS.md`, could this be fixed otherwise it’s a bit confusing?

Fixed, thanks

* [X]  Can you fix the license? It’s unclear what you mean by a variable (>= 3) license.

Fixed, changes to GPL-3.

Readme File

* [ ]  Could you add a contribution and issue reporting guidelines? We actively encourage users of Bioconductor packages to contact the authors if they have issues or suggested improvements. A welcoming message in the Readme can go a long way.

Done.

* [X]  If I understand correctly, there is a fairly complex Bayesian model underlying the software. Could the authors point towards the reference for the details? I assume you also want users to cite your work ;)

Done.

* [X]  It’s not clear what the input data is going to be before installing the package. It would be good to add a short section about the sort of experimental inputs that are expected.

Done.

Inst directory

* [X]  From experience these directories can get quite big and easy to forget what’s in them after sometime. I suggest adding a readme to this directory that details what is in this folder and to keep this update if anything is added here.

Done, thanks for the idea.

Data

* [ ]  Is simulation_input documented?

No, simulation_input is really just a convenience wrapper for use in package documentation so I don't think it needs to be documented.

Spelling

There are some spelling issues, though minor, it would be good to clear these up e.g.: “gorvened”, “whcih”, there are some packages that can check spelling for you.

Done

R code

* [X]  There are some `sapply()`, could these be replace with `vapply()`

Done

* [ ]  There a lots of for loops, I know it’s not a always possible but if any can be converted to the appropriate *apply that would be great.

I worked through the loops in the package and feel that most if not all of them are appropriate in context.

* [ ]  In general, the function arguments need to be checked more carefully. What happens if I just pass a matrix to `clone_id`. Checks can be added to avoid cryptic errors.

I cannot reproduce this behavior, can you be more specific? I feel that the current argument validation is adequate.

S4

* [ ]  I would have expected `clone_id` to take an S4 object as input that checks that it is of the right style. This isn’t my area but I would expect relevant objects to already exist in Bioconductor, otherwise you can make your own.

* [ ]  `clone_id` should also return an S4 object, returning a list from your main function is likely to lead to issues very quickly

* [ ]  You can then write methods that only take this object for clarity

I am hesitant to convert the package to use more S4. To give some context: cardelino was built by Dr Davis McCarthy and Dr Yuanhua Huang from 2018-2020 when the paper describing the software was published. I am a (pre-PhD) research assistant currently working for Davis McCarthy tasked with maintaining the package and submitting it to Bioconductor. We wanted to submit cardelino to Bioconductor now to follow best practices in the community and to make life easier for anyone currently using or benchmarking it. The McCarthy lab is currently working on new methodology that we expect will extend and supersede, which will be published in a separate package. Practically, we are not aware of any Bioconductor data structures that support the type of data the package requires as input.

Vignette

* [X]  You suggest a higher number of interactions in practice, maybe it's good to add a warning to function if the number of input iterations are too small?

* [X]  How in general do you check the method has converged - I didn’t see any guidance here.

To address both of these comments I have added extra convergence validation to the clone_id() function and added a section mentioning that this validation occurs in the vignette.

jeffreypullin commented 2 years ago

Hi @ococrook, just wanted to follow up on this :).

ococrook commented 2 years ago

Hi @jeffreypullin, just got back from annual leave, I'm going to start going through my reviews today

ococrook commented 2 years ago

Thanks @jeffreypullin for the revisions and looks like the package is almost there. There are a couple of points I'm still unsure about

-[ ] Checking function inputs more carefully.

For clone_id, your input is a sparse matrix but then in the documentation you ask for an integer matrix. It seems the sparsness might not be that important but the integer entries are. If I add 0.01 to your input my R session just crashes. It's quite an easy mistake for a user to make and it would be quite annoying for them when a simple checks can be implemented.

interop

The package would really benefit from the data being stored in a biocondutor object that is carefully checked for validity. See here: https://contributions.bioconductor.org/important-bioconductor-package-development-features.html, or you may want to look at the SingleCellExperiment object. We don't usually accept packages that don't make use of these dedicated structures. If there really is no appropriate class structure this is something that will need implementing or further dicussion.

jeffreypullin commented 2 years ago

Hi @ococrook,

Thanks for your response :)

-[X] Checking function inputs more carefully.

I have added a new check that the sparse input A and D matrices have integer values. I could not, however, reproduce the crashing the behavior you report when the entries were not integers.

I agree that ideally the package would store data in existing Bioconductor objects, indeed the consistent use of data objects is one of the best aspects of Bioconductor! However, having reviewed existing data structures I am not sure that any are appropriate.

For storing the required input matrices the most relevant object appears to be the SingleCellExperiment object. However, the input matrices in {cardelino} encode allele read counts not RNA-seq counts, as is typically stored in the SCE object. I am therefore concerned that using the SingleCellExperiment object would both give the false impression that {cardelino} requires expression data and confuse users as the usual counts() etc. functions would not be applicable.

While implementing a custom class to hold the matrices would be possible, I believe it may just complicate the package without greatly increasing its utility.

Do you have any suggestions for other S4 classes we could use?

Thanks for your time!

Best, Jeffrey

ococrook commented 2 years ago

Hi @jeffreypullin

Just discussing with the core team on this point and I'll get back to you soon hopefully!

ococrook commented 2 years ago

Sorry to get back to you slowly, coul you have a look at the: https://bioconductor.org/packages/release/bioc/html/VariantExperiment.html object to see if that would work?

If not we can avoid using a dedicated class in this case. If you are actively planning an updated model/package, I would strongly consider thinking about generating a suitable bioconductor class for these data.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: db38a8af8dab8415ad7a9a4a5ae75b30f99d48a4

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/cardelino to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

jeffreypullin commented 2 years ago

Hi @ococrook,

Thanks for your continued help and sorry for the complications!

After careful thought we have decided to not to use the VariantExperiment object in the package, for three main reasons:

  1. We could not get VariantExperiment to successfully load the VCF files included in cardelino due to the files not meeting implicit size consistency assumptions embedded in VariantExperiment;
  2. using the object would make integration of cardelino and MQuad, a tool designed to provide output to cardelino, challenging; and
  3. the VariantExperiment object is not designed for single-cell data.

As discussed our group is actively planning a model and package which should supersede cardelino, which will definitely include S4 input and output objects.

The most recent push to the Biconductor servers above just includes some final tweaks to cardelino.

Thanks again for your help!

Best, Jeffrey

ococrook commented 2 years ago

Hi @jeffreypullin

Thanks for having a look - that makes sense to me and thanks for investigating. I'm happy to mark for acceptance. Let me know when submit the new model and you can tag me as a reviewer because familiarity will move things forward more quickly.

Best wishes, Olly

bioc-issue-bot commented 2 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 2 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/jeffreypullin.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("cardelino"). The package 'landing page' will be created at

https://bioconductor.org/packages/cardelino

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.