Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

APL package submission #2465

Closed VingronLab closed 2 years ago

VingronLab commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @VingronLab

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: APL
Type: Package
Title: Association Plots
Version: 0.99.0
Authors@R:
  c(person(given = "Elzbieta",
 family = "Gralinska",
 role = c("cre", "aut"),
 email = "gralinska@molgen.mpg.de"),
    person(given = "Clemens",
 family = "Kohl",
 role = c("aut"),
 email = "kohl@molgen.mpg.de"),
     person(given = "Martin",
  family = "Vingron",
  role = c("aut"),
  email = "vingron@molgen.mpg.de"))
Description: APL is a package developed for computation of Association Plots
  (AP), a method for visualization and analysis of single cell transcriptomics 
  data. The main focus of APL is the identification of genes characteristic for
  individual clusters of cells from input data. The package performs 
  correspondence analysis (CA) and allows to identify cluster-specific
  genes using Association Plots. Additionally, APL computes the 
  cluster-specificity scores for all genes which allows to rank the genes by 
  their specificity for a selected cell cluster of interest.
biocViews: 
  StatisticalMethod,
  DimensionReduction,
  SingleCell,
  Sequencing,
  RNASeq,
  GeneExpression
License: GPL (>= 3)
Encoding: UTF-8
RoxygenNote: 7.1.2
VignetteBuilder: knitr
Imports: 
    reticulate,
    ggrepel,
    ggplot2,
    viridisLite,
    plotly,
    Seurat,
    SingleCellExperiment,
    magrittr,
    SummarizedExperiment,
    topGO,
    methods,
    stats,
    utils,
    org.Hs.eg.db,
    org.Mm.eg.db,
    rlang
Depends: R (>= 3.5)
Suggests: 
    BiocStyle,
    knitr,
    rmarkdown,
    TENxPBMCData,
    testthat
Config/testthat/edition: 3
Collate: 
    'constructor.R'
    'CA.R'
    'apl.R'
    'convert.R'
    'generic_methods.R'
    'import_packages.R'
    'plot.R'
    'utils-pipe.R'
SystemRequirements: python, pytorch, numpy
vjcitn commented 2 years ago

I am doing early check on this package (sorry for long time lag) and I think there is a risk that this package will time out. The call to runAPL seems to take quite some time. Is there a way to avoid this?

vjcitn commented 2 years ago

I continue to explore this package. The use of python is noted in the DESCRIPTION file. I would strongly recommend you consider the use of the basilisk package in Bioconductor to manage the python dependencies. Bioconductor package BiocSklearn illustrates one approach to exposing pieces of scikit-learn.

VingronLab commented 2 years ago

Thanks a lot for your feedback and for the suggestion to use basilisk. We will try to replace reticulate by basilisk.

Regarding the function runAPL, when working with the PBMC data described in the vignette one could use the parameters: "pd_use = FALSE" and "dims = 222". This would skip the calculation of the dimension number, and use a fixed number of dimensions instead. In addition to this (or alternatively), one can use the parameter "reps = 1", which will also speed up the calculations. Finally, one can also turn off the calculation of the gene scores by setting "scores = FALSE". This will lead to generating an Association Plot without the gene scores.

vjcitn commented 2 years ago

Thanks for getting back to me. Please make the relevant changes to the vignette so that this package can be built and checked in the shortest time possible. If you need longer times for testing, there is a longTests capability that we can discuss as needed: https://www.bioconductor.org/developers/how-to/long-tests/

VingronLab commented 2 years ago

Thanks a lot for the reply. We've just updated the vignette and now it takes less time than before. We've also timed the R CMD Check and we wonder if the limit of 40 min relates to the "real" or "user" time. Or it's not related to any of them? Unfortunately, we couldn't find any information.

vjcitn commented 2 years ago

For submissions the limit is 5 minutes "real" time. We will ensure that this is clarified in developer doc. @lshep

federicomarini commented 2 years ago

I might be still familiar with this package, I recall a very nice presentation by @elagralinska! Would it be beneficial to pack the python&co dependencies into a basilisk framework to simplify installation/ensure everything works?

VingronLab commented 2 years ago

Thank you, @federicomarini! Yes, we hope so. We are currently working on this and we plan to use the basilisk framework instead of reticulate. Thanks a lot for the suggestion.

federicomarini commented 2 years ago

Since I had to go a similar way for another package, I found it very useful to check how that was done in the packages that import basilisk itself, on top of the basilisk vignette. HTH 😉 Looking forward to seeing this in the Bioc ecosystem!

vjcitn commented 2 years ago

The R version in DESCRIPTION must be >= 4.2 for a devel submission. I am approving for review, but I hope you will let us know if basilisk will be used, which I feel should be obligatory. If it will, we should wait to begin the review.

vjcitn commented 2 years ago
 Using 222 dimensions.
  |=======================                                               |  33%

the vignette construction is taking too long. Can you speed it up and/or use the longtests discipline?

VingronLab commented 2 years ago

Thanks a lot. Yes, we will use basilisk. We will notify you here once we are done with this. Regarding the vignette, yes - we can do one of these two things. We will discuss now what is better to do, and we will make the changes in the vignette.

bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/APL to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

vjcitn commented 2 years ago

what kinds of timings are you getting for CMD check?

VingronLab commented 2 years ago

To decrease the time we have changed the data set which we use in the vignette, and now R CMD check takes:

real 7m30.138s user 8m54.135s sys 0m26.603s

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 17237618be91289c88197cd1f69fa5a739efbbed

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/APL to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

vjcitn commented 2 years ago

you have pytorch in SystemRequirements but the python pip element is called "torch" is it not? @jwokaty

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b12d39d69effb21345f3e1a049543053a49e7c1c

jwokaty commented 2 years ago

Thanks for the mention. We don't currently have torch installed on the builders, but I can install it.

VingronLab commented 2 years ago

Thanks a lot for your replies. We've just set Python = FALSE in the vignette as we thought it will at least eliminate the error, but once torch is installed we can change it back to the previous version.

jwokaty commented 2 years ago

I've added torch to the builders by installing via pip. I think you'll need to do another push to trigger the build.

jwokaty commented 2 years ago

@VingronLab Are torchvision and torchaudio also needed?

VingronLab commented 2 years ago

@jwokaty We don't need them

VingronLab commented 2 years ago

I've added torch to the builders by installing via pip. I think you'll need to do another push to trigger the build.

We've just decided to keep the current version (with Python = FALSE in the vignette), so the build is up-to-date.

vjcitn commented 2 years ago

Shouldn't the vignette run at least some of the python code?

VingronLab commented 2 years ago

We are wondering if it's not better to turn off the python code in the vignette (and use our second alternative approach which is not based on python) in case a user doesn't have python installed? But we can turn it on again in the vignette.

vjcitn commented 2 years ago

On our build system the code should be exercised. Users without python will have to deal with it according to documentation.

VingronLab commented 2 years ago

Right, thanks! We will change it now.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 9f3a6417b99fc369dbda85d7bbfc5994d1dc064d

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/APL to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

hpages commented 2 years ago

Where are we standing w.r.t. the switch from reticulate to basilisk? Using the latter is a much more user-friendly approach because then people don't need to install anything (Python, pytorch, numpy, torchvision, torchaudio, etc...) basilisk automatically takes care of everything. This also means that we don't need to install anything on the build machines either.

There's a good reason why we strongly encourage the use of basilisk over reticulate.

Thanks, H.

VingronLab commented 2 years ago

Thanks a lot for the comment about basilisk. We are working on this and once everything is working, we will update the package.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: ba22831892ba3695727621db901d75cdc727f91c

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/APL to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

vjcitn commented 2 years ago

keep us posted

VingronLab commented 2 years ago

Hi, our last trigger was the most current version of the package (without Basilisk, though). It resulted in 1 error on the Linux platform, probably because 'torch' wasn't installed there yet. Regarding Basilisk, we run into conda installation problems and are trying to fix them. Would it be possible, in the worst case, to keep for now the version without Basilisk, and add Basilisk once we solve the conda issues?

VingronLab commented 2 years ago

1) We have added basilisk to our package, and the version with basilisk is now on the dev branch. It seems to work fine for small data sets, however when applied to bigger data the execution of the function cacomp() (a function which uses basilisk) never ends. Do you have maybe any suggestions what might be the reason for it and what did we do wrong?

2) It seems that conda doesn't use the libraries that pytorch comes with, but instead it uses shared libraries of the OS. This can be fixed but that would also mean that all process in the same session use the conda libraries. Is there a way how to deal with this?

vjcitn commented 2 years ago

I don't think I'll be able to address this before the release. You might ask for help debugging on bioc-devel. Can I interpret your remarks to mean that if reticulate is used on its own with a specific python configuration, the large data problem succeeds, but if it is used via basilisk it does not? I am having problems with question 2. basilisk will install its own miniconda to isolate all the python dependencies needed. If you are interfering with that I don't know what to suggest. I will try to have a look at your dev branch.

ClemensKohl commented 2 years ago

Thanks, we will ask for help at bioc-devel too! If we cannot implement basilisk in time for the release, can we proceed with the version still using reticulate only? @jwokaty : I believe there is only a pytorch installation missing for the Linux server currently.

About question 1: Yes, for very small datasets (for example the matrix in the testdata in ./tests/testthat/testdata/countries.rda) run_cacomp(dataset, python= T) runs without problems. For larger datasets (but even still comparatively small as for example the darmanis dataset in the vignette) it never finishes and keeps using resources. I am unsure why that is the case, with reticulate alone we did not observe it before. Question 2: The problem is that conda as installed by basilisk seems to be not using the libraries that are installed in the environment (at /scratch/local/$USER/.cache/R/basilisk/1.7.0/APL/0.99.4/env1/lib/ for this package on Linux), but is instead trying to use shared libraries, which lead in our case to problems as a too low gcc version was installed on the system. We can set export LD_LIBRARY_PATH=/scratch/local/$USER/.cache/R/basilisk/1.7.0/APL/0.99.4/env1/lib/ but this will mean all programs run in the same session will preferentially use the conda libraries.

VingronLab commented 2 years ago

Dear @vjcitn, is there something we can do about the error we got? Our assumption is that it is only due to the missing pytorch installation for the Linux server. Since the deadline is already in 3 days we wonder what should we do about this error. Thanks!

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/APL to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

jwokaty commented 2 years ago

@ClemensKohl Sorry I couldn't get to your message last week. I had to make a change on the linux builder and your package now passes on it.

ClemensKohl commented 2 years ago

No problem at all, thanks a lot for adding torch!

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 7c1319dabae823ef3b5711a64214b85fad6c28df

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/APL to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 2 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/VingronLab.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("APL"). The package 'landing page' will be created at

https://bioconductor.org/packages/APL

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.