Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

scAlign #1004

Closed UCDNJJ closed 5 years ago

UCDNJJ commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @UCDNJJ

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: scAlign
Version: 0.99.0
Date: 2019-02-15
Title: An alignment and integration method for single cell genomics
Description: An unsupervised deep learning method for data alignment, integration and estimation of per-cell differences in -omic data (e.g. gene expression) across datasets (conditions, tissues, species). See Johansen and Quon (2019) <doi:10.1101/504944> for more details.
Authors@R: 
   c(person(given = "Nelson",
  family = "Johansen",
  role = c("aut", "cre"),
  email = "njjohansen@ucdavis.edu"),
     person(given = "Gerald",
  family = "Quon",
  role = c("aut"),
  email = "gquon@ucdavis.edu"))
URL: https://github.com/ucdavis/quontitativebiology/tree/master/scAlign
BugReports: https://github.com/ucdavis/quontitativebiology/issues
biocViews: SingleCell, Alignment, Transcriptomics, DimensionReduction, NeuralNetwork
Depends: R (>= 3.5.0)
Imports: Seurat,
     tensorflow,
     tfdatasets,
     purrr,
         irlba,
     Rtsne,
     ggplot2,
     methods
Suggests: knitr,
rmarkdown,
testthat
VignetteBuilder: knitr
SystemRequirements:python (< 3.7), tensorflow
RoxygenNote: 6.1.1
License: GPL-3 | file LICENSE
LazyData: false
Encoding: UTF-8

Add SSH keys to your GitHub account. SSH keys will are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.

bioc-issue-bot commented 5 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

mtmorgan commented 5 years ago

Why not support the SingleCellExperiment class for input? This way your package interoperates with Bioconductor rather than stands alone and is underused?

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

UCDNJJ commented 5 years ago

We agree that supporting SingleCellExperiment would be beneficial to the Bioconductor community. Planning to update the package.

Also, the scAlign packages requires tensorflow to be installed on the system which is producing one of the errors during the automated check.

mtmorgan commented 5 years ago

@lshep we'll arrange for tensorflow if you'll arrange for SCE!

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

a68be92 SCE update

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 5 years ago

tensorflow has been installed on the linux and mac machine - we currently are not using python3 on the single package builder and therefore tensorflow is unavailable for windows platform at this time. - I will discuss with whom manages the daily builders to see if we have tensorflow available on windows via a manual install or if those have been updated to python3 to have a better idea of what we can do on the windows platform.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

ce770d6 Fix Imports

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

be68adc Remove old docs

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

7140459 Version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

UCDNJJ commented 5 years ago

Thanks for getting tensorflow setup!

It appears that our package passes the checks on the Linux machine.

There seems to be an issue with the tensorflow install on the mac machine which is preventing a preliminary test tensorflow script from completing.

lshep commented 5 years ago

@Kayla-Morrell I think this is okay to review based on the linux platform while I investigate the tensorflow installation issue.

Kayla-Morrell commented 5 years ago

Hello @UCDNJJ,

I am working through your initial review at the moment. I am trying to review your vignette but I am unable to access the data needed to work through the examples. The link in the vignette does not work. Can you edit the file and/or provide the link here so I can continue to work through the vignette.

Thank you, Kayla

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

f4fa849 Update DESCRIPTION

UCDNJJ commented 5 years ago

Hi @Kayla-Morrell

Updated the link, hopefully you can access the data now. Let me know if there are any other issues running the vignette!

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

1765452 Add gridExtra to vignette

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

Kayla-Morrell commented 5 years ago

I was able to download the data and I'm working through the code in your vignette. I'm running into an issue when I'm trying to create the SCE object to pass into scAlignCreateObject (scAlign setup section). When trying to evaluate youngMouseSCE I'm getting the following error:

Error in .validate_names(rownames, ans_rownames, "assay rownames()", "rowData rownames() / rowRanges names()") :
  assay rownames() must be NULL or identical to rowData rownames() /
  rowRanges names()

I assume it has something to do with the fact that counts = youngMouseSeuratObj@raw.data has 8357 rows while scale.data = youngMouseSeuratObj@scale.data[gene.use,] has 3615 rows. Please make the necessary edits and then I'll be able to work through the rest of the vignette.

Best, Kayla

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

a749c3f Fix vignette

UCDNJJ commented 5 years ago

Thanks @Kayla-Morrell, the issue was as you suggested: different number of rows between assays. Fixed that issue and added SCE library call.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

54cae3a reduce image size for 5MB req.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

Kayla-Morrell commented 5 years ago

Hi @UCDNJJ, Thank you for submitting to Bioconductor. Please see the initial review of your package below:

DESCRIPTION

Description of package is good. Good use of README file for Python and Tensorflow installation.

NAMESPACE

NEWS

utils::news(package="scAlign")

Vignette

The vignette needs a lot of work. The vignette should produce the same results when copying the corresponding commands into an R session. It is essential that the vignette embeds executed R code.

> scAlignHSC = scAlign(scAlignHSC,
+ options = scAlignOptions(steps = 5000, norm = TRUE, log.every = 3000, early.stop = TRUE),
+ encoder.data = "scale.data",
+ supervised = 'none',
+ run.encoder = TRUE,
+ run.decoder = FALSE,
+ log.dir = file.path(results.dir, 'models', 'gene_input'),
+ device = "CPU")
2019-03-14 11:27:19.421681: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING:tensorflow:From /Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/util/deprecation.py:320: Print (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2018-08-20.
Instructions for updating:
Use tf.print instead of tf.Print. Note that tf.print returns a no-output operator that directly prints the output. Outside of defuns or eager mode, this operator will not be executed unless it is directly specified in session.run or used as a control dependency for other operators. This is only a concern in graph mode. Below is an example of how to ensure tf.print executes in graph mode:
```python
    sess = tf.Session()
    with sess.as_default():
        tensor = tf.range(10)
        print_op = tf.print(tensor)
        with tf.control_dependencies([print_op]):
          out = tf.add(tensor, tensor)
        sess.run(out)

Additionally, to use tf.print in python 2.7, users must make sure to import the following:

from __future__ import print_function

TensorFlow check: [Passed] [1] "============== Step 1/3: Encoder training ===============" [1] "Graph construction" [1] "Error during alignment, returning scAlign class." <simpleError in as_tf_dataset(tf$data$Dataset$from_tensor_slices(tensors = tensors)): Provided dataset is not a TensorFlow Dataset>


- [ ] REQUIRED: Rethink the file path for `log.dir` when running scAlign since removing "results.dir" from code. Perhaps a temporary file would be a better option for this.

- [ ] CLARIFICATION: I'm not sure how this effects the code but when building the vignette the code will get run all at once. For the different runs of scAlign does it matter that you are overwriting "scAlignHSC" each time you run an example? You might consider having a different variable name for each of the examples so they are unique.

- [ ] REQUIRED: When running through the example for running scAlign with PCA I get the following error:

scAlignHSC = scAlign(scAlignHSC,

  • options = scAlignOptions(steps=15000, norm=TRUE, log.every = 3000, early.stop = FALSE),
  • encoder.data = "PCA",
  • supervised = 'none',
  • run.encoder = TRUE,
  • run.decoder = FALSE,
  • log.dir = file.path(results.dir, 'models', 'pca_input'),
  • device = "CPU") TensorFlow check: [Passed] Error in sce.object@reducedDims[[data.use]][, object1.idx] : (subscript) logical subscript too long
> set.seed(5678)
> gene_plot = PlotTSNE(scAlignHSC, "ALIGNED-GENE", title = "scAlign-Gene", perplexity = 30)
[1] "Error plotting the data provided."
<simpleError in Rtsne.default(object@reducedDims[[data.use]], ...): Input X is not a matrix>
> pca_plot = PlotTSNE(scAlignHSC, "ALIGNED-PCA", title="scAlign-PCA", perplexity=30)
[1] "Error plotting the data provided."
<simpleError in Rtsne.default(object@reducedDims[[data.use]], ...): Input X is not a matrix>
> cca_plot = PlotTSNE(scAlignHSC, "ALIGNED-CCA", title="scAlign-CCA", perplexity=30)
[1] "Error plotting the data provided."
<simpleError in Rtsne.default(object@reducedDims[[data.use]], ...): Input X is not a matrix>
> legend = get_legend(PlotTSNE(scAlignHSC, "ALIGNED-GENE", title="scAlign-Gene", legend="right", max_iter=1))
[1] "Error plotting the data provided."
<simpleError in Rtsne.default(object@reducedDims[[data.use]], ...): Input X is not a matrix>
Error in plot_to_gtable(plot) :
  Argument needs to be of class "ggplot", "gtable", "grob", "recordedplot", or a function that plots to an R graphicsdevice when called, but is a NULL

man pages

PlotTSNE

> scAlignHSC = scAlign(scAlignHSC,
+                     options=scAlignOptions(steps=1, log.every=1, early.stop=FALSE, architecture="large"),
+                     encoder.data="scale.data",
+                     supervised='none',
+                     run.encoder=TRUE,
+                     run.decoder=TRUE,
+                     log.results=FALSE,
+                     device="CPU")
TensorFlow check: [Passed]
[1] "Training steps should be at least 1000."
[1] "============== Step 1/3: Encoder training ==============="
[1] "Graph construction"
[1] "Error during alignment, returning scAlign class."
<simpleError in as_tf_dataset(tf$data$Dataset$from_tensor_slices(tensors = tensors)): Provided dataset is not a TensorFlow Dataset>
[1] "============== Step 2/3: YOUNG decoder training ==============="
[1] "Graph construction"
[1] "Error during interpolation, returning scAlign class."
<Rcpp::exception in py_call_impl(callable, dots$args, dots$keywords): TypeError: Error converting shape to a TensorShape: int() argument must be a string, a bytes-like object or a number, not 'list'.

Detailed traceback:
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/array_ops.py", line 2077, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5789, in placeholder
    shape = _execute.make_shape(shape, "shape")
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/eager/execute.py", line 147, in make_shape
    raise TypeError("Error converting %s to a TensorShape: %s." % (arg_name, e))

decoder_large

decoder_medium

encoder_large

encoder_medium

scAlign

> scAlignHSC = scAlign(scAlignHSC,
+                     options=scAlignOptions(steps=1, log.every=1, early.stop=TRUE, architecture="large"),
+                     encoder.data="scale.data",
+                     supervised='none',
+                     run.encoder=TRUE,
+                     run.decoder=TRUE,
+                     log.results=FALSE,
+                     device="CPU")
TensorFlow check: [Passed]
[1] "Training steps should be at least 1000."
[1] "============== Step 1/3: Encoder training ==============="
[1] "Graph construction"
[1] "Error during alignment, returning scAlign class."
<simpleError in as_tf_dataset(tf$data$Dataset$from_tensor_slices(tensors = tensors)): Provided dataset is not a TensorFlow Dataset>
[1] "============== Step 2/3: YOUNG decoder training ==============="
[1] "Graph construction"
[1] "Error during interpolation, returning scAlign class."
<Rcpp::exception in py_call_impl(callable, dots$args, dots$keywords): TypeError: Error converting shape to a TensorShape: int() argument must be a string, a bytes-like object or a number, not 'list'.

Detailed traceback:
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/array_ops.py", line 2077, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5789, in placeholder
    shape = _execute.make_shape(shape, "shape")
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/eager/execute.py", line 147, in make_shape
    raise TypeError("Error converting %s to a TensorShape: %s." % (arg_name, e))

scAlignArguments

scAlignCreateObject

Unit tests

http://bioconductor.org/developers/how-to/unitTesting-guidelines/

R code

Check out formatR (https://cran.r-project.org/web/packages/formatR/index.html), a package that assists with formatting.

scAlign

scAlignClass

utils

Comment back here with updates that have been made and when the package is ready for a re-review.

Best, Kayla

Kayla-Morrell commented 5 years ago

A follow up comment for my initial review.

```{r} ... R code ... ```.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

fad0e95 Update DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

8b0bde7 Update DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

5e7f1c9 Update DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

bcdaa88 Update DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

ee310fb Update DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

4416a7d Fix test to include cellbench data

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

d5d8dcd Update DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

e50bdd6 Update DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

163e83e Update DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

UCDNJJ commented 5 years ago

@Kayla-Morrell

Thanks for taking the time to review or scAlign package! Fixed the issues with errors when running scAlign, code base and github were out of sync.

We had to change the example dataset to keep the package under 5MB, thus the vignette has changed substantially.

Now that the examples, unit tests and vignette are running correctly we unfortunately are running into the 10 min maximum time limit or 5 min R CMD check limit even after attempts to make each function and call to scAlign as simple and fast as possible. How should we address this?

I filled in your previous review based on the issues that have now been address to the best of my knowledge. Also in regards to a few of the points:

  • [ ] CLARIFICATION: I'm not sure how this effects the code but when building the vignette the code will get run all at once. For the different runs of scAlign does it matter that you are overwriting "scAlignHSC" each time you run an example? You might consider having a different variable name for each of the examples so they are unique.

scAlign() takes in an SCE object and either returns the same object if an error occurs or appends a new reducedDim slot to the SCE object. So no issue with overwriting.

scAlignArguments

  • [ ] REQUIRED: The arguments need to be defined.

This is an internal function whose arguments are already defined by the documentation for scAlign() (scAlignArguments() shares the same arguments with scAlign() and is just a helper function).

Hi @UCDNJJ, Thank you for submitting to Bioconductor. Please see the initial review of your package below:

DESCRIPTION

Description of package is good. Good use of README file for Python and Tensorflow installation.

  • [x] SUGGESTION: Under "License:" remove the "file LICENSE" text as well as the LICENSE file since they are not needed.
  • [ ] SUGGESTION: The BiocCheck had some suggested biocViews (Alignment, Network, GraphAndNetwork), consider adding these to your list if applicable.
  • [x] REQUIRED: The URL for "BugReports:" is not correct, please edit this.
  • [x] REQUIRED: The URL for "URL:" is also not correct, please edit this.

NAMESPACE

  • [ ] SUGGESTION: Generally importFrom() is encouraged over importing an entire package. However, if there are many functions from a single packages, import() is okay. Just keep this in mind.

NEWS

  • [x] REQUIRED: There should be a space after the + so that your NEWS file formats properly. You can check the NEWS file by installing your package and then running the command below in R.
utils::news(package="scAlign")

Vignette

The vignette needs a lot of work. The vignette should produce the same results when copying the corresponding commands into an R session. It is essential that the vignette embeds executed R code.

  • [x] SUGGESTION: Change the name of the vignette file to "scAlign". It is more intuative to the user when they are using vignette() to look for the name of the packages instead of vignette("alignment") like they would have to do now.
  • [x] REQUIRED: Add a package: tag in the vignette metadata. Also change the \usepackage[utf8]{inputenc} to %\VignetteEncoding{UTF-8}, since this is a non-Sweave vignette.
  • [x] REQUIRED: Add a section header "Introduction" before the first paragraph of text since this is basically an introduction to the package.
  • [x] REQUIRED: Add an "Installation" section that shows users how to download and load the package from Bioconductor.
  • [ ] SUGGESTION: We highly suggest the use of table of contents.
  • [x] REQUIRED: An overall formatting requirement, when talking about an R package or function they should stand out from regular text. This is achieved by adding ` around the word, for examplescAlignwill result inscAlign`. This helps the user to understand when you are talking about a function or package.
  • [x] REQUIRED: Including url's to github repos for data can be very inconsistent (as we saw from my initial comment about the link not being correct). Since the data is used solely for the vignette, we preper that the data be provided to the user via the package. This can be done by including the .rda file in a data folder for your package, I would rename it to "C57BL6_mouse_data.Rda". Then you will have to provide documentation (most likely in the vignette) about how it was created and source information. Please remember to compress the data. Once the data is included in your package then all the user will have to do is use data(C57BL6_mouse_data) to access the data needed for the vignette.
  • [x] REQUIRED: Once data is added to a data package then delete the working.dir and results.dir code since it won't be needed. Edit the line about loading the data to data("C57BL6_mouse_data", package = "scAlign", envir = environment()). This will also help the builder to check that your vignette code checks. It has not been able to check your vignette code because it cannot download the data like you instruct the user to do.
  • [x] REQUIRED: Avoid the use of direct slot access with '@', an accessor method should be created and utilized instead. '@' is used in Data setup and scAlign setup.
  • [x] REQUIRED: When working through the code of the vignette, the code under the 'scAlign setup' section is cut off. Please fix this so that the user can replicate the code you use.
  • [x] REQUIRED: Under "Alignment of young and old HSCs", the first sentence has a spelling error for 'populations'.
  • [x] REQUIRED: When running scAlign with high_var_genes I get an error. Below is the error.
> scAlignHSC = scAlign(scAlignHSC,
+ options = scAlignOptions(steps = 5000, norm = TRUE, log.every = 3000, early.stop = TRUE),
+ encoder.data = "scale.data",
+ supervised = 'none',
+ run.encoder = TRUE,
+ run.decoder = FALSE,
+ log.dir = file.path(results.dir, 'models', 'gene_input'),
+ device = "CPU")
2019-03-14 11:27:19.421681: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING:tensorflow:From /Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/util/deprecation.py:320: Print (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2018-08-20.
Instructions for updating:
Use tf.print instead of tf.Print. Note that tf.print returns a no-output operator that directly prints the output. Outside of defuns or eager mode, this operator will not be executed unless it is directly specified in session.run or used as a control dependency for other operators. This is only a concern in graph mode. Below is an example of how to ensure tf.print executes in graph mode:
```python
    sess = tf.Session()
    with sess.as_default():
        tensor = tf.range(10)
        print_op = tf.print(tensor)
        with tf.control_dependencies([print_op]):
          out = tf.add(tensor, tensor)
        sess.run(out)

Additionally, to use tf.print in python 2.7, users must make sure to import the following:

from __future__ import print_function

TensorFlow check: [Passed] [1] "============== Step 1/3: Encoder training ===============" [1] "Graph construction" [1] "Error during alignment, returning scAlign class." <simpleError in as_tf_dataset(tf$data$Dataset$from_tensor_slices(tensors = tensors)): Provided dataset is not a TensorFlow Dataset>

* [x]  REQUIRED: Rethink the file path for `log.dir` when running scAlign since removing "results.dir" from code. Perhaps a temporary file would be a better option for this.
* [ ]  CLARIFICATION: I'm not sure how this effects the code but when building the vignette the code will get run all at once. For the different runs of scAlign does it matter that you are overwriting "scAlignHSC" each time you run an example? You might consider having a different variable name for each of the examples so they are unique.
* [x]  REQUIRED: When running through the example for running scAlign with PCA I get the following error:

scAlignHSC = scAlign(scAlignHSC,

  • options = scAlignOptions(steps=15000, norm=TRUE, log.every = 3000, early.stop = FALSE),
  • encoder.data = "PCA",
  • supervised = 'none',
  • run.encoder = TRUE,
  • run.decoder = FALSE,
  • log.dir = file.path(results.dir, 'models', 'pca_input'),
  • device = "CPU") TensorFlow check: [Passed] Error in sce.object@reducedDims[[data.use]][, object1.idx] : (subscript) logical subscript too long
  • [x] REQUIRED: I get the same error as the requirement above when trying to run the example of scAlign with CCA.
  • [ ] CLARIFICATION: I'm not sure how this effects the code but when building the vignette the code will get run all at once. For the different runs of scAlign does it matter that you are overwriting "scAlignHSC" each time you run an example? You might consider having a different variable name for each of the examples so they are unique. This will also effect what you are plotting because the final "scAlignHSC" will be used for plotting with would be the scAlign with CCA.
  • [x] REQUIRED: I am not able to produce any of the plots...this is the errors I get when trying to plot the first group...
> set.seed(5678)
> gene_plot = PlotTSNE(scAlignHSC, "ALIGNED-GENE", title = "scAlign-Gene", perplexity = 30)
[1] "Error plotting the data provided."
<simpleError in Rtsne.default(object@reducedDims[[data.use]], ...): Input X is not a matrix>
> pca_plot = PlotTSNE(scAlignHSC, "ALIGNED-PCA", title="scAlign-PCA", perplexity=30)
[1] "Error plotting the data provided."
<simpleError in Rtsne.default(object@reducedDims[[data.use]], ...): Input X is not a matrix>
> cca_plot = PlotTSNE(scAlignHSC, "ALIGNED-CCA", title="scAlign-CCA", perplexity=30)
[1] "Error plotting the data provided."
<simpleError in Rtsne.default(object@reducedDims[[data.use]], ...): Input X is not a matrix>
> legend = get_legend(PlotTSNE(scAlignHSC, "ALIGNED-GENE", title="scAlign-Gene", legend="right", max_iter=1))
[1] "Error plotting the data provided."
<simpleError in Rtsne.default(object@reducedDims[[data.use]], ...): Input X is not a matrix>
Error in plot_to_gtable(plot) :
  Argument needs to be of class "ggplot", "gtable", "grob", "recordedplot", or a function that plots to an R graphicsdevice when called, but is a NULL
  • [x] REQUIRED: You don't have to save the plots. This vignette should just demonstrate how the plots are created. I would think of making a function of some kind so that the user does not get 3 different plot windows and then one with grid of plots.
  • [x] REQUIRED: Since are you creating the plots with the code, you won't have to attach the figures.
  • [x] REQUIRED: There must be a last section "Session Info" that includes SessionInfo().

man pages

  • [ ] SUGGESTION: Internal functions do not have to be documented since they will not be exposed to the user. You can keep them documented if you want, but it is not needed.
  • [ ] SUGGESTION: Include a package man page with an overview of the package and links to the main functions.

PlotTSNE

  • [x] SUGGESTION: Add a bit more detail to the Description so it's not just the same thing as the tile.
  • [x] REQUIRED: When running your examples I get an error when running scAlign() on scAlignHSC. This results in an error when I go and try to run PlotTSNE(). Please see the original error below and correct accordingly.
> scAlignHSC = scAlign(scAlignHSC,
+                     options=scAlignOptions(steps=1, log.every=1, early.stop=FALSE, architecture="large"),
+                     encoder.data="scale.data",
+                     supervised='none',
+                     run.encoder=TRUE,
+                     run.decoder=TRUE,
+                     log.results=FALSE,
+                     device="CPU")
TensorFlow check: [Passed]
[1] "Training steps should be at least 1000."
[1] "============== Step 1/3: Encoder training ==============="
[1] "Graph construction"
[1] "Error during alignment, returning scAlign class."
<simpleError in as_tf_dataset(tf$data$Dataset$from_tensor_slices(tensors = tensors)): Provided dataset is not a TensorFlow Dataset>
[1] "============== Step 2/3: YOUNG decoder training ==============="
[1] "Graph construction"
[1] "Error during interpolation, returning scAlign class."
<Rcpp::exception in py_call_impl(callable, dots$args, dots$keywords): TypeError: Error converting shape to a TensorShape: int() argument must be a string, a bytes-like object or a number, not 'list'.

Detailed traceback:
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/array_ops.py", line 2077, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5789, in placeholder
    shape = _execute.make_shape(shape, "shape")
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/eager/execute.py", line 147, in make_shape
    raise TypeError("Error converting %s to a TensorShape: %s." % (arg_name, e))
  • [x] REQUIRED: Use accessors, do not access S4 class slots via '@' (in examples).

_decoderlarge

  • [x] REQUIRED: The arguments need to be defined.

_decodermedium

  • [x] REQUIRED: The arguments need to be defined.

_encoderlarge

  • [x] REQUIRED: The arguments need to be defined.

_encodermedium

  • [x] REQUIRED: The arguments need to be defined.

scAlign

  • [x] REQUIRED: When running the examples I am unable to execute the scAlign() code. The following error (the same as above) is produced. Please edit accordingly.
> scAlignHSC = scAlign(scAlignHSC,
+                     options=scAlignOptions(steps=1, log.every=1, early.stop=TRUE, architecture="large"),
+                     encoder.data="scale.data",
+                     supervised='none',
+                     run.encoder=TRUE,
+                     run.decoder=TRUE,
+                     log.results=FALSE,
+                     device="CPU")
TensorFlow check: [Passed]
[1] "Training steps should be at least 1000."
[1] "============== Step 1/3: Encoder training ==============="
[1] "Graph construction"
[1] "Error during alignment, returning scAlign class."
<simpleError in as_tf_dataset(tf$data$Dataset$from_tensor_slices(tensors = tensors)): Provided dataset is not a TensorFlow Dataset>
[1] "============== Step 2/3: YOUNG decoder training ==============="
[1] "Graph construction"
[1] "Error during interpolation, returning scAlign class."
<Rcpp::exception in py_call_impl(callable, dots$args, dots$keywords): TypeError: Error converting shape to a TensorShape: int() argument must be a string, a bytes-like object or a number, not 'list'.

Detailed traceback:
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/array_ops.py", line 2077, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5789, in placeholder
    shape = _execute.make_shape(shape, "shape")
  File "/Users/ka36530_ca/Library/Python/3.7/lib/python/site-packages/tensorflow/python/eager/execute.py", line 147, in make_shape
    raise TypeError("Error converting %s to a TensorShape: %s." % (arg_name, e))
  • [x] REQUIRED: Use accessors, do not access S4 class slots via '@' (in examples).

scAlignArguments

  • [ ] REQUIRED: The arguments need to be defined.

scAlignCreateObject

  • [x] REQUIRED: Use accessors, do not access S4 class slot via '@' (in examples).

Unit tests

  • [x] REQUIRED: The test you have is not acceptable. This does not test the functionality or use of scAlign. Please see the link below if you plan on utilizing unit tests (which we strongly suggest doing).

http://bioconductor.org/developers/how-to/unitTesting-guidelines/

R code

  • [ ] SUGGESTION: Shorter lines (362 lines are > 80 characters), this includes roxygen documentation and comments.
  • [ ] SUGGESTION: Multiples of 4 spaces, not tabs, for line indents (854 lines are not)

Check out formatR (https://cran.r-project.org/web/packages/formatR/index.html), a package that assists with formatting.

scAlign

  • [x] REQUIRED: Avoid use of direct slot access with '@', accessor methods should be created and utilized instead.

scAlignClass

  • [x] REQUIRED: Avoid use of direct slot access with '@', accessor methods should be created and utilized instead.

utils

  • [x] REQUIRED: Avoid use of direct slot access with '@', accessor methods should be created and utilized instead.

Comment back here with updates that have been made and when the package is ready for a re-review.

Best, Kayla