Submit CytoMDS package #3281

phauchamps closed 3 months ago

phauchamps commented 5 months ago

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

bioc-issue-bot commented 5 months ago

Hi @phauchamps

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: CytoMDS
Title: Low Dimensions projection of cytometry samples
Version: 0.99.8
Authors@R:
    c(person(given = "Philippe",
             family = "Hauchamps",
             role = c("aut", "cre"),
             email = "",
             comment = c(ORCID = "0000-0003-2865-1852")),
      person(given = "Laurent",
             family = "Gatto",
             email = "",
             role = "aut",
             comment = c(ORCID = "0000-0002-1520-2268")),
      person(given = "Dan",
             family = "Lin",
             role = "ctb",
             email = ""))
Description: This package implements a low dimensional visualization of a set
 of cytometry samples, in order to visually assess the 'distances' between them.
 This, in turn, can greatly help the user to identify quality issues 
 like batch effects or outlier samples, and/or check the presence of potential 
 sample clusters that might align with the experimental design.
 The CytoMDS algorithm combines, on the one hand, the concept of Earth Mover's 
 Distance (EMD), a.k.a. Wasserstein metric and, on the other hand, 
 the Multi Dimensional Scaling (MDS) algorithm for the low dimensional
 projection.
 Also, the package provides some diagnostic tools for both checking the quality 
 of the MDS projection, as well as tools to help with the interpretation of 
 the axes of the projection.
License: GPL-3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
biocViews: FlowCytometry, QualityControl, DimensionReduction, 
 MultidimensionalScaling, Software, Visualization
Depends:
    R (>= 4.3)
Suggests:
    testthat (>= 3.0.0),
VignetteBuilder: knitr
Config/testthat/edition: 3
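To make the EMD + MDS combination in the Description concrete, here is a minimal base-R sketch. It is not the CytoMDS implementation. It assumes single-channel samples binned on common, equally spaced breaks, computes the 1D EMD as the L1 distance between the two cumulative histograms, and then uses base R's classical MDS (`cmdscale()`), which may differ from the MDS variant the package actually uses. The helper `emd1d()` and the toy samples are hypothetical.

```r
# Sketch only (not the CytoMDS implementation): 1D EMD between two histograms
# sharing the same equally spaced bins, computed as the L1 distance between
# their cumulative distribution functions.
emd1d <- function(wA, wB, binWidth) {
    pA <- wA / sum(wA)  # normalise counts so each histogram has mass 1
    pB <- wB / sum(wB)
    sum(abs(cumsum(pA) - cumsum(pB))) * binWidth
}

# Hypothetical single-channel samples: two groups shifted by about 2 units
set.seed(42)
samples <- list(a1 = rnorm(2000, mean = 0), a2 = rnorm(2000, mean = 0.1),
                b1 = rnorm(2000, mean = 2), b2 = rnorm(2000, mean = 2.1))

# Common bins for all samples, so that the histograms are comparable
binWidth <- 0.1
breaks <- seq(-10, 12, by = binWidth)
hists <- lapply(samples, function(x)
    hist(x, breaks = breaks, plot = FALSE)$counts)

# Pairwise EMD matrix (symmetric, zero diagonal)
idx <- seq_along(hists)
D <- outer(idx, idx, Vectorize(function(i, j)
    emd1d(hists[[i]], hists[[j]], binWidth)))
dimnames(D) <- list(names(hists), names(hists))

# Classical MDS projection to (at most) 2 dimensions
proj <- cmdscale(as.dist(D), k = 2)
print(proj)
```

Samples from the same group end up on the same side of the first MDS axis, which is the kind of grouping the low dimensional plot is meant to reveal.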
phauchamps commented 4 months ago

Hi @DarioS,

Thank you for agreeing to review my package :-) Since I created this issue, I have added 2 minor versions of the package to my repository (CytoMDS 0.99.8 -> 0.99.10), so I could now also commit these changes to the Bioconductor devel repository. However, would you prefer that I keep version 0.99.8 for the time being, while you are reviewing it? Please let me know :-)

Thank you again,


phauchamps commented 4 months ago

Hi @DarioS,

Thank you for accepting to review my package :-) Since I created the current issue, I added 2 minor versions of the package to my repository (CytoMDS 0..99.8 -> 0.99.10). Therefore I can now commit the changes also to the Bioconductor devel repo. However, would you like that I keep it as the 0.99.8 version for the time being, while you are reviewing it? Please let me know :-)

Thank you again,


@DarioS: I have now pushed the new version. Incidentally, this also removed the 'warnings' tag (which was caused by the macOS build taking too long).

DarioS commented 4 months ago

Overall, a good submission. A few issues are noted.

if (useBiocParallel) {
    distribBlockList <- BiocParallel::bplapply(
        FUN = loadFFAndCalcHistograms,
        BPPARAM = BPPARAM,
        BPOPTIONS = BPOPTIONS,
        loadFlowFrameFUN = loadFlowFrameFUN,
        loadFlowFrameFUNArgs = loadFlowFrameFUNArgs,
        channels = channels,
        breaks = breaks,
        verbose = verbose)
} else {
    distribBlockList <- lapply(
        FUN = loadFFAndCalcHistograms,
        loadFlowFrameFUN = loadFlowFrameFUN,
        loadFlowFrameFUNArgs = loadFlowFrameFUNArgs,
        channels = channels,
        breaks = breaks,
        verbose = verbose)
}

For single core mode, please just use BiocParallel's SerialParam() and get rid of all instances of code duplication.
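A minimal sketch of the single code path suggested here, assuming BiocParallel is installed. The worker `calcBlockHistograms()` is a hypothetical stand-in for `loadFFAndCalcHistograms()`; the point is only that `BPPARAM` defaults to `SerialParam()`, so there is no separate `lapply()` branch to maintain.

```r
library(BiocParallel)

# Hypothetical worker: one histogram (vector of bin counts) per element
# of the block it receives
calcBlockHistograms <- function(block, breaks) {
    lapply(block, function(x) hist(x, breaks = breaks, plot = FALSE)$counts)
}

# Single code path: serial vs parallel execution is chosen by the caller
# through BPPARAM, with SerialParam() as the single-core default
computeAllHistograms <- function(blocks, breaks, BPPARAM = SerialParam()) {
    bplapply(X = blocks, FUN = calcBlockHistograms, breaks = breaks,
             BPPARAM = BPPARAM)
}

set.seed(1)
blocks <- list(list(rnorm(100), rnorm(100)), list(rnorm(100)))
breaks <- seq(-10, 10, by = 1)
res <- computeAllHistograms(blocks, breaks)
```

A caller who wants parallelism would pass e.g. `BPPARAM = MulticoreParam(workers = 2)` (or `SnowParam()` on Windows); the function body stays the same.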

distrs <- list()
ind <- 0
for (b in seq_along(blocks1D)) {
    for (i in seq_along(blocks1D[[b]])) {
        ind <- ind + 1
        distrs[[ind]] <- distribBlockList[[b]][[i]]
    }
}

You should use unlist() with recursive = FALSE instead. Here is an example:

example <- list(list(LETTERS[1:5], LETTERS[6:10]), list(LETTERS[11:15], LETTERS[16:20]))
unlist(example, recursive = FALSE)
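The sketch below, with hypothetical toy `blocks1D` and `distribBlockList` objects, checks that the nested index loop and `unlist(recursive = FALSE)` produce the same flat list. It assumes each `distribBlockList[[b]]` has as many elements as `blocks1D[[b]]`, which is what the loop implies.

```r
# Hypothetical toy inputs: two blocks of sample indices and, per block,
# the per-sample results returned by the worker
blocks1D <- list(c(1, 2), c(3, 4, 5))
distribBlockList <- list(list("h1", "h2"), list("h3", "h4", "h5"))

# Flattening with an explicit running index, as in the reviewed code
distrs <- list()
ind <- 0
for (b in seq_along(blocks1D)) {
    for (i in seq_along(blocks1D[[b]])) {
        ind <- ind + 1
        distrs[[ind]] <- distribBlockList[[b]][[i]]
    }
}

# Same result in one call: unlist() removes exactly one level of nesting
distrs2 <- unlist(distribBlockList, recursive = FALSE)
```

`recursive = FALSE` matters: with the default `recursive = TRUE`, the per-sample objects themselves could be flattened into a single atomic vector.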
ffList <- list()

for (i in seq_len(nSample)) {
    ffList[[i]] <- CytoPipeline::subsample(
        nEvents = 1000,
        seed = i)
}
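The same preallocate-and-grow pattern can be replaced by `lapply()`. Because the flowFrame argument of `CytoPipeline::subsample()` is cut off in the snippet above, this sketch uses a hypothetical base-R helper, `subsampleEvents()`, on a plain matrix of events; only the loop-to-`lapply()` transformation is the point.

```r
# Hypothetical stand-in for CytoPipeline::subsample(): keeps nEvents
# randomly chosen rows (events) of an expression matrix, reproducibly
# (note: set.seed() changes the global RNG state)
subsampleEvents <- function(mat, nEvents, seed) {
    set.seed(seed)
    mat[sample.int(nrow(mat), nEvents), , drop = FALSE]
}

nSample <- 3
rawEvents <- matrix(rnorm(5000 * 2), ncol = 2,
                    dimnames = list(NULL, c("ch1", "ch2")))

# lapply() builds the whole list at once: no empty list to grow, no index
ffList <- lapply(seq_len(nSample), function(i)
    subsampleEvents(rawEvents, nEvents = 1000, seed = i))
```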

We now create two simulated data sets, of 20 samples each, by combining events from the two samples of the OMIP021 original data set.

I don't understand the purpose of this. Why not analyse experimental data?

> mdsObj1
[1] 8.5817848 0.8525102

[1] 0.9096371 0.0903629
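The two printed vectors are numerically consistent with the second being the first rescaled to sum to one (8.5817848 / (8.5817848 + 0.8525102) is approximately 0.9096). Assuming they are the eigenvalues of the two retained MDS dimensions and their relative share, the check below reproduces the second line. The `cmdscale()` part shows how similar shares can be obtained in base R with classical MDS on the built-in `eurodist` data; it is an illustration only, not how CytoMDS computes them.

```r
ev <- c(8.5817848, 0.8525102)   # values printed for mdsObj1
share <- ev / sum(ev)           # relative weight of each retained dimension
print(round(share, 7))

# Classical MDS in base R, keeping 2 dimensions and returning all eigenvalues
fit <- cmdscale(eurodist, k = 2, eig = TRUE)
fitShare <- fit$eig[1:2] / sum(fit$eig[1:2])
print(fitShare)
```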

         1       2       3       4       5       6       7       8       9      10      11      12
2  1.76925 
3  0.55560 1.89565
4  1.62995 0.50810 1.78135
5  0.66230 2.17985 0.67050 1.95135
6  1.67935 0.58270 1.92735 0.60060 2.06245
7  0.50030 1.63315 0.67890 1.55475 0.97330 1.60175
8  1.76675 0.44360 1.83945 0.54720 2.20845 0.73850 1.58885
9  0.41880 1.96395 0.57090 1.75765 0.45950 1.87115 0.69840 1.95965
10 1.85695 0.43910 2.02975 0.59750 2.26375 0.63130 1.72295 0.49720 2.04465
11 0.59835 1.96950 0.53525 1.84570 0.58575 1.81470 0.77395 2.06530 0.54995 2.05880
12 1.84495 0.41810 1.97165 0.47160 2.19265 0.67420 1.73335 0.47140 1.98555 0.45600 2.04800
13 0.37195 1.78410 0.57005 1.66470 0.63315 1.71360 0.53125 1.81840 0.45605 1.88450 0.56030 1.87010
14 1.90415 0.42710 2.00675 0.62920 2.33165 0.77530 1.73185 0.45170 2.10725 0.37020 2.14030 0.48950 1.92140
15 0.53320 2.02115 0.63020 1.91555 0.51650 1.91265 0.72660 2.07105 0.47270 2.11985 0.45555 2.10775 0.50905 2.20185
                                                   ...                      ...
phauchamps commented 4 months ago

Hi Dario,

Thank you for the insightful comments, and for the time spent reviewing the package code in detail. I agree with all your code-related remarks, and will improve the package code accordingly.

Regarding the lack of biological insight in the vignette and the reason for using simulated data, I hope to address both concerns at once, by replacing the simulated data with a more representative public dataset for illustration.

I expect to work on the corrections in the coming days, and to get back to you soon.

Best regards,


phauchamps commented 3 months ago

Hi Dario,

I have just pushed a new version (CytoMDS 0.99.13), which addresses the issues you mentioned. Below is my point-by-point answer:



DarioS commented 3 months ago

CytoMDS has been substantially improved. There is just one outstanding issue, though.

distances <- rep(0., length(channels))
...
for (ch in seq_along(channels)) {
    ...
    distances[ch] <- wasserstein1d(a = locations, wa = wA, b = locations, wb = wB)
}


Some loops may not be convertible to the apply family, but the majority of those I inspected manually can, and must, be converted.
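Applied to the per-channel loop quoted above, the conversion could look like the sketch below. `emd1d()` is a hypothetical binned stand-in for the `wasserstein1d()` call (presumably from the transport package; same idea, a 1D EMD between two weighted distributions on shared locations), and the channel names and counts are toy values. `vapply()` with `FUN.VALUE = numeric(1)` also replaces the `rep(0., length(channels))` preallocation and guarantees exactly one number per channel.

```r
# Hypothetical stand-in for wasserstein1d() on binned data: 1D EMD between
# two weight vectors sharing the same equally spaced bins
emd1d <- function(wA, wB, binWidth) {
    pA <- wA / sum(wA)
    pB <- wB / sum(wB)
    sum(abs(cumsum(pA) - cumsum(pB))) * binWidth
}

channels <- c("FSC-A", "CD3", "CD4")   # hypothetical channel names
# Per-channel bin counts for two samples (toy values)
histA <- list(c(5, 3, 2), c(1, 1, 8), c(2, 6, 2))
histB <- list(c(5, 3, 2), c(8, 1, 1), c(2, 6, 2))

# No preallocation and no index bookkeeping: one value per channel
distances <- vapply(seq_along(channels),
                    function(ch) emd1d(histA[[ch]], histB[[ch]], binWidth = 1),
                    numeric(1))
names(distances) <- channels
print(distances)
```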

phauchamps commented 3 months ago

Hi @DarioS,

I have just pushed version 0.99.14, which addresses most of these for(...) loops. In this version, for(...) loops remain in 3 places in the code, where nested loops implement complex relationships between objects of different dimensions (with index translations). I think converting these nested loops into apply() family functions would be awkward at best, if not impossible, and would in any case make the code less readable.

Let me know what you think :-)



DarioS commented 3 months ago


