Bioconductor / SummarizedExperiment

A container (S4 class) for matrix-like assays
https://bioconductor.org/packages/SummarizedExperiment

Enforce unique assay names #60

Open hpages opened 2 years ago

hpages commented 2 years ago

This started as a more general discussion about empty strings in List names but the real concern seems to be more specifically about the names of the assays. It comes down to these basic questions:

  1. Should we enforce names on the assays? Right now assay names are optional:

    library(SummarizedExperiment)
    
    m1 <- matrix(1:12, ncol=3)
    m2 <- m1 + 100.5
    se <- SummarizedExperiment(list(m1, m2))
    
    assayNames(se)
    # NULL
    
    ## Note that the show() method is misleading here, suggesting that the names are empty strings:
    se
    # class: SummarizedExperiment 
    # dim: 4 3 
    # metadata(0):
    # assays(2): '' ''
    # rownames: NULL
    # rowData names(0):
    # colnames: NULL
    # colData names(0):
  2. If the user does not supply assay names, should we generate them automatically? (The alternative would be to raise an error.)

  3. Should we enforce their uniqueness? Right now they can have duplicates:

    se <- SummarizedExperiment(list(A=m1, A=m2))
    assayNames(se)
    # [1] "A" "A"
  4. Should we also forbid empty or NA names? Right now they are allowed:

    se <- SummarizedExperiment(setNames(list(m1, m2), c("", NA)))
    assayNames(se)
    # [1] "" NA

My answer would be "yes" to all 4 questions.

Note that the situation is very similar to what data.frame() and DataFrame() do with column names (when check.names=TRUE). So the last question is:

  5. Should we just use make.names(., unique=TRUE), like data.frame() and DataFrame() do, to fix the user-supplied names?
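
For reference, here is a minimal base-R sketch of what make.names(., unique=TRUE) would do to the problematic names from questions 3 and 4. It only illustrates the base function, not anything SummarizedExperiment currently does. One caveat: missing names (NULL) would have to be replaced with empty strings first, since make.names(NULL) returns character(0).

```r
# Duplicated names get a numeric suffix.
dup <- make.names(c("A", "A"), unique=TRUE)
dup
# [1] "A"   "A.1"

# Empty and NA names are replaced by syntactically valid ones.
bad <- make.names(c("", NA), unique=TRUE)
bad
# [1] "X"   "NA."
```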

@LTLA @vjcitn @lawremi Comments? Suggestions?

vjcitn commented 2 years ago

I'd say yes to all 5, but I wonder what "enforce names" in proposal 1 entails. Would validObject() fail if names are absent, duplicated, or NA? And would the constructor supply "X", "X.1", and so on when no names are present, per proposals 2 and 5?
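
To make the two options concrete, here is a rough base-R sketch. Both helpers (check_assay_names and default_assay_names) are hypothetical names made up for illustration and are not part of SummarizedExperiment. The first follows the usual S4 validity convention of returning TRUE or a character vector describing the problems; the second shows where "X", "X.1" would come from if make.names() were used.

```r
# Hypothetical helper (not in SummarizedExperiment): returns TRUE or a
# character vector of problems, the convention used by S4 validity methods.
check_assay_names <- function(nms) {
    if (is.null(nms))
        return("assay names are missing")
    msg <- character(0)
    if (anyNA(nms))
        msg <- c(msg, "assay names contain NAs")
    if (any(nms %in% ""))
        msg <- c(msg, "assay names contain empty strings")
    if (anyDuplicated(nms))
        msg <- c(msg, "assay names contain duplicates")
    if (length(msg) == 0L) TRUE else msg
}

# Hypothetical constructor-side default: start from one empty name per
# unnamed assay and let make.names() fill them in.
default_assay_names <- function(n) make.names(character(n), unique=TRUE)

check_assay_names(c("A", "A"))
# [1] "assay names contain duplicates"

default_assay_names(2)
# [1] "X"   "X.1"
```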

hpages commented 2 years ago

I was thinking of mimicking the approach taken by data.frame()/DataFrame(), which is to make it hard, but not impossible, to construct an object with no names, or with names that contain "", NA, or duplicates:

    df <- data.frame(a=11:14, b=LETTERS[1:4], c=31:34)

    names(df) <- NULL
    names(df)
    # NULL

    names(df) <- c("", NA, "")
    names(df)
    # [1] "" NA ""

As you can see, you can completely get rid of the names, or set names with "", NA, or duplicates, if you really want to. But I was not necessarily thinking of encoding this in the validity method for SummarizedExperiment objects, at least not for now, because I don't know how many serialized SummarizedExperiment derivatives this would break. This is something that can always be done later.
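
For completeness, a small base-R sketch of the constructor side of that behavior, using data.frame() only: the names are repaired by default, and the repair can be switched off explicitly with check.names=FALSE. Whether the SummarizedExperiment() constructor should expose a similar switch is left open here.

```r
# The constructor repairs duplicated names by default (check.names=TRUE).
df1 <- data.frame(a=1:2, a=3:4)
names(df1)
# [1] "a"   "a.1"

# The repair can be turned off explicitly, so duplicates remain possible.
df2 <- data.frame(a=1:2, a=3:4, check.names=FALSE)
names(df2)
# [1] "a" "a"
```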