davidavdav / NamedArrays.jl

Julia type that implements a drop-in replacement of Array with named dimensions

Implementation of an equivalent to the R function `base::merge` #134

Open ZekeMarshall opened 9 months ago

ZekeMarshall commented 9 months ago

Hi,

I was wondering whether there are any plans to implement a function similar to the R function `base::merge`.

I would be happy to help develop such a function and have already started to do so, but I'd first like to check whether you would be amenable to the idea.

Below is a bare-bones function I wrote for a project I'm working on. It aligns the columns of two `NamedArrays.NamedArray` objects, padding each with zero-filled columns for the names that only the other one has, and could form the start of a merge-like function once `hcat` and `vcat` steps are added. There is probably a much better way to do this!

using NamedArrays
function align_array_columns(x::NamedArray, y::NamedArray, colorder::String = "x")

    # Check which columns are missing from x and y
    x_missing_cols = setdiff(Set(names(y)[2]), Set(names(x)[2]))
    y_missing_cols = setdiff(Set(names(x)[2]), Set(names(y)[2]))

    # If there are missing columns in the x matrix
    x_mat = copy(x)
    if length(x_missing_cols) != 0
        x_mat_missing = NamedArray(zeros(size(x,1), length(x_missing_cols)), names = (vec(names(x)[1]), collect(x_missing_cols)))
        x_mat_colnames = names(x)[2]
        x_mat = [x x_mat_missing]
        setnames!(x_mat, [x_mat_colnames; collect(x_missing_cols)], 2)
    end

    # If there are missing columns in the y matrix
    y_mat = copy(y)
    if length(y_missing_cols) != 0
        y_mat_missing = NamedArray(zeros(size(y,1), length(y_missing_cols)), names = (vec(names(y)[1]), collect(y_missing_cols)))
        y_mat_colnames = names(y)[2]
        y_mat = [y y_mat_missing]
        setnames!(y_mat, [y_mat_colnames; collect(y_missing_cols)], 2)
    end

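    # Reorder the columns of one matrix to match the other
    if colorder == "x"
        y_mat = y_mat[:, names(x_mat)[2]]
    elseif colorder == "y"
        x_mat = x_mat[:, names(y_mat)[2]]
    else
        # Any other value would silently leave the columns unaligned
        throw(ArgumentError("colorder must be \"x\" or \"y\""))
    end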

    aligned_mats = (x = x_mat, y = y_mat)

    return aligned_mats

end

Apologies if I have missed something!

Cheers,

Zeke

davidavdav commented 9 months ago

Hi, I am not really familiar with what R's `merge()` does. Does it stack data where the labels in one of the dimensions are the same?

In general I think we'd want an interface to such a merge function that can operate on any dimension.

ZekeMarshall commented 9 months ago

Hi @davidavdav, thanks for your quick reply!

I agree, such a function would need to be able to operate on any or all dimensions.
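
As an illustration only, a dimension-agnostic version of the column alignment above might look like the minimal sketch below. The names `expand_dim` and `align_dim` and the `fillvalue` keyword are hypothetical and not part of NamedArrays.jl. The sketch assumes the three-argument `NamedArray(array, names, dimnames)` constructor and reads the underlying data through the `array` field.

```julia
using NamedArrays

# Hypothetical helper, not part of NamedArrays.jl: rebuild `x` so that its names
# along dimension `dim` are exactly `target`, in that order. Slices whose name
# is absent from `x` are filled with `fillvalue`.
function expand_dim(x::NamedArray, target::AbstractVector, dim::Integer; fillvalue = 0)
    T = promote_type(eltype(x), typeof(fillvalue))
    newsize = ntuple(d -> d == dim ? length(target) : size(x, d), ndims(x))
    data = fill!(Array{T}(undef, newsize), fillvalue)
    for (i, name) in enumerate(names(x, dim))
        j = findfirst(isequal(name), target)
        j === nothing && continue   # name not requested in `target`: drop the slice
        selectdim(data, dim, j) .= selectdim(x.array, dim, i)
    end
    newnames = ntuple(d -> d == dim ? collect(target) : names(x, d), ndims(x))
    return NamedArray(data, newnames, dimnames(x))
end

# Hypothetical wrapper: give `x` and `y` the same names along `dim`
# (the union of both name sets, with the names of `x` first).
function align_dim(x::NamedArray, y::NamedArray, dim::Integer; fillvalue = 0)
    target = union(names(x, dim), names(y, dim))
    return (x = expand_dim(x, target, dim; fillvalue = fillvalue),
            y = expand_dim(y, target, dim; fillvalue = fillvalue))
end

a = NamedArray([1 2; 3 4], (["r1", "r2"], ["c1", "c2"]))
b = NamedArray([5 6; 7 8], (["r1", "r2"], ["c2", "c3"]))
aligned = align_dim(a, b, 2)   # both results now have columns c1, c2, c3
```

Passing `fillvalue = missing` instead of the default widens the element type to allow `missing`, which is closer to the `NA` behaviour of R.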

Here is an example of the R function base::merge():

txt1 <- "column1   column2   column3   column4
        row1   0         1         0         0
        row2   0         0         1         0
        row3   1         0         0         1
        "

txt2 <- "column5   column6   column7   column8
        row4   0         1         0         0
        row5   0         0         1         0
        row6   1         0         0         1
        "
dat1 <- read.table(textConnection(txt1), header = TRUE)  |> as.matrix()
dat2 <- read.table(textConnection(txt2), header = TRUE)  |> as.matrix()

merge(x = dat1, y = dat2, by = "row.names", all = TRUE)

This returns a data frame, which can then be converted back into a matrix:

  Row.names column1 column2 column3 column4 column5 column6 column7 column8
1      row1       0       1       0       0      NA      NA      NA      NA
2      row2       0       0       1       0      NA      NA      NA      NA
3      row3       1       0       0       1      NA      NA      NA      NA
4      row4      NA      NA      NA      NA       0       1       0       0
5      row5      NA      NA      NA      NA       0       0       1       0
6      row6      NA      NA      NA      NA       1       0       0       1

What I would find useful is a NamedArray-only merge-like function, together with a set of functions that align the names of two matrices along selected dimensions, filling in a zero or `missing` wherever a name is absent, as the function above does for columns only.
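
To make the request concrete, here is a rough sketch of what such a function could do for two-dimensional arrays, using the data from the R example above. The name `outer_merge` is hypothetical and not part of NamedArrays.jl. The sketch assumes that cells present in both inputs take the value from `y`, which is a simplification: R's `merge` would instead keep shared columns as separate suffixed columns.

```julia
using NamedArrays

# Hypothetical function, not part of NamedArrays.jl: full outer merge of two
# named matrices. The result has the union of the row names and the union of
# the column names. Cells that neither input provides are `missing`.
function outer_merge(x::NamedArray{<:Any,2}, y::NamedArray{<:Any,2})
    rows = union(names(x, 1), names(y, 1))
    cols = union(names(x, 2), names(y, 2))
    T = Union{Missing, promote_type(eltype(x), eltype(y))}
    data = Array{T}(missing, length(rows), length(cols))
    for src in (x, y)   # `y` is written last, so it wins where cells overlap
        ri = Int[findfirst(isequal(n), rows) for n in names(src, 1)]
        ci = Int[findfirst(isequal(n), cols) for n in names(src, 2)]
        data[ri, ci] = src.array
    end
    return NamedArray(data, (rows, cols))
end

vals = [0 1 0 0; 0 0 1 0; 1 0 0 1]
dat1 = NamedArray(vals, (["row1", "row2", "row3"],
                         ["column1", "column2", "column3", "column4"]))
dat2 = NamedArray(vals, (["row4", "row5", "row6"],
                         ["column5", "column6", "column7", "column8"]))
merged = outer_merge(dat1, dat2)   # 6×8, with `missing` in the off-diagonal blocks
```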

Let me know your thoughts and thanks again!

Zeke