TileDB-Inc / TileDB-R

R interface to TileDB: The Modern Database
https://tiledb-inc.github.io/TileDB-R

Returning list structures for the same array and group metadata are not identical #775

Open cgiachalis opened 1 week ago

cgiachalis commented 1 week ago

Issue

When the same metadata is put on an array and on a group and then retrieved back into R, the returned objects are equivalent but not identical.

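When retrieving all metadata:

- `tiledb_get_all_metadata()` returns a list of class `tiledb_metadata` whose elements carry no extra attributes.
- `tiledb_group_get_all_metadata()` returns a plain list in which each element carries a `key` attribute.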

Is this intentional? I found no documentation or usage explaining why group metadata requires an extra attribute on each element.

Here's a reproducible example:

R Code - reprex

```r
library(tiledb) # version 0.30.2

# metadata for array and group
md <- list("a1" = 1, "b2" = 2)
nms <- names(md)

# Array metadata ------------------------
uri_arr <- tempfile("arr1")
fromDataFrame(data.frame(a = "foo"), uri_arr)

arr_handle <- tiledb_array(uri_arr)
arr_handle <- tiledb_array_open(arr_handle, type = "WRITE")

# Put metadata
status <- mapply(
  key = nms,
  val = md,
  FUN = function(key, val) {tiledb_put_metadata(arr_handle, key, val)})

all(status) # check all OK
#> [1] TRUE

arr_handle <- tiledb_array_close(arr_handle)
arr_handle <- tiledb_array_open(arr_handle, type = "READ")

arr_metadata <- tiledb_get_all_metadata(arr_handle)

# Group metadata ------------------------
uri_grp <- tempfile("grp1")
grp <- tiledb_group_create(uri_grp)
grp <- tiledb_group(grp, type = "WRITE")

# Put metadata
status <- mapply(
  key = nms,
  val = md,
  FUN = function(key, val) {tiledb_group_put_metadata(grp, key, val)})

all(status) # check all OK
#> [1] TRUE

grp <- tiledb_group_close(grp)
grp <- tiledb_group_open(grp, type = "READ")

grp_metadata <- tiledb_group_get_all_metadata(grp)
```

Results

# What ??? :(
all.equal(arr_metadata, grp_metadata)
 [1] "Attributes: < names for target but not for current >"             
 [2] "Attributes: < Length mismatch: comparison on first 0 components >"
 [3] "Component \"a1\": Attributes: < target is NULL, current is list >"
 [4] "Component \"b2\": Attributes: < target is NULL, current is list >"

# OK
all.equal(arr_metadata, grp_metadata, check.attributes = FALSE)
[1] TRUE

# Object structure
str(arr_metadata)
 List of 2
  $ a1: num 1
  $ b2: num 2
  - attr(*, "class")= chr "tiledb_metadata"

str(grp_metadata)
 List of 2
  $ a1: num 1
   ..- attr(*, "key")= chr "a1"
  $ b2: num 2
   ..- attr(*, "key")= chr "b2"

# Print to console
arr_metadata
a1: 1
b2: 2

grp_metadata
$a1
[1] 1
attr(,"key")
[1] "a1"

$b2
[1] 2
attr(,"key")
[1] "b2"

Comments/Notes/Fin

In practice, I strip the "key" attribute to get an identical output structure, which also helps when unit testing or when mixing array and group metadata.
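For anyone hitting the same thing, here is a minimal sketch of that workaround. `strip_key_attr()` is a hypothetical helper of mine, not part of the tiledb package, and the two input lists are base-R stand-ins that only mimic the `str()` output shown in the results, so the snippet runs without tiledb installed.

```r
# Hypothetical helper (not a tiledb function): drop the per-element "key"
# attribute from a metadata list, keeping the list names untouched.
strip_key_attr <- function(x) {
  lapply(x, function(el) {
    attr(el, "key") <- NULL
    el
  })
}

# Stand-ins that mimic the structures reported by str() in this issue;
# they are assumptions for illustration, not objects returned by tiledb.
arr_like <- structure(list(a1 = 1, b2 = 2), class = "tiledb_metadata")
grp_like <- list(
  a1 = structure(1, key = "a1"),
  b2 = structure(2, key = "b2")
)

grp_clean <- strip_key_attr(grp_like)

# Compare against the array result with its class removed, since the
# group result carries no class in the output above.
same <- identical(unclass(arr_like), grp_clean)
```

With the real objects, the same helper would be applied to `grp_metadata` before comparing it with `unclass(arr_metadata)`.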

Other notes and observations:

I hope the above is helpful towards a consistent metadata interface (structure, class, print method, functionality) :)

Thanks

cgiachalis commented 1 week ago

As a last note, it seems that at the C++ level the group getter assigns a 'key' attribute to the returned value, whereas the array getter assigns 'names', although the code logic is otherwise identical.

libtiledb_array_get_metadata_from_index https://github.com/TileDB-Inc/TileDB-R/blob/c2ba622f7ca0e5bb448f127e0e113bcac277a486/src/libtiledb.cpp#L2878-L2879

libtiledb_group_get_metadata_from_index https://github.com/TileDB-Inc/TileDB-R/blob/c2ba622f7ca0e5bb448f127e0e113bcac277a486/src/libtiledb.cpp#L5434-L5435