Hi @iphcsteve
Have you written the data with the same version of TileDB?
We definitely "mix and match" as well when testing and developing. Any chance you could create a minimal, complete, verifiable example?
Hi @iphcsteve,
We always provide backwards compatibility. However, please make sure you are not using the core TileDB library from our dev branch, as the format spec may change across different PR merges before an actual point release. Also please do let us know if you need information choosing versions across our APIs, we'd be very happy to help.
Yes, data was written with the same TileDB version, and I'm using TileDB-R via install.packages() and TileDB-Python via pip, so both are built on the same TileDB core.
I spent the morning trying to see if I'm making a dumb error (still possible), and threw together C++ and Python examples. The C++ code can read both the C++- and Python-created arrays, but the Python code can only read the array it creates, not the C++-created one.
Also, the full error in Python is:
deser.read_array("tileDB_CPP")
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/steve/PycharmProjects/tiledb_deser/deser.py", line 37, in read_array
    with tiledb.DenseArray(aname, mode='r') as A:
  File "tiledb/libtiledb.pyx", line 3820, in tiledb.libtiledb.DenseArrayImpl.__init__
  File "tiledb/libtiledb.pyx", line 3213, in tiledb.libtiledb.Array.__init__
  File "tiledb/libtiledb.pyx", line 3073, in tiledb.libtiledb.preload_array
  File "tiledb/libtiledb.pyx", line 413, in tiledb.libtiledb._raise_ctx_err
  File "tiledb/libtiledb.pyx", line 398, in tiledb.libtiledb._raise_tiledb_error
tiledb.libtiledb.TileDBError: [TileDB::Filter] Error: Deserialization error; unexpected metadata length
C++ code:
#include <iostream>
#include <string>
#include <vector>
#include <tiledb/tiledb>

using namespace tiledb;

template <class T>
class State {
public:
    State() {};
    ~State() {};
    std::vector<T> y, z;

    inline void createArr(std::string *aname) {
        Domain domain(ctx);
        domain.add_dimension(Dimension::create<T>(ctx, "YR", {{1, 5}}, 1))
            .add_dimension(Dimension::create<T>(ctx, "RA", {{1, 2}}, 2))
            .add_dimension(Dimension::create<T>(ctx, "AC", {{1, 10}}, 10));
        // dense array schema.
        ArraySchema schema(ctx, TILEDB_DENSE);
        schema.set_domain(domain).set_order({{TILEDB_ROW_MAJOR, TILEDB_ROW_MAJOR}});
        // Attributes
        auto yAttr = Attribute::create<T>(ctx, "Y");
        auto zAttr = Attribute::create<T>(ctx, "Z");
        schema.add_attributes(yAttr, zAttr);
        try {
            schema.check();
        } catch (const tiledb::TileDBError& e) {
            std::cout << "TileDB exception:\n" << e.what() << "\n";
        }
        try {
            // Create the array only if no array exists at this URI yet.
            if (Object::object(ctx, *aname).type() != Object::Type::Array) {
                Array::create(*aname, schema);
            }
        } catch (TileDBError& e) {
            std::cout << "TileDB exception:\n" << e.what() << "\n";
        }
    }

    inline void writeArr(std::string *aname) {
        Array array(ctx, *aname, TILEDB_WRITE);
        Query query(ctx, array);
        query.set_layout(TILEDB_ROW_MAJOR)
            .set_buffer("Y", y)
            .set_buffer("Z", z);
        // No subarray is set, so the write covers the whole domain; then close the array.
        try {
            query.submit();
            query.finalize();
        } catch (TileDBError& e) {
            std::cout << "----------------TileDB exception:\n" << e.what() << "\n";
        }
        array.close();
    }

    inline std::vector<T> readArr(std::vector<int>* SA, std::string *aname) {
        Array array(ctx, *aname, TILEDB_READ);
        Query query(ctx, array);
        // SA holds inclusive [lo, hi] pairs per dimension, so each extent is hi - lo + 1.
        int bufferNeeded = ( SA->at(1) - SA->at(0) + 1 ) *
                           ( SA->at(3) - SA->at(2) + 1 ) *
                           ( SA->at(5) - SA->at(4) + 1 );
        std::vector<T> dataY(bufferNeeded);
        query.set_subarray(*SA)
            .set_layout(TILEDB_ROW_MAJOR)
            .set_buffer("Y", dataY);
        // Submit the query and close the array.
        query.submit();
        array.close();
        return dataY;
    }

protected:
    Context ctx;
}; //end State

int main() {
    State<int> S;
    std::string aNameCPP = "tiledb_CPP";
    std::string aNamePy = "tileDB_PY";
    try {
        S.createArr(&aNameCPP);
    } catch (TileDBError& e) {
        std::cout << "----------------TileDB exception:\n" << e.what() << "\n";
    }
    // Prepare input data
    for (int i = 0; i < 100; ++i) {
        S.y.push_back(i);
        S.z.push_back(-i);
    }
    // Subarray to read: YR 1..2, RA 1..2, AC 9..10 (inclusive)
    std::vector<int> rArray = {1, 2, 1, 2, 9, 10};
    // Write to the array
    try {
        S.writeArr(&aNameCPP);
    } catch (TileDBError& e) {
        std::cout << "----------------TileDB exception:\n" << e.what() << "\n";
    }
    // Read one slice of data back in
    std::vector<int> returnDataCPP, returnDataPy;
    // Should return '8 9 18 19 28 29 38 39'
    try {
        std::cout << " \n*** CPP ***\n";
        returnDataCPP = S.readArr(&rArray, &aNameCPP);
        for (auto e : returnDataCPP) std::cout << e << " ";
    } catch (TileDBError& e) {
        std::cout << "----------------TileDB exception:\n" << e.what() << "\n";
    }
    try {
        std::cout << " \n\n*** Python ***\n";
        returnDataPy = S.readArr(&rArray, &aNamePy);
        for (auto e : returnDataPy) std::cout << e << " ";
    } catch (TileDBError& e) {
        std::cout << "----------------TileDB exception:\n" << e.what() << "\n";
    }
    std::cout << "\n";
    return 0;
}
Python code:
import numpy as np
import tiledb
import shutil
import os

def printVersions():
    print("TileDB version: " + str(tiledb.libtiledb.version()))
    print("TileDB-Py version: " + tiledb.__version__)

def create_array(aname):
    # 5 x 2 x 10
    dom = tiledb.Domain(tiledb.Dim(name="YR", domain=(1, 5), tile=1, dtype=np.int32),
                        tiledb.Dim(name="RA", domain=(1, 2), tile=2, dtype=np.int32),
                        tiledb.Dim(name="AC", domain=(1, 10), tile=10, dtype=np.int32))
    why = tiledb.Attr(name="Y", dtype=np.int32)
    zee = tiledb.Attr(name="Z", dtype=np.int32)
    schema = tiledb.ArraySchema(domain=dom, sparse=False, attrs=(why, zee))
    # Create the (empty) array on disk.
    tiledb.DenseArray.create(aname, schema)

def write_array(aname):
    # Open the array and write to it.
    with tiledb.DenseArray(aname, mode='w') as A:
        A[1:101] = {"Y": np.arange(100, dtype=np.int32), "Z": np.arange(100, dtype=np.int32)}

def read_array(aname):
    # Open the array and read from it.
    with tiledb.DenseArray(aname, mode='r') as A:
        data = A[1:3, 1:3, 9:11]
        print(data["Y"])

def main():
    aname = "tileDB_PY"
    # Remove any previous copy; shutil.rmtree raises if the directory does not exist yet.
    if os.path.isdir(aname):
        shutil.rmtree(aname)
    create_array(aname)
    write_array(aname)
    read_array(aname)

if __name__ == "__main__":
    main()
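As a cross-check on the expected output noted in the C++ comment ('8 9 18 19 28 29 38 39'), here is a minimal pure-Python sketch that does not touch TileDB at all. It assumes, as both scripts do, a dense 5 x 2 x 10 array (YR, RA, AC, each domain starting at 1) whose cells hold 0..99 in row-major order. The helper names `row_major_value`, `buffer_size` and `read_subarray` are hypothetical; `buffer_size` mirrors the `bufferNeeded` computation in `readArr()` for an inclusive subarray such as `rArray`.

```python
from itertools import product

SHAPE = (5, 2, 10)  # extents of YR, RA, AC; every domain starts at 1


def row_major_value(coords, shape=SHAPE):
    """Value at 1-based coords when cells hold 0, 1, 2, ... in row-major order."""
    offset = 0
    for c, extent in zip(coords, shape):
        offset = offset * extent + (c - 1)
    return offset


def buffer_size(subarray):
    """Cell count of an inclusive subarray [lo0, hi0, lo1, hi1, ...],
    the same product as bufferNeeded in the C++ readArr()."""
    size = 1
    for lo, hi in zip(subarray[0::2], subarray[1::2]):
        size *= hi - lo + 1
    return size


def read_subarray(subarray, shape=SHAPE):
    """Cell values in row-major order (the last dimension varies fastest)."""
    ranges = [range(lo, hi + 1) for lo, hi in zip(subarray[0::2], subarray[1::2])]
    return [row_major_value(coords, shape) for coords in product(*ranges)]


if __name__ == "__main__":
    rArray = [1, 2, 1, 2, 9, 10]
    print(buffer_size(rArray))
    print(" ".join(str(v) for v in read_subarray(rArray)))
```

Because the last dimension (AC) varies fastest in row-major order, the two AC values for each (YR, RA) pair come out adjacent, which is why the expected values arrive in pairs.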
Thanks @stavrospapadopoulos! Indeed, I just went through the format spec docs online. To your point, I'm using TileDB release-1.7.6, and pulled the R (0.5.0) and Python (0.5.8) APIs from source.
Keep up the great work, TileDB is great.
Excellent! Is that ok to close?
No, no, I still have the deser issue.
Apologies, I misunderstood. OK, we'll certainly look into it and get back to you soon.
Hi @iphcsteve -- thanks so much for the reproducible code.
If I understand correctly what is needed (an additional script may have helped, and it is late-ish for work on a Friday), the following sequence seems to work:
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$ mkdir tileDB_PY
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$ python3 gen.py
[[[ 8 9]
[18 19]]
[[28 29]
[38 39]]]
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$ g++ -o gen gen.cpp -ltiledb
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$ ./gen
*** CPP ***
8 9 18 19 28 29 38 39
*** Python ***
8 9 18 19 28 29 38 39
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$ ls -ltr
total 344
-rw-r--r-- 1 edd edd 3852 Mar 27 19:46 gen.cpp
-rw-r--r-- 1 edd edd 1315 Mar 27 19:47 gen.py
-rwxr-xr-x 1 edd edd 332776 Mar 27 19:47 gen
drwx------ 4 edd edd 4096 Mar 27 19:47 tileDB_PY
drwx------ 6 edd edd 4096 Mar 27 19:47 tiledb_CPP
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$
(I wasn't too imaginative with the filenames.) Running the C++ program alone did not work, nor did running the Python script, but after a mkdir tileDB_PY (as shown) it all seems to pan out.
I am mostly current on all three languages, but rebuild R most often. Python may be a few days old; C++ I happened to update this afternoon. And the R side seems to work with these arrays too:
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$ Rscript -e 'arr <- tiledb::tiledb_dense("tileDB_PY"); str(arr[])'
List of 2
$ Y: int [1:5, 1:2, 1:10] 0 20 40 60 80 10 30 50 70 90 ...
$ Z: int [1:5, 1:2, 1:10] 0 20 40 60 80 10 30 50 70 90 ...
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$ Rscript -e 'arr <- tiledb::tiledb_dense("tiledb_CPP"); str(arr[])'
List of 2
$ Y: int [1:5, 1:2, 1:10] 0 20 40 60 80 10 30 50 70 90 ...
$ Z: int [1:5, 1:2, 1:10] 0 -20 -40 -60 -80 -10 -30 -50 -70 -90 ...
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$ Rscript -e 'arr <- tiledb::tiledb_dense("tiledb_CPP"); arr[1,1:2,1]'
$Y
, , 1
[,1] [,2]
[1,] 0 10
$Z
, , 1
[,1] [,2]
[1,] 0 -10
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$ Rscript -e 'arr <- tiledb::tiledb_dense("tileDB_PY"); arr[1,1:2,1]'
$Y
, , 1
[,1] [,2]
[1,] 0 10
$Z
, , 1
[,1] [,2]
[1,] 0 10
edd@rob:~/git/tiledb-adhoc/bugreports/tiledb-1569(master)$
Dirk
Thanks, and especially for the positive result with R. I'm now really stumped since you couldn't reproduce the error with my stub code! I'll dig deeper on my side, since it's apparently not a problem with TileDB or its APIs.
Steve,
One possibility: the difference in behavior may be due to me building the Python component from source too (i.e. git pull, followed by python3 setup.py build_ext --inplace, followed by sudo python3 setup.py install), which leads to the same C++ build being used for all three languages. Whereas you may have our 'dev' branch for C++ and then R (from source), but a release for Python (using pip). If so, you could try building Python from source, or make sure you use the matching release branches.
Dirk
Hi Dirk (sorry about mixing up your name!)
The error occurred with C++ on release 1.7.6, Python via pip from release (0.5.8) or dev, and R via source.
I've rebuilt the TileDB-Py from source, and confirm that the error no longer occurs, which is great, so thanks for the pointer!
It's unfortunate though that such recent builds across languages were out-of-sync enough to cause this error. I'm wondering if it makes sense for TileDB to post a compatibility matrix? As an old engineering director myself, I know it's a timesuck but it may be worth the investment as a way to decrease user frustration down the road.
I want to clarify that we do guarantee backwards compatibility across TileDB versions, i.e. 1.7 can read 1.6 arrays. We do not guarantee forwards compatibility (which is the problem you ran into). We also do not guarantee the stability of dev (or master) on any of our repos. We use releases to mark stable versions of all our repos.
The latest releases across the different repositories will target the latest official TileDB release and should all be compatible. The latest Python release (0.5.8) targets 1.7.6, and the latest R release (0.5.0) also targets 1.7.6. TileDB-Java, TileDB-Go and the other integrations all currently target 1.7.6 as well. We do our best to keep all APIs up to date with the latest core libtiledb release. There might be some variations in the feature set of each API (e.g. TileDB-Java's support for unsigned values is limited by Java's lack of support for unsigned integers), which is what I believe @eddelbuettel was referring to. However, compatibility with arrays should be guaranteed across the official releases.
Thank you for your suggestion of a compatibility matrix for the various API versions and their matching core libtiledb version. We will look into this. We have a matrix in the TileDB-Go API, which we've been experimenting with. Would something like this be helpful to you in the other API repositories?
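A compatibility matrix could be as simple as a lookup table. The sketch below is hypothetical: `COMPAT_MATRIX` and `check_pairing` are not part of any TileDB package, and the table contains only the pairings stated in this thread (TileDB-Py 0.5.8 and TileDB-R 0.5.0 both target libtiledb 1.7.6). It requires an exact core-version match, which is stricter than the backwards-compatibility guarantee but flags exactly the kind of mismatch discussed here.

```python
# Hypothetical matrix: (API name, API release) -> libtiledb release it targets.
# Only the pairings stated in this thread are listed.
COMPAT_MATRIX = {
    ("TileDB-Py", "0.5.8"): "1.7.6",
    ("TileDB-R", "0.5.0"): "1.7.6",
}


def parse_version(text):
    """Turn a release string such as '1.7.6' into a comparable tuple (1, 7, 6)."""
    return tuple(int(part) for part in text.split("."))


def check_pairing(api, api_version, core_version):
    """Return (ok, message) for an API release paired with a libtiledb release."""
    target = COMPAT_MATRIX.get((api, api_version))
    if target is None:
        return False, f"{api} {api_version} is not in the matrix"
    if parse_version(core_version) != parse_version(target):
        return False, (f"{api} {api_version} targets libtiledb {target}, "
                       f"but libtiledb {core_version} was found")
    return True, f"{api} {api_version} matches libtiledb {core_version}"


if __name__ == "__main__":
    print(check_pairing("TileDB-Py", "0.5.8", "1.7.6"))
```

On a machine with TileDB-Py installed, the inputs could come from `tiledb.__version__` and `tiledb.libtiledb.version()` (both used in the Python script above), assuming the latter returns a `(major, minor, patch)` tuple that can be joined with dots.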
On your specific issue, we were able to reproduce it by mixing current dev TileDB with 1.7.6. An array created from the dev branch has an updated on-disk format version, which the older 1.7 branch is not able to read.
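The backwards-but-not-forwards rule can be modeled in a few lines. This is a simplified sketch, not TileDB's implementation: it assumes each array records a single integer format version and that a library can open any array whose format version is no newer than its own. The values 1 and 2 are placeholders standing in for "1.7.6 release" and "dev", not TileDB's real format version numbers, and `open_array` and `FormatTooNewError` are hypothetical names. In this thread the real failure surfaced as `Deserialization error; unexpected metadata length`, not as a clear version message like the one in this sketch.

```python
# Placeholder format versions (NOT TileDB's real numbers).
RELEASE_FORMAT = 1  # stands in for arrays written by the 1.7.6 release
DEV_FORMAT = 2      # stands in for arrays written by a dev build


class FormatTooNewError(Exception):
    """Raised when an array was written with a newer format than the reader knows."""


def open_array(reader_format, array_format):
    """Backwards compatible: a newer reader opens older arrays.
    Not forwards compatible: an older reader rejects newer arrays."""
    if array_format > reader_format:
        raise FormatTooNewError(
            f"array format {array_format} is newer than reader format {reader_format}")
    return f"opened array written with format {array_format}"


if __name__ == "__main__":
    print(open_array(DEV_FORMAT, RELEASE_FORMAT))  # dev reads release: fine
    try:
        open_array(RELEASE_FORMAT, DEV_FORMAT)     # release reads dev: rejected
    except FormatTooNewError as err:
        print("error:", err)
```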
I believe what might have happened is this: when you checked out the TileDB core C++ library, it defaulted to the dev branch. If you built dev before switching to the 1.7.6 tag (or the release-1.7.6 branch) and did not wipe the build directory after switching, you may have ended up with a non-clean 1.7.6 build, resulting in dev-format arrays being produced by C++ and by the R API.
TileDB-Py's pip packages include a pre-built libtiledb.so / libtiledb.dylib in the wheel. The latest version of TileDB-Py, 0.5.8, targets libtiledb 1.7.6, which is why the Python package was restricted to arrays written by 1.7 or older.
Building TileDB-Py from source allows it to check for a local install of libtiledb.so. If it finds TileDB on the system, it will build and link against that version. This means that on your system C++, R and Python are now all using the same libtiledb.so, so the version mismatch has disappeared.
Hi,
I'm using release-1.7.6 of TileDB-core, and have been receiving the following:
Error in libtiledb_array_open(ctx@ptr, uri, query_type) : [TileDB::Filter] Error: Deserialization error; unexpected metadata length
Creating schemas etc. seems to go OK (no errors are thrown, and the schema looks OK in a debugger), and using the created domain from within C++ also seems to work (i.e., I can read data from TileDB into temp arrays and write it back out). However, when I try to see and work with the same array from Python & R, I get the above error. Both Python and R are 'latest'; I've also tried R from within your Docker instance.
Thanks in advance for any insight.
Steve