ArcadeData / arcadedb

ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.
https://arcadedb.com
Apache License 2.0
469 stars 57 forks source link

PostgreSQL DATETIME serialization invalid #1605

Open TimMensch opened 1 month ago

TimMensch commented 1 month ago

ArcadeDB Version:

v24.4.1 (build c714a4f5c827d742d6edd0f7788fe049bc63607b/1713579699346/main)

OS and JDK Version:

Running on Linux 5.15.146.1-microsoft-standard-WSL2 - OpenJDK 64-Bit Server VM 11.0.22 (Temurin-11.0.22+7)

Expected behavior

PostgreSQL API produces DATETIME readable by PostgreSQL clients (i.e., ISO date/datetime).

Actual behavior

Actual date format (regardless of database date format) looks like: Sun May 19 17:05:11 UTC 2024 instead of 2024-05-19T17:05:11Z.

Steps to reproduce

CREATE Vertex Type User IF NOT EXISTS;
CREATE PROPERTY CiteUser.createdAt IF NOT EXISTS DATETIME (mandatory true, readonly);
ALTER PROPERTY CiteUser.createdAt DEFAULT sysdate();

(Note the ALTER PROPERTY is still required above for the createdAt field to be auto-set; I commented on a closed bug that claimed this was fixed, but I've tested and it very much isn't fixed in the current build. That was bug number one that I found. This is pretty basic functionality--an auto "createdAt" field--and it was a surprise to find that it was flaky.)

In particular, if you're using the very standard pg Node.js library to access the PostgreSQL API, all resulting date fields will all be null, because it tries to parse the date strings as if they're ISO strings. Which is what PostgreSQL would put there.

This is two major bugs I've hit in two days of playing with ArcadeDB. It looks like an awesome project, and I love the philosophy, but two major bugs in two days that both required me to dig deeper to figure out what was happening means I'm going to be looking for a more mature alternative for now. Maybe I'll try it again in a year and see how you're all doing; I want to be working on my project, not spending all my time submitting bug reports on (or fixing) ArcadeDB.

It was already a huge red flag that you didn't have a standard Node library set up to work with ArcadeDB. I get that you're probably Java developers, but everyone knows they need to support Node and TypeScript at this point, yes?

lvca commented 1 month ago

@TimMensch I understand your frustration. Unfortunately, the Postgres wire protocol is not well documented, and our development sometimes is based on trial & error and/or checking other open-source projects to see how they make it work.

About this specific case, I added a couple of tests in our test suite and managing date and datetime types via JDBC -> Postgres works. So I guess the pg driver of Node.js is working differently. I see from the wire protocol that when I use both "date" and "timestamp" types, it's always code == 0, which means type unknown. In this case, the protocol parses the string. The string I see is in this format 2024-05-20 12:16:52.974+02.

Did you set a particular date time? Like with a command set datestyle to ISO?

TimMensch commented 1 month ago

Sorry. set datestyle to ISO is throwing an error in the GUI, at least when trying to use SQL. Is that something I need to send over the wire to the PostgreSQL API? That also throws an error. So no idea where that command should exist.

I did set arcadedb.dateTimeFormat to an ISO string, and the actual datetime I saw on the wire was the format I included above (i.e., the serialization didn't respect the arcadedb.dateTimeFormat).

What I see as the raw field info is:

image

There's a "data type ID" of 1082 for the DATETIME fields. I don't see a "timestamp" type in your docs? Just DATETIME (for the SQL API). Digging through the pg code, that equates to a parseDate(isoDate: string) function.

The raw data rows come in looking like this:

image

Those dates fail to parse as ISO dates.

The fix should be trivial: Always serialize dates to ISO dates. Clearly that's what pg expects, and therefore what every Postgres wire protocol handler should be able to interpret correctly, yes?

TimMensch commented 1 month ago

Glancing at your source, I see this:

if (queryText.startsWith("SET ")) {
        setConfiguration(queryText);
        resultSet = new IteratorResultSet(createResultSet("STATUS", "Setting ignored").iterator());
      }
// ...

From this file.

As far as I can tell, queryText is unmodified, and therefore sending "set " will get ignored because startsWith() is case sensitive?

Digging into setConfiguration(), the same is true for "datestyle" which needs to be lower-case and " TO " which needs to be capital. This is ... pretty sloppy. Sorry. Everything should have a "toLower" or "toUpper" or equivalent. Though really, a full SQL parser should be involved, for this exact reason: Parsing is hard, and it's easy to miss corner cases. Extra spaces? Line feeds? Comments? Tabs? Quotes? Trailing semicolon? Any of the above will screw it up even if you coerce the case, but someone expecting PostgreSQL syntax will expect all of that to work.

If I use the proper case for all the parts, instead of getting a thrown error, I see the "Setting Ignored" return result, which is frankly odd, but you can see it in the source above, so I know it's at least calling setConfiguration():

image

But the real problem is that it doesn't change the date string formats. Same output as above.

node-postgres is probably the most popular driver in use today for accessing PostgreSQL from all sources; at the very least it's in the top three, given Node's extremely high popularity. So when ArcadeDB claims to work with PostgreSQL wire format, but DATETIME values come back as null in Node, that makes ArcadeDB look bad. It's not even labeled as a "work in progress" feature, which implies that it's known to work in at least the most common scenarios. And no, Java doesn't even count as a common scenario here, because if you were using Java you'd probably just want to embed ArcadeDB for even higher performance.

It wouldn't hurt to have tests for the top five non-Java environments as well as your internal tests. Setting up Node and test code is positively trivial. Not sure where you're seeing that "code == 0", either, but I'm guessing the Java driver for PostgreSQL isn't exposing the dataTypeID that's actually part of the low-level PG wire protocol? I'm seeing that by putting breakpoints inside of the pg internal driver code and watching it parse the raw wire values. And ArcadeDB is sending those strings over the wire--with the proper code? I didn't dig through to figure out how you're generating the PG wire format, but presumably something knows the codes? By the way, stumbled across this:

image

Sample queries that will get you your OID documentation straight from Postgres.

Turns out that pg allows substitute parsers to be inserted, so I could use that as a workaround. But to me this is a canary in a coal mine: After only a few hours of experimentation I've hit three bugs, and I haven't even gotten to the "good stuff" yet.

Sorry for the frustration. I really wanted to use ArcadeDB. But this is a showstopper for me. If it's not as trivial to fix as I would expect, then a documentation update should be top priority, to warn people that your PostgreSQL driver is only alpha quality or experimental--otherwise we're led to believe all of ArcadeDB is only alpha quality. A database needs to be worthy of trust. You list node-postgres as a driver, implying it's supported/known to work, when it just doesn't work if you have any DATETIME fields and don't know to rewrite the parsers. 😒 Better to only show drivers that are known to work, where you have full integration tests for all of the ArcadeDB data types.