dbcodeio / public

The Power of Databases, The Convenience of VS Code: All in One Place
28 stars 1 forks source link

SUPPORT FOR DUCKDB #92

Closed ansuff closed 2 weeks ago

ansuff commented 3 weeks ago

Could you please provide information on your future plans to support DuckDB? We are currently working with both SQL Server and DuckDB and would greatly appreciate the addition of this feature.

mikeburgh commented 2 weeks ago

I have looked at it briefly, I recall it being very similar to SQLite, in which case it should be easy to add as a derivative…

ansuff commented 2 weeks ago

I hope it's an easy one, please keep us posted on this one, thanks a ton!

mikeburgh commented 2 weeks ago

Yeah so far, it seems to be, their docs are great, just working on the file descriptor to enable right click opening.. Image

mikeburgh commented 2 weeks ago

I just pushed a pre release version that contains support for DuckDB.

Would you mind installing it and letting me know how you go ?

Only issue I have found so far is that if you attached a SQLite DB it seems to cause the driver to crash when trying to introspect the schema, once I nail down a small working example I will raise it with the driver repo.

ansuff commented 2 weeks ago

Fantastic, I will try it out today and let you know how it goes, again big thanks.

ansuff commented 2 weeks ago

Can't seem to find the pre release version to switch to. I am trying using vscode btw, should I be doing something else?

mikeburgh commented 2 weeks ago

If you go to extensions section of vscode, find the dbcode extension in the list and click on it… in the extension detail at the top there is some buttons for uninstall", disable and there should be one for switch to pre release

ansuff commented 2 weeks ago

Yup...just found it! Testing it out at the moment, worked for all three dbs we have and will now test the full extension capabilities.

ansuff commented 2 weeks ago

@mikeburgh Everything works perfectly well and the copilot integration works as well! One point that we were wondering if the performance can be a little better or not, we mainly use duckdb cli and it's much faster specially for over 10M tables. Now we are not the experts on DB backends here in my team so any future improvements will be much appreciated. Cheers!

mikeburgh commented 2 weeks ago

Great news, I will look to move it to a full release later today.

It uses the node driver which is a light wrapper around their c binary, so there is some overhead but it should not be a lot.

Where are you seeing the performance issues, loading large result sets to display, or query processing time or somewhere else ?

ansuff commented 2 weeks ago

In both loading large results to display and during query execution, we’ve compared the performance of DBcode and the duckdb CLI. Surprisingly, dbcode takes approximately multiple times longer to execute the same queries compared to the duckdb CLI.

mikeburgh commented 2 weeks ago

Very interesting..

I know why the extension will be slower in displaying a large result set.. it has to do with the limits VSCode has on what extensions can do with the UI. Basically the results are shown in what vscode calls a webview, and to communicate between a webview and the code running in node (the extension) you need to use the browser based messaging feature. This means all data needs to be exchanged via that, or at least notified via it if stored elsewhere.. So there is a cost in terms of serializing the data from the node (extension side) to the webview side) and then re serializing it on the webview side to display it (that's before you get to then actually render it on the webview).

I have done some work to optimize this, but can probably do more.. You can see what I mean in this video, it's loading a million rows.. you can see where the timer stops, the animation halts.. the extension now has all the data and that's the serialization process using the event loop and you loose a couple of seconds for serializing both ways.

Some drivers support streaming results, which would give you progressive loading of results, I tried it but I found the overhead of sending batches of rows through the message framework slowed things down even more.

The time to run a query though that's a different story.. Can you try running the query so it does not return any results (eg a limit or wrap it in an outer CTE/Query and return only 1 row).

That way we are looking at pure execution time, no messaging/serializing.. and I can dig into what that might be that is causing it if it's still slow for that query compared to the CLI.

Given duckdb runs in process there could be some limits being imposed from the VSCode or node side in terms of the memory it can use, or similar to the sqlite driver, using async (which duckdb does) could be causing issues. You can see it in the benchmarks on this driver https://github.com/WiseLibs/better-sqlite3, avoiding async makes it a lot faster in the case of these single io based drivers. So this might be something I can work around, or something I need to raise.

I thought about using the native CLI rather than the node driver, the problem is you run into another serialization problem since you have to use std in/out to communicate with the CLI.. so you add even more serializing time.

ansuff commented 2 weeks ago

Amazing stuff! Thanks for the detailed explanation.

So I ran a couple of tests to validate what I raised. Doing a simple query with a limit of fewer than 1,000 rows seems to not have a difference between the two, but as soon as we go to the 10,000, it becomes very clear. Duckdb CLI took 0.05 seconds to get 100,000 rows while dbcode took around 2.6 seconds and the difference gets bigger the larger the selection is.

Looking forward then to how the project will be heading and big support from our team buds. Cheers!

mikeburgh commented 2 weeks ago

Appreciate you running the tests, and confirms.. it's the serialization I have to do.

I also have to do some normalization on the data so it's consistent across drivers (eg DuckDB uses big int's so need to turn those into strings since JSON cannot serialize bigints)..

I have some ideas about how to squeeze some more speed out of it..

Once they implement this it will also speed up some more: https://github.com/duckdb/duckdb-node/issues/122 since at the moment I am mapping the objects back to arrays.

Will close this for now as it's in place, and the speed up work will come as part of all the drivers.

csubhodeep commented 2 weeks ago

Hello @mikeburgh , I work with @ansuff and got to know about this amazing extension. After reading all the above my two cents, with limited knowledge in serde topic and node, if VSCode anyways wants a webview why can't we directly use DuckDB WASM, maybe it just works OOB. 😃

mikeburgh commented 1 week ago

Hi @csubhodeep I did look at the WASM driver briefly, as it makes packaging and deploying the extension a lot easier than using node binaries.

However, as far as I could see, it does not support reading local files. It can read them from urls, but not update them.