Open KnutJaegersberg opened 5 years ago
I would like to know about this issue as well. My R package depends on MonetDBLite, and it's quite unfortunate to suddenly realise that people can't install it because MonetDBLite has disappeared without notice. Is there some way to help so that a solution arrives quickly?
Any updates?
@vadimnazarov I understand that the continued changes needed to track the MonetDB codebase (while keeping step with CRAN's changing requirements for checks) have led @hannesmuehleisen and team to develop a new database package at https://github.com/cwida/duckdb which promises several improvements over MonetDBLite. Hannes can no doubt provide more details, but meanwhile you might want to keep an eye on duckdb or take it for a spin! Hopefully it will be on CRAN soon.
I see, thank you for notifying! So did I understand you correctly: there will be no MonetDB for R, but MonetDB itself will live and thrive?
MonetDB itself will live on, yes. Thanks @cboettig for the explanation here.
Got it, thank you! Can't wait for duckdb on CRAN. On a side note: will there be any workaround for using MonetDB from R? What should I do if I want to connect to an existing MonetDB database rather than use the embedded database?
I think we all need a word about this, because several packages now depend on MonetDBLite and it was becoming a standard for data analysis in R (see all the examples at https://github.com/ajdamico/asdfree). In my case, I was using MonetDBLite from Python and R on a Windows platform. Also, I was using MonetDBLite not only as an embedded database but also to connect to a MonetDB Server database. So you can imagine my surprise when I updated to R 3.6 and discovered that MonetDBLite was no longer on CRAN. Now I really don't know which features will remain in this new package and which will be dropped forever.
I too want to express some disappointment that MonetDBLite is going away. At the same time, I'm very appreciative of those who have the skills and dedication to work on open source projects like this. Not having those skills, I can only imagine the effort it takes to maintain MonetDBLite.
The duckdb project does look exciting. Will it be as fast as MonetDBLite?
MonetDBLite and dplyr have become my preferred method for working with a dataset that's ~1.7 GB in size (just over 3 million rows and 91 columns). Even when I just want to load the whole thing into memory, I've found nothing faster than MonetDBLite (this includes vroom, data.table's fread(), and the fst package).
And for query-like data manipulations, using dplyr with MonetDBLite on disk is faster for many things than using dplyr with the data in memory. I'm also a huge fan of data.table. It's just slightly faster than MonetDBLite for the things I do.
I installed duckdb this weekend and played around with it. It's great to see that the dplyr compatibility is already working. Yet it seems to be much slower. Using the same dataset, it loads the data ~7x slower than MonetDBLite.
Again, very appreciative of the time and efforts!
> I installed duckdb this weekend and played around with it. It's great to see that the dplyr compatibility is already working. Yet it seems to be much slower. Using the same dataset, it loads the data ~7x slower than MonetDBLite.
We have not optimized the loader yet, it will happen though.
@nilescbn @Mytherin has just pushed upgrades to the CSV loader, please try again and see if the performance issue is still present.
@hannesmuehleisen, my apologies, I didn't see the notification of your last message. I only noticed today as I was browsing for updates. I tried updating duckdb earlier, using remotes with build = FALSE, but the install failed this time (on both Windows 10 and Linux Mint). I will keep trying it. To be clear, the performance issue I was having was related to loading the data into R from a database table (i.e. using dplyr's collect() function), which was slower with duckdb than with MonetDBLite. I don't know if that's connected to the CSV loader or not. Either way, I'm looking forward to trying it out.
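For anyone who wants to check the collect() step on its own, here is a minimal sketch of timing only the table-to-R transfer, with no CSV parsing involved. It assumes the DBI, duckdb, dplyr and dbplyr packages are installed; the synthetic data frame and the table name `bench` are made up for illustration and are not the dataset discussed above.

```r
# Sketch only: assumes DBI, duckdb, dplyr and dbplyr are installed.
library(DBI)
library(dplyr)

# Hypothetical stand-in data; substitute your own table.
n <- 1e5
df <- data.frame(
  id = seq_len(n),
  x  = rnorm(n),
  g  = sample(letters, n, replace = TRUE)
)

# In-memory duckdb database, so disk speed does not affect the timing.
con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")
dbWriteTable(con, "bench", df)

# Time only the step that pulls the whole table into R via collect().
timing <- system.time(
  result <- tbl(con, "bench") %>% collect()
)
print(timing)

dbDisconnect(con, shutdown = TRUE)
```

Running the same `system.time()` call against a connection to another backend holding the same table would give a like-for-like comparison of the collect() step, separate from any CSV import cost.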
I'm curious: why can't we install MonetDBLite from Github directly? Is that version still working/stable?
@winston-p you can, of course. But without it on CRAN, other users cannot publish packages to CRAN that depend on MonetDBLite. duckdb has been working well for me on Windows, Mac and Linux; looking forward to seeing it on CRAN.
@cboettig I see, thanks for explaining!
I was able to get duckdb installed again thanks to the CRAN-like repo: cwida/duckdb#392.
Thank you for creating that @hannesmuehleisen.
Comparing speeds again, I'm seeing duckdb close the gap somewhat, but MonetDBLite is still ~2x faster at completing queries. For others who have similar questions about speed differences, these two issues may be of interest:
cwida/duckdb#407
cwida/duckdb#11
MonetDBLite removed from CRAN - do some dependencies have check issues or something?