Kozea / Radicale

A simple CalDAV (calendar) and CardDAV (contact) server.
GNU General Public License v3.0

Large addressbook problems and Radicale #1002

Open nerdhardest opened 4 years ago

nerdhardest commented 4 years ago

[I'm using the latest "master" version of Radicale on Linux x64, Linux x86 and a Raspberry Pi 3B+, with various clients: Android DAVx5, Linux Evolution, Linux KAddressBook, cadaver, etc.]

I have an address book with 6000 vCards that I'm trying to serve from a Raspberry Pi using Radicale.

The problem is that there is a timeout somewhere in Radicale that fires after exactly 5 minutes (300 seconds), provided the client hasn't already timed out. (This test required recompiling cadaver to raise its hard-coded timeout from 120 seconds to 720 seconds.)

On a reasonably fast x64 machine, Radicale is slow but can still respond within the timeout. On the RPi, which is perhaps 10x slower, the connection always times out.
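The timeout arithmetic can be made concrete. A minimal sketch, assuming the listing cost grows linearly with the number of vCards (the thread does not measure this): with a 300-second limit, 6000 vCards must be handled at no less than 20 per second, i.e. at most 50 ms each. The 25 items/s throughput below is a hypothetical figure, not a measurement; only the "10x slower" factor comes from the issue.

```python
def required_rate(items: int, timeout_s: float) -> float:
    """Minimum throughput (items per second) needed to finish within timeout_s."""
    return items / timeout_s


def max_items(timeout_s: float, items_per_s: float) -> int:
    """Largest collection that can be processed before timeout_s expires."""
    return int(timeout_s * items_per_s)


TIMEOUT_S = 300  # the 5-minute limit observed in the issue
print(required_rate(6000, TIMEOUT_S))  # 20.0, i.e. at most 50 ms per vCard

fast_rate = 25.0            # hypothetical throughput on an x64 machine
rpi_rate = fast_rate / 10   # "perhaps 10x slower"
print(max_items(TIMEOUT_S, fast_rate))  # 7500
print(max_items(TIMEOUT_S, rpi_rate))   # 750
```

Under these assumptions the same code path that fits comfortably on x64 cannot list even an eighth of the address book on the RPi before the 300-second limit.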

Bottom line: neither Linux "Evolution" nor "KAddressBook" (Kontact) can deal with such a slow Radicale CardDAV server on the RPi. (Even Android DAVx5 fails, because Radicale dies on the server side after exactly 5 minutes.)

The underlying problem is simple: the CardDAV protocol seems to require listing all 6000 .vcf files in order to get their filenames and ETags. This listing takes a very long time in Radicale, even after all of the caches have been built.
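For reference, this listing step corresponds to a WebDAV PROPFIND with `Depth: 1` on the address book collection, asking for each member's `getetag` property (RFC 4918). A minimal sketch that only builds such a request body with Python's standard library; the `/user/contacts/` path is a hypothetical example and nothing is sent over the network. The server answers with one `response` element per member, so both the reply and the work behind it grow with the number of vCards.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
ET.register_namespace("d", DAV)  # serialize DAV: elements with the "d" prefix


def propfind_etags_body() -> bytes:
    """Build a PROPFIND body that requests only DAV:getetag for each member."""
    propfind = ET.Element(f"{{{DAV}}}propfind")
    prop = ET.SubElement(propfind, f"{{{DAV}}}prop")
    ET.SubElement(prop, f"{{{DAV}}}getetag")
    return ET.tostring(propfind, encoding="utf-8", xml_declaration=True)


# Hypothetical collection path; Depth: 1 means "the collection plus its direct members".
request_line = "PROPFIND /user/contacts/ HTTP/1.1"
headers = {"Depth": "1", "Content-Type": "application/xml; charset=utf-8"}
body = propfind_etags_body()
print(request_line)
print(body.decode("utf-8"))
```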

I've tried running Radicale without SSL, without locking, without fsync, and from a RAM disk, but none of these changes affect the performance very much.

Some computation within Radicale is taking a horrendous amount of CPU time.
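One way to narrow down where the CPU time goes is to time a cheap candidate in isolation. A rough sketch, assuming an ETag is a hash of each item's raw text (SHA-256 here is an assumption; Radicale's source is the authority on what it actually hashes): it generates about 3 MB of synthetic vCards, which are made-up test data rather than the reporter's address book, and hashes each one. If this finishes in well under a second on the same hardware, hashing alone is unlikely to explain a 5-minute listing, and parsing or other per-item work becomes the more likely suspect.

```python
import hashlib
import time


def make_vcard(i: int) -> str:
    """Generate a small synthetic vCard (hypothetical test data)."""
    return (
        "BEGIN:VCARD\r\n"
        "VERSION:3.0\r\n"
        f"UID:contact-{i}\r\n"
        f"FN:Test Contact {i}\r\n"
        f"EMAIL:contact{i}@example.com\r\n"
        f"TEL:+1-555-{i:07d}\r\n"
        "NOTE:" + "x" * 400 + "\r\n"
        "END:VCARD\r\n"
    )


def etag(text: str) -> str:
    """Quoted SHA-256 hex digest of the item text (assumed ETag scheme)."""
    return '"' + hashlib.sha256(text.encode("utf-8")).hexdigest() + '"'


cards = [make_vcard(i) for i in range(6000)]
total_bytes = sum(len(c.encode("utf-8")) for c in cards)

start = time.perf_counter()
etags = {etag(c) for c in cards}  # a set, so duplicates would shrink it
elapsed = time.perf_counter() - start
print(f"{len(cards)} cards, {total_bytes / 1e6:.1f} MB, hashed in {elapsed:.3f} s")
```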

I've also tried running Radicale under pypy3. It sort of works, but then dies with "too many open files" errors, and pypy3 appears to be even slower than python3! (This is a separate but important issue: why does Radicale sometimes crash under pypy3?)
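On the "too many open files" errors: one plausible explanation, offered as an assumption rather than something diagnosed in this thread, is that PyPy does not use reference counting, so a file object that is never explicitly closed stays open until the garbage collector happens to run. The general pattern that avoids this is to close each file deterministically with a `with` block. The sketch below also reads the process's open-file limit via the standard `resource` module (Unix only); the helper names and the temporary directory are hypothetical.

```python
import resource
import tempfile
from pathlib import Path


def open_file_limit() -> tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors (Unix only)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def read_all_vcards(folder: str) -> dict[str, bytes]:
    """Read every .vcf file, closing each handle before opening the next."""
    contents = {}
    for path in sorted(Path(folder).glob("*.vcf")):
        # The with-block closes the file right away, independent of GC timing.
        with open(path, "rb") as f:
            contents[path.name] = f.read()
    return contents


soft, hard = open_file_limit()
print(f"open-file limit: soft={soft}, hard={hard}")

# Demo on a throwaway directory with three tiny vCards.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        Path(tmp, f"card{i}.vcf").write_bytes(b"BEGIN:VCARD\r\nEND:VCARD\r\n")
    cards = read_all_vcards(tmp)
print(len(cards), "files read")
```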

Can someone please explain why Radicale is so heavily CPU-bound, even after spending a lot of time (and space) building its caches?

Note that over any reasonably fast network I can simply transfer all of the vCards uncompressed (only about 3 MB) in a few seconds, so all of this nonsense of dealing with vCards one at a time is a major design flaw. The gzipped version is only about 1.5 MB, and gzip compression and decompression are very fast, even on an RPi. There are also already good protocols for transferring large files to slow servers.
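The author's alternative, shipping the whole collection as one compressed blob, is easy to sketch with the standard library. This is not something CardDAV offers; it only illustrates the idea. The helper names are hypothetical, the vCards are synthetic, and the ratio on real data will differ (the issue reports roughly 3 MB shrinking to about 1.5 MB).

```python
import gzip

MARKER = "END:VCARD\r\n"


def bundle(vcards: list[str]) -> bytes:
    """Concatenate vCards (each ending in END:VCARD CRLF) and gzip the result."""
    return gzip.compress("".join(vcards).encode("utf-8"))


def unbundle(blob: bytes) -> list[str]:
    """Decompress and split back into vCards.

    Assumes END:VCARD appears only as each card's terminator.
    """
    text = gzip.decompress(blob).decode("utf-8")
    return [part + MARKER for part in text.split(MARKER) if part]


cards = [
    f"BEGIN:VCARD\r\nVERSION:3.0\r\nUID:{i}\r\nFN:Contact {i}\r\n{MARKER}"
    for i in range(6000)
]
blob = bundle(cards)
raw_size = sum(len(c.encode("utf-8")) for c in cards)
print(f"{raw_size} bytes raw, {len(blob)} bytes gzipped")
```

A single request for the whole bundle would cost one transfer instead of thousands of per-item round trips, which is the trade-off the comment is pointing at.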

I realize that the CardDAV protocol is fatally flawed: there is no way to "iterate" through the vCards one at a time (or a hundred at a time), so any fixed timeout is bound to fail once the vCard collection becomes large enough.
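The "one-hundred-by-one-hundred" iteration the author asks for amounts to fixed-size paging: each request then costs time proportional to the batch size rather than the collection size, so a fixed timeout no longer caps how large the collection can grow. A generic sketch, not a CardDAV API; the hrefs are hypothetical. (CardDAV does define an `addressbook-multiget` REPORT for fetching a chosen set of hrefs in one request, but that does not by itself remove the initial full listing described above.)

```python
from itertools import islice
from typing import Iterable, Iterator


def batches(items: Iterable[str], size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


hrefs = [f"/user/contacts/{i}.vcf" for i in range(6000)]  # hypothetical paths
pages = list(batches(hrefs, 100))
print(len(pages), "requests of", len(pages[0]), "items each")  # 60 requests of 100 items each
```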

Thanks for any insights.

pbiering commented 3 months ago

Is the issue still appearing with the 3.1.x series?

f-roscher commented 1 month ago

I'm seeing similar behaviour right now, although it happens while the cache is being built. I get server-side errors (HTTP 5xx) both on sync attempts from clients (CardBook inside Thunderbird, DAVx5 on Android) and on web UI login. The load appears to be CPU-bound: it is a nearly constant 90% to 100%, where 100% means a single thread, i.e. a single core, is saturated. While this happens, the cache files appear; I watched them for some minutes, and the file count inside .Radicale.cache increases by around 1.5 to 4 files per second. With my 6000 .ics files this leads to several timeouts. On the next connection it appears to continue its work where it left off, so the cache count should eventually reach the file count. All this happened on one calendar. I can test an address book, too, if you wish.
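The reported build rate translates directly into wall-clock time: at 1.5 to 4 new cache files per second, 6000 items need roughly 1500 to 4000 seconds (25 to about 67 minutes), which is consistent with the repeated timeouts described. A small sketch for measuring the rate the same way, by counting files under `.Radicale.cache`; it assumes, without checking Radicale's cache layout, that the file count tracks the number of cached items, and the path in the final comment is a hypothetical placeholder.

```python
import os
import time


def count_files(path: str) -> int:
    """Count regular files under `path`, recursively (0 if it does not exist)."""
    return sum(len(files) for _, _, files in os.walk(path))


def sample_rate(path: str, interval_s: float = 60.0) -> float:
    """Files added per second under `path` during `interval_s` seconds."""
    before = count_files(path)
    time.sleep(interval_s)
    return (count_files(path) - before) / interval_s


def remaining_seconds(done: int, total: int, files_per_s: float) -> float:
    """Time left to cache the remaining items at a constant rate."""
    if files_per_s <= 0:
        raise ValueError("rate must be positive")
    return max(total - done, 0) / files_per_s


# Figures from the comment above: 6000 items, 1.5 to 4 files per second.
print(remaining_seconds(0, 6000, 4.0))  # 1500.0
print(remaining_seconds(0, 6000, 1.5))  # 4000.0

# Hypothetical usage against a live collection:
# rate = sample_rate("/path/to/collection/.Radicale.cache")
```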

After the cache has been built completely, everything starts working again and I can see the new web UI for the first time. Yay.

This is with release version v3.2.0 of Radicale.

Can I help here, is there any data you may need from me? Any tests to run, traces to send?

pbiering commented 1 month ago


I would assume more rework is required around the storage and caching layer (just note that older versions were even slower...).

Contributions are required here.