**Closed** — david-allison closed this issue 3 years ago
During testing, Rust threw an OOM which was not handled by `catch_unwind`. This is disappointing, and seems to be the default OOM behaviour for Rust: instead of an exception, we receive a SIGABRT.
We don't get many OOMs, but the inability to catch them from Java is frustrating: ACRA doesn't handle them, and Google Play typically won't give appropriate detail (no logs, and it will be hard to recover a method signature from the Java side).
It also did not seem that the Rust OOM limit was far from the Java OOM limit. EDIT: We had a large VM Heap on the emulator which explains this.
All in all, this calls into question the assumption that Rust will be fine because it has no heap limit. Additional real-world testing will need to be performed, and we may need to discuss streaming data inside `rslib` (in addition to streaming over the JNI boundary, which will happen in this ticket) with dae.
If this is an issue, it will get better over time. All of the functionality which is likely to be expensive (imports, maybe syncing) can be translated to Rust, and we can use this after #1 is completed.
EDIT: Pagination (`LIMIT`/`OFFSET`) on the Java side might also be possible. It would need significant testing, but could solve this for us at a performance cost.
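To make the pagination idea concrete, here is a minimal sketch. All names are hypothetical (this is not AnkiDroid code), and `execute` stands in for whatever actually runs the SQL; the point is that only one page of rows is ever materialised on the Java heap at a time.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of Java-side pagination: rewrite the query with
// LIMIT/OFFSET so only one page of rows is ever held in memory.
class PagedQuery {
    static final int PAGE_SIZE = 100;

    // Builds the SQL for a given page; assumes `baseSql` has no LIMIT clause.
    static String pageSql(String baseSql, int page) {
        return baseSql + " LIMIT " + PAGE_SIZE + " OFFSET " + (page * PAGE_SIZE);
    }

    // Walks every row without holding the full result set on the heap.
    static void forEachRow(String baseSql,
                           Function<String, List<String>> execute,
                           Consumer<String> consumer) {
        for (int page = 0; ; page++) {
            List<String> rows = execute.apply(pageSql(baseSql, page));
            rows.forEach(consumer);
            if (rows.size() < PAGE_SIZE) {
                break; // a short page means we've reached the end
            }
        }
    }
}
```

Note that `OFFSET` itself is not free: SQLite still has to generate and skip the offset rows, so later pages cost more than earlier ones.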
It is not clear to me: is there a way to know all the places where such an OOM is realistic? If so, would it be possible to add checks in just those parts? For example, for SQL data, limiting how much is loaded at once; this is what was done for the big JSON in the collection (note types and decks).
I've added:
I'm not too happy with the `LimitOffsetSQLiteCursor::getCount()` method, but this seems to avoid the problem.
That seems like quite a high allocation for ~60 bytes * 5000 * 10. I'd be curious to know the length of the byte[] you're getting back from the backend there, as the raw data + JSON shouldn't be much over 3-4MB. Perhaps it will help if you wrap the bytes in an InputStream and get the array via https://stleary.github.io/JSON-java/org/json/JSONTokener.html, so you can avoid having to allocate a string just to parse the JSON.
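The suggestion above, sketched with JDK types only: rather than allocating a `String` copy of the whole payload just to parse it, wrap the backend's `byte[]` in an `InputStream`. With JSON-java you would then pass the stream to `new JSONTokener(in)` and parse incrementally; this sketch only shows the wrapping step.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Wrap the backend's byte[] in a stream instead of copying it into a String.
class StreamParse {
    static InputStream wrap(byte[] fromBackend) {
        // ByteArrayInputStream does not copy the array - no extra allocation.
        return new ByteArrayInputStream(fromBackend);
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = "{\"ok\":true}".getBytes(StandardCharsets.UTF_8);
        InputStream in = wrap(payload);
        // A streaming parser would consume `in` here; org.json's JSONTokener
        // has an InputStream constructor for exactly this purpose.
        System.out.println(in.available()); // 11 unread bytes
    }
}
```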
But even having said that, it seems like there might be something else going on to get that much memory consumed. Are these tests being run in parallel, or are you running this test by itself? If the issue exists even when executing the test by itself, I wonder how much memory is being consumed by the act of adding the data. If Android offers a way to dump the current memory usage of a process and/or force a gc, it would be interesting to see how it changes over the course of that test.
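For the "dump the current memory usage and/or force a gc" idea, one plain-JVM way to take a heap snapshot over the course of a test is sketched below. On Android, `android.os.Debug.getNativeHeapAllocatedSize()` would additionally cover the native heap, which `Runtime` cannot see; note also that `gc()` is only a request, so readings remain approximate.

```java
// Approximate heap-usage snapshot, usable before/after each test step.
class MemSnapshot {
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // request a collection so the reading is less noisy
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        byte[] blob = new byte[16 * 1024 * 1024]; // simulate a large payload
        blob[0] = 1; // keep the array live
        long after = usedHeapBytes();
        System.out.println("delta ~= " + (after - before) + " bytes");
    }
}
```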
Limit+offset may tank performance in cases where queries can't be serviced from an index - if the query does a full table scan or needs to sort afterward (eg 'order by' in the search queries), that cost is going to be multiplied by the number of pages you need to serve, so you may need to apply this more selectively if you go down this route.
One possible alternative would be to dump the JSON records one per line into a temp file if they exceed a certain count in the backend DB method. The frontend could then read the JSONL back in on demand as the cursor is iterated. No added overhead for smaller queries, and you'd avoid having to scan or order multiple times. It wouldn't solve the abort issue you got though. To reduce memory usage on the backend side, you could try reducing the default 40MB sqlite cache upper bound (https://github.com/david-allison-1/anki/blob/a148304fb3fac4f0042b01bd23cdbbefc3b1d151/rslib/src/storage/sqlite.rs#L45), and/or add a separate method that spools directly into a file instead of building a vec first.
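The JSONL spill idea above, sketched with JDK types only (the names are made up, not rslib/AnkiDroid API): above some row-count threshold the backend writes one JSON record per line to a temp file, and the cursor streams lines back on demand instead of holding the whole result set in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Spill rows to a JSONL temp file, then read them back lazily.
class JsonlSpool {
    static Path spill(List<String> jsonRecords) throws IOException {
        Path tmp = Files.createTempFile("rows", ".jsonl");
        Files.write(tmp, jsonRecords); // one JSON record per line
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = spill(List.of("{\"id\":1}", "{\"id\":2}"));
        // Files.lines streams lazily - roughly one line in memory at a time.
        try (Stream<String> lines = Files.lines(tmp)) {
            lines.forEach(System.out::println);
        }
        Files.delete(tmp);
    }
}
```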
re 0ee773d#diff-ae1a973a64affaa5b18ac55b7af4e7b61d2509b657274051310a3d10cbb94c58R103
> That seems like quite a high allocation for ~60 bytes * 5000 * 10. I'd be curious to know the length of the byte[] you're getting back from the backend there, as the raw data + JSON shouldn't be much over 3-4MB. Perhaps it will help if you wrap the bytes in an InputStream and get the array via https://stleary.github.io/JSON-java/org/json/JSONTokener.html, so you can avoid having to allocate a string just to parse the JSON.

> But even having said that, it seems like there might be something else going on to get that much memory consumed. Are these tests being run in parallel, or are you running this test by itself? If the issue exists even when executing the test by itself, I wonder how much memory is being consumed by the act of adding the data. If Android offers a way to dump the current memory usage of a process and/or force a gc, it would be interesting to see how it changes over the course of that test.
The test should have been better documented; it was explicitly designed as a stress test: 50 chars * 2 bytes (UTF-16) * 5000 * 2^10 = ~500MB.
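Spelling that arithmetic out (assuming the factors are characters per field, bytes per UTF-16 char, rows, and repeats per row):

```java
// Stress-test sizing: 50 chars * 2 bytes (UTF-16) * 5000 rows * 2^10 repeats.
class StressSize {
    public static void main(String[] args) {
        long bytes = 50L * 2 * 5000 * (1 << 10);
        System.out.println(bytes); // 512000000, i.e. ~500MB on the Java heap
    }
}
```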
> Limit+offset may tank performance in cases where queries can't be serviced from an index - if the query does a full table scan or needs to sort afterward (eg 'order by' in the search queries), that cost is going to be multiplied by the number of pages you need to serve, so you may need to apply this more selectively if you go down this route.
Agreed - we'd want to apply this selectively.
> One possible alternative would be to dump the JSON records one per line into a temp file if they exceed a certain count in the backend DB method. The frontend could then read the JSONL back in on demand as the cursor is iterated. No added overhead for smaller queries, and you'd avoid having to scan or order multiple times. It wouldn't solve the abort issue you got though. To reduce memory usage on the backend side, you could try reducing the default 40MB sqlite cache upper bound (https://github.com/david-allison-1/anki/blob/a148304fb3fac4f0042b01bd23cdbbefc3b1d151/rslib/src/storage/sqlite.rs#L45), and/or add a separate method that spools directly into a file instead of building a vec first.
Touching the disk here is probably more trouble than it's worth. I'd guesstimate that a large percentage of our issues (~20%) come from ENOSPC and Android deleting cache files, and it seems fair to assume that a low-RAM device is also likely to be constrained by disk space.
Sample size of 1: I don't have problems with AnkiDroid crashes/space, but I typically have less free disk space than RAM (~500MB-1.5GB space, 3GB RAM).
As mentioned in #6, we'll get a lot of breathing room (~3-5x less RAM) by moving to Protobuf for SQL, and serialisation will likely get in the way.
Going forward, this seems to be the confluence of two related, but separate issues:
Maybe we also want to get #6 into the V1 release?
We're probably not going to find out whether I'm worrying too much about this until we go to beta.
I'm leaning conservative with this, ideally going ahead and getting #6 in, along with streaming the protobufs. I'd rather not annoy users with this change if at all possible.
This must happen before publishing the first release - I expect this will likely crash low-memory devices for some operations.
`rslib` does not currently stream data from SQL: all data is loaded into memory, encoded into JSON, and sent to the Python. https://github.com/ankitects/anki/blob/65d3a1393cbd111861221774f200f51b6ab3e89c/pylib/rsbridge/lib.rs#L81-L93
Rust can afford to avoid streaming SQL because it isn't subject to the Java heap limit. Java doesn't have that luxury, so we should ideally stream the data.
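One hedged sketch of what "streaming the data" across the JNI boundary could look like: instead of one giant `byte[]`, the backend emits length-prefixed chunks and the Java side consumes them one at a time. All names here are hypothetical; this is not the rslib protocol, just an illustration of the shape.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Length-prefixed chunk framing: only one chunk is on the heap at a time.
class ChunkStream {
    static void writeChunk(DataOutputStream out, byte[] chunk) throws IOException {
        out.writeInt(chunk.length); // 4-byte big-endian length prefix
        out.write(chunk);
    }

    static byte[] readChunk(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0) throw new IOException("corrupt length prefix");
        byte[] chunk = new byte[len];
        in.readFully(chunk);
        return chunk;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeChunk(out, "row-1".getBytes());
        writeChunk(out, "row-2".getBytes());

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(new String(readChunk(in))); // row-1
        System.out.println(new String(readChunk(in))); // row-2
    }
}
```

The same framing works whether the chunk payloads are JSON or (per #6) Protobuf messages.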