Improve Card browser scrolling performance

oakkitten commented 2 years ago

I think Card browser scrolling performance is slightly suboptimal. The UI freezes a lot when you flick down or up, and when I drag the scrollbar I get about 3 FPS.

There seems to be a task that pre-renders the table when you scroll in a background thread. It only renders a little more than 2 screenfuls of rows. I tried to make that task render all rows, and here are the results:

There's 19619 cards in my collection
It took the renderer task 61 seconds to complete
During that time, GC freed 1501 MB, judging by Logcat messages.

Some of the problems with this are:

ListView
Card browser is doing one SQL request for initial search to get card ids, and then multiple requests for each row. Both card and note objects are created, and they both execute SQL in the constructor. (Maybe don't do SQL in constructor??!?!?)
Some fields, such as search field, are not cached in these objects, and so another SQL query is run each time a row comes into view.
Renderer task calls notifyDataSetChanged() after every card
I think that it's also possible to have several rendering tasks running alongside each other, not sure what issues this might cause.

I think this can be drastically improved by taking these steps:

RecyclerView
Run a single query for each search. Do not construct card and note objects, instead, only request what's needed for display, and only store what's needed for display. Use the Cursor from the one query to fetch the rows.
Render all cards in background. I think for a collection like mine the IO shouldn't take more than, say, 300ms? Of course, it's possible to have super big collection for which this might not work well. Maybe do something like this;
- Have renderer task set up in such a way that it renders cards from a specified position, and up to a specified amount, in the specified direction, e.g. “render up to 10 cards up from position 15”. When user scrolls, this task is made to render from the position of scroll to the direction of scroll, and main thread can wait() for these cards, or maybe pass the Cursor to the main thread for a bit.
- ~~Or perhaps only display up to N cards at the same time, and replace the contents when user reaches bottom.~~
- ~~Or think of something else.~~
~~Profile and optimize post-IO processing, and, if too slow, perhaps offload to another thread.~~
Edit: just use browserRowForId.

Also, currently browser is reset every time it is opened. If one day it keeps its state, it'd be useful to think about how it could be updated with new data. E.g. make a search in browser, go to reviewer, edit a card there, go back to search, card is updated with animation. Normally this is done via calculating a diff; ~~this probably wouldn't be feasible for large search results~~.

viciousAegis commented 2 years ago

In my initial plan I have included switching to RecyclerView, and I believe I would be doing that, so we can cross that out 👍🏼

dae commented 2 years ago

Some of the performance problems can likely be solved with col.newBackend.backend.browserRowForId(). It's intended to be used on demand (eg with a RecyclerView or similar)

oakkitten commented 2 years ago

col.newBackend.backend.browserRowForId()

I am assuming that it ultimately calls this. I am not sure if this will be faster at all. Briefly looking at the code, it seems that it constructs both the card and the note objects from scratch. Not only some of these might not be needed for the query, but these appear to do extra queries; besides, these queries are not optimized the way cursor IO is supposed to be. So—correct me if I'm wrong—the speed increase here will be due to running Rust instead of Java, which will pale before the IO penalty which is mostly the same. (That is, if Rust is faster here—JIT can do miracles!)

dae commented 2 years ago

Creating card/note structures in Rust is reasonably fast, and makes the code simpler and more maintainable than manually deserializing data from rows. I don't think your cursor solution will work: for one, if the user very quickly scrolls down, you'll pay the cost of walking the btree in the same way that large offsets are inadvisable. And it's a moot point anyway, since AnkiDroid is delegating to Rust for the DB access, and AFAIK that implementation fetches the entire query upfront before paginating it back to the Java layer.

oakkitten commented 2 years ago

My general idea was that the actual IO, if using a cursor or a similar thing, can be very fast. For instance, this single SQL query is enough to show questions and answers for the search, which is what AnkiDroid shows by default, and it only takes 90ms in Anki on my ancient laptop—not sure if it should be faster than my phone:

SELECT cards.id, notes.mid, cards.did, cards.ord, notes.flds, notes.flags, notes.tags 
FROM cards, notes 
WHERE cards.nid = notes.id
AND notes.flds like "%a%"

And even if it takes a lot of time to compute stuff to be displayed, it should be less of an issue, since the data is already in memory and fit for random access, and you can easily distribute computation across the CPUs.

P.S. I tried running some silly benchmarks in AnkiDroid.

First of all, running the above query on my phone takes around 430ms, which is 4.8 times slower than on my laptop. (My phone is Xiaomi Mi A2, and the laptop is X220 with i5 and Intel 530 SSD.)
For the same search, rendering questions using the above by calling col._renderQA() takes 25 and 30 seconds using legacy and modern schema, resp.
For the same search, fetching card IDs, then for each ID fetching a card and a note takes around 8 seconds.
For the same search, rendering questions using card and note objects takes 39 and 20 seconds using legacy and modern schema, resp.

I would love to benchmark col.newBackend.backend.browserRowForId() but I'm not sure how to call it.

Silly benchmark code; output:

// legacy_schema=true
#0: found 14682 questions in 8.269301397s
#1: found 14682 questions in 39.042556650s
#2: found 14682 questions in 25.410680396s
#3: found 14682 questions in 427.548116ms

// legacy_schema=false
#0: found 14682 questions in 9.970149379s
#1: found 14682 questions in 19.664074354s
#2: found 14682 questions in 30.226884574s
#3: found 14682 questions in 448.768482ms

P.P.S. In desktop Anki, this takes around 1.5s:

for (id,) in mw.col.db.all("""
    SELECT cards.id FROM cards, notes WHERE cards.nid = notes.id AND notes.flds like "%a%"
"""):
    Card(mw.col, id).note()

Not making any conclusions from this, but this is around 17 times 90 ms, which is consistent with 8s ÷ 430ms, which is 19.

dae commented 2 years ago

There are multiple problems with your benchmarks:

You're not measuring the time it takes for .query() to return. As I mentioned above, the entire query is fetched in advance.
You're optimizing for throughput instead of latency. We don't want to show all matching rows to the user as fast as possible; we only want to show the visible rows to the user as fast as possible. Fetching multiple rows in a single query has higher throughput than fetching one row at a time, but the more rows that are returned, the longer it takes. This is true to a certain extent with the "get sorted ids, then fetch a row at a time" as well, but it reduces the work done in the o(n) portion, as well as the amount of RAM required. Users with collections of 100,000 cards are not uncommon, and some users even take it to crazy extents like 1M+.
You're not doing any sorting, which is not representative of actual use.

When I tweak the code to use browserRowForId() and measure the full time of the query, the backend wins across the board, especially when the question needs to be rendered:

2022-08-12 11:09:56.662 25122-25206/com.ichi2.anki E/WeaselKt: runQuery round 2
2022-08-12 11:09:56.911 25122-25206/com.ichi2.anki E/WeaselKt: runQuery noQuestionDroid: found 20 rows in 239.160346ms
2022-08-12 11:09:57.026 25122-25206/com.ichi2.anki E/WeaselKt: runQuery noQuestionBackend: found 20 rows in 92.920019ms
2022-08-12 11:09:57.804 25122-25206/com.ichi2.anki E/WeaselKt: runQuery noQuestionDroid: found 40000 rows in 752.482667ms
2022-08-12 11:09:58.329 25122-25206/com.ichi2.anki E/WeaselKt: runQuery noQuestionBackend: found 40000 rows in 510.395245ms
2022-08-12 11:09:58.630 25122-25206/com.ichi2.anki E/WeaselKt: runQuery questionDroid: found 20 rows in 292.183416ms
2022-08-12 11:09:58.733 25122-25206/com.ichi2.anki E/WeaselKt: runQuery questionBackend: found 20 rows in 92.161677ms
2022-08-12 11:10:01.691 25122-25206/com.ichi2.anki E/WeaselKt: runQuery questionDroid: found 1000 rows in 2.943828616s
2022-08-12 11:10:03.168 25122-25206/com.ichi2.anki E/WeaselKt: runQuery questionBackend: found 40000 rows in 1.451204762s

Code:

package com.ichi2.anki
import com.google.protobuf.kotlin.toByteStringUtf8
import com.ichi2.libanki.Card
import com.ichi2.libanki.Collection
import com.ichi2.libanki.SortOrder
import com.ichi2.libanki.Utils
import timber.log.Timber
import kotlin.time.ExperimentalTime
import kotlin.time.measureTime

@Suppress("UNUSED_VARIABLE")
fun noQuestionDroid(col: Collection, limit: Int): List<String> {
    val cursor = col.db.database.query(
        """
        SELECT cards.id, notes.mid, cards.did, cards.ord, notes.flds, notes.flags, notes.tags, cards.reps 
        FROM cards, notes 
        WHERE cards.nid = notes.id
        AND notes.flds like "%a%"
        order by reps
        """
    )

    var n = 0
    return sequence {
        while (cursor.moveToNext() && n < limit) {
            val cid = cursor.getLong(0)
            val mid = cursor.getLong(1)
            val did = cursor.getLong(2)
            val ord = cursor.getInt(3)
            val fields = cursor.getString(4)
            val flags = cursor.getInt(5)
            val tags = cursor.getString(6)
            val reps = cursor.getInt(7)
            n += 1

            yield(reps.toString())
        }
    }.toList()
}

@Suppress("UNUSED_VARIABLE")
fun questionDroid(col: Collection, limit: Int): List<String> {
    val cursor = col.db.database.query(
        """
        SELECT cards.id, notes.mid, cards.did, cards.ord, notes.flds, notes.flags, notes.tags 
        FROM cards, notes 
        WHERE cards.nid = notes.id
        AND notes.flds like "%a%"
        order by sfld
        """
    )

    var n = 0
    return sequence {
        while (cursor.moveToNext() && n < limit) {
            val cid = cursor.getLong(0)
            val mid = cursor.getLong(1)
            val did = cursor.getLong(2)
            val ord = cursor.getInt(3)
            val fields = cursor.getString(4)
            val flags = cursor.getInt(5)
            val tags = cursor.getString(6)
            n += 1

            val model = col.models.get(mid)!!
            val splitFields = Utils.splitFields(fields)
            yield(col._renderQA(cid, model, did, ord, tags, splitFields, flags)["q"]!!)
        }
    }.toList()
}

@Suppress("UNUSED_VARIABLE")
fun noQuestionBackend(col: Collection, limit: Int): List<String> {
    col.backend.setActiveBrowserColumns(listOf("cardReps"))
    val ids = col.findCards("a", SortOrder.AfterSqlOrderBy("reps"))
    return ids.asSequence().take(limit).map { col.backend.browserRowForId(it).cellsList[0].text }.toList()
}

@Suppress("UNUSED_VARIABLE")
fun questionBackend(col: Collection, limit: Int): List<String> {
    col.backend.setActiveBrowserColumns(listOf("question"))
    val ids = col.findCards("a", SortOrder.AfterSqlOrderBy("sfld"))
    return ids.asSequence().take(limit).map { col.backend.browserRowForId(it).cellsList[0].text }.toList()
}

@OptIn(ExperimentalTime::class)
fun runQuery(col: Collection, name: String, builder: (col: Collection) -> List<String>) {
    System.gc()
    System.runFinalization()

    var size: Int
    val totalTime = measureTime {
        size = builder(col).size
    }
    Timber.wtf("runQuery $name: found $size rows in $totalTime")
}

fun runQueries(col: Collection) {
    // run a few times to prime DB cache
    for (i in listOf(1, 2, 3)) {
        Timber.e("runQuery round $i")
        runQuery(col, "noQuestionDroid", { noQuestionDroid(it, 20) })
        runQuery(col, "noQuestionBackend", { noQuestionBackend(it, 20) })
        runQuery(col, "noQuestionDroid", { noQuestionDroid(it, 40000) })
        runQuery(col, "noQuestionBackend", { noQuestionBackend(it, 40000) })

        runQuery(col, "questionDroid", { questionDroid(it, 20) })
        runQuery(col, "questionBackend", { questionBackend(it, 20) })
        runQuery(col, "questionDroid", { questionDroid(it, 1000) })
        runQuery(col, "questionBackend", { questionBackend(it, 40000) })
    }
}

oakkitten commented 2 years ago

Ahh this is great! It turns out that browserRowForId is so fast we can probably forgo caching entirely.

I wanted to test for thoughput because I thought that the only way to bring screen render under the fabled 16ms (or is it 8ms now?) would be to prepare all, or a lot of, rows in background. And if you prepare them in the background, it's reasonable to forgo latency to do the enitre job faster.

Turns out that for the same query I was running above, except with ORDER By notes.sfld and using browserRowForId:

Query + fetch all rows with cardReps takes ~1s
Query + fetch all rows with question takes ~1.4s
Query for the ids itself takes 140ms, and fetching 20 (additional) questions, which is about a screenful, takes ✨ 2 to 2.5ms ✨ on my Xiaomi! And on my Moto G 1st gen from 2013, 800ms and 8.5-9ms, resp.

This is so amazingly fast that it should give no frame drops even on old and slow devices.

github-actions[bot] commented 2 years ago

Hello 👋, this issue has been opened for more than 2 months with no activity on it. If the issue is still here, please keep in mind that we need community support and help to fix it! Just comment something like still searching for solutions and if you found one, please open a pull request! You have 7 days until this gets closed automatically

mikehardy commented 10 months ago

Hey there 👋 - this issue came up as something that is pending for the 2.17 release as it should fix #14353 in passing. It looks like there is functional code laying around somewhere for a couple of different approaches, but I don't see an actual PR open for this - is there a PR for this?

Note that with CardBrowser refactoring going on (👀 @david-allison ) it may be conflict-ridding and/or it may be in progress as part of the refactor already (but just not linked?)

So with the context out of the way, to get to the point: is there a current status + work effort in progress here? Just curious

Cheers :-)

david-allison commented 10 months ago

To note: I've got a patch here, to be applied after the refactorings are completed, but there's quite a lot of work:

This moves from CardCache to CardOrNoteId as the row primitive

That means that all operations in the card browser need to be able to transform the note Id to the card Id, or natively handle a NoteId

It also changes the "column" behaviour, as the set of Anki desktop columns don't match the current set of columns we provide

mikehardy commented 10 months ago

Okay, my priority in area of "things I control" then is to be a rapid reviewer on the CardBrowser refactors. Off I go to do that

ankidroid / Anki-Android

Improve Card browser scrolling performance #11889