Add "export unknown terms" (or "export all terms and statuses") action to Book actions

jzohrab commented 6 months ago

~~Blocked by #316~~ - this is done now

The parent mapping export used to have a thing to export all unknown terms. That could be useful for loading up vocab lists for books.

The code has some TODO issue_336_export_unknown_book_terms markers for things that should be used for this.

- add action
- add unit test (or restore from existing) -- note that books now add status 0 terms while reading, so the export has to handle those.
- add integration test (or restore from existing)

UPDATE: Lute has a CLI job to export book terms -- see the comment below for notes about what's needed to make this a book action callable from the UI.

As part of this work, any code with TODO issue_336_export_unknown_book_terms should be removed, as I don't think it's used anymore.

jzohrab commented 6 months ago

No longer blocked.

jzohrab commented 3 months ago

This is slightly more complicated than the hacky code marked with the TODO, or the language_term_export.py thing.

The current hacky code doesn't include multiword terms. For languages like Classical Chinese, that's important.

I think that what needs to happen is an in-memory "render" of each page, something like read.service.start_reading -- but without saving all of the status 0 terms. The resulting paragraphs will contain all of the text tokens, including net new ones (not saved) and saved status 0 ones, and all the rest, of course.
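A rough sketch of that in-memory pass, to make the idea concrete. Everything here is hypothetical and is not Lute's actual API: `SavedTerm` and `collect_export_terms` are made-up names, and whitespace splitting stands in for Lute's real language-specific parsing (which matters a lot for languages without spaces). The point is only that saved terms are read, multiword terms are matched first, and nothing gets written back.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a saved term; not a Lute model class.
@dataclass(frozen=True)
class SavedTerm:
    text: str    # lowercased; multiword terms are space-separated
    status: int  # 0 = unknown, anything higher = some level of known


def collect_export_terms(pages, saved_terms):
    """Return {term_text: status} for every term found in the pages.

    Works purely in memory: saved_terms is only read, never modified,
    so net new words are reported as status 0 without being saved.
    """
    by_tokens = {tuple(t.text.split()): t for t in saved_terms}
    max_len = max((len(k) for k in by_tokens), default=1)
    found = {}
    for page in pages:
        tokens = page.lower().split()  # assumption: whitespace tokenization
        i = 0
        while i < len(tokens):
            # Prefer the longest saved term starting at this token.
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                key = tuple(tokens[i:i + n])
                if key in by_tokens:
                    found[by_tokens[key].text] = by_tokens[key].status
                    i += n
                    break
            else:
                # Not saved anywhere: a net new word, treated as status 0.
                found.setdefault(tokens[i], 0)
                i += 1
    return found
```

Saved status 0 terms that never appear in the book are simply never matched, so they stay out of the result without any special handling.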

The test cases for this are pretty easy, even if the code isn't:

- new book = all words
- new book with some known words
- new book with some multi-word terms
- new book with some status 0 terms
- extraneous status 0 words not included
- for every test, the number of terms saved in the db should be the same before and after the run, and the book's current text id shouldn't change
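The last check applies to every case, so it could live in one shared helper instead of being repeated per test. A minimal sketch, assuming the test can supply two zero-argument callables (`count_terms` and `current_text_id` are hypothetical names, e.g. thin wrappers around db queries); it checks for "unchanged", which is slightly stricter than "should not increase":

```python
from contextlib import contextmanager


@contextmanager
def assert_no_side_effects(count_terms, current_text_id):
    """Fail if the wrapped code changes the term count or current text id.

    Both arguments are hypothetical zero-argument callables provided by
    the test; they are called once before and once after the body runs.
    """
    before = (count_terms(), current_text_id())
    yield
    after = (count_terms(), current_text_id())
    assert after == before, f"export had side effects: {before} -> {after}"


# Usage with a fake in-memory "db" standing in for the real one.
fake_db = {"terms": ["gato"], "current_text_id": 7}
with assert_no_side_effects(
    lambda: len(fake_db["terms"]),
    lambda: fake_db["current_text_id"],
):
    pass  # the export under test would run here
```

In a pytest suite this would probably be wrapped in a fixture so each test case gets the check for free.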

Since this is long-running, it may need something like WebSockets to report progress back to the client.

jzohrab commented 3 months ago

Some good interim progress. Hacked at the language term export job quite a lot, and added a new `book_term_export <bookid> <filename>` CLI job, e.g.:

flask --app lute.app_factory cli book_term_export 432 sp_terms.csv

This is a bit slower than the old job, b/c it essentially does the calculations for a full page render for each page. It feels like it should be faster, but whatever.

This can't be added to the "actions" dropdown, b/c it doesn't communicate well back to the client. The job just prints to the command line, but when clicked from the web ui the job should really communicate back via a web socket, and then download the file at the end. Since the job is slow-ish, the user should be notified what's happening.
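One possible shape for that, sketched without any web framework: have the export yield progress events as it goes, so the CLI can print them and a web handler could forward the same events over a web socket before offering the file. This is only an illustration under stated assumptions, not how the current job works: `export_with_progress`, `terms_for_page`, the event dicts, and the `term,status` CSV columns are all made up here.

```python
import csv
import io


def export_with_progress(pages, terms_for_page):
    """Yield one progress event per page, then a final event with CSV text.

    terms_for_page is a hypothetical callable: page -> iterable of
    (term, status) pairs. The caller decides how to deliver events
    (web socket, polling, or plain print for the CLI).
    """
    seen = {}
    total = len(pages)
    for n, page in enumerate(pages, start=1):
        for term, status in terms_for_page(page):
            seen.setdefault(term, status)
        yield {"type": "progress", "page": n, "total": total}

    # Assumed output layout; the real job's columns may differ.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["term", "status"])
    writer.writerows(sorted(seen.items()))
    yield {"type": "done", "csv": buf.getvalue()}


# CLI-style consumer: print progress, keep the final payload.
def fake_terms(page):
    return [(word, 0) for word in page.split()]


for event in export_with_progress(["a b", "b c"], fake_terms):
    if event["type"] == "progress":
        print(f"page {event['page']} of {event['total']}")
```

Keeping the transport out of the export itself means the existing command line behaviour and a future UI action could share the same code path.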

LuteOrg / lute-v3