LuteOrg / lute-v3

LUTE = Learning Using Texts: learn languages through reading. Python/Flask.
MIT License

Add "export unknown terms" (or "export all terms and statuses") action to Book actions #336

Open jzohrab opened 6 months ago

jzohrab commented 6 months ago

Blocked by #316 - this is done now

The parent mapping export used to have an option to export all unknown terms. That could be useful for loading up vocab lists for books.

The code has some TODO issue_336_export_unknown_book_terms markers for things that should be used for this.

UPDATE: Lute has a CLI job to export book terms -- see the comment below for notes about what's needed to make this a book action callable from the UI.

As part of this work, any code with TODO issue_336_export_unknown_book_terms should be removed, as I don't think it's used anymore.

jzohrab commented 6 months ago

No longer blocked.

jzohrab commented 3 months ago

This is slightly more complicated than the hacky code marked with the TODO, or the language_term_export.py thing.

The current hacky code doesn't include multiword terms. For languages like Classical Chinese, that's important.

I think that what needs to happen is an in-memory "render" of each page, something like read.service.start_reading -- but without saving all of the status 0 terms. The resulting paragraphs will contain all of the text tokens, including net new ones (not saved) and saved status 0 ones, and all the rest, of course.
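A rough sketch of that idea, not Lute's actual code: `SavedTerm` and `collect_unknown_terms` are hypothetical names, pages are assumed to be already split into tokens, and "longest saved term wins" is an assumption about how multiword terms should be matched. The point is only that net new tokens are counted in memory and never written to the database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedTerm:
    """Hypothetical stand-in for a saved term; not Lute's actual model."""
    tokens: tuple  # one entry per token, so multiword terms have len > 1
    status: int    # 0 = unknown, anything else = known to some degree


def collect_unknown_terms(pages, saved_terms):
    """Count unknown terms across all pages without persisting anything.

    pages: list of pages, each a list of tokens (assumed pre-tokenized).
    saved_terms: iterable of SavedTerm.
    Returns a Counter keyed by token tuples.
    """
    by_tokens = {t.tokens: t for t in saved_terms}
    max_len = max((len(k) for k in by_tokens), default=1)
    unknown = Counter()

    for tokens in pages:
        i = 0
        while i < len(tokens):
            # Prefer the longest saved term starting here, so a multiword
            # term wins over the single words it is made of.
            match = None
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                candidate = tuple(tokens[i:i + n])
                if candidate in by_tokens:
                    match = by_tokens[candidate]
                    break
            if match is None:
                # Net new token: counted in memory only, never saved.
                unknown[(tokens[i],)] += 1
                i += 1
            else:
                if match.status == 0:
                    unknown[match.tokens] += 1
                i += len(match.tokens)
    return unknown
```

With a saved status 0 term `("el", "gato")` and a known term `("el",)`, the page `["el", "gato", "come", "el", "pescado"]` would yield the multiword term plus the two net new words, and skip the known `el`.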

The test cases for this are pretty easy, even if the code isn't:

Since this is long-running, it may need some kind of WebSocket connection to report progress back to the client.

jzohrab commented 3 months ago

Some good interim progress. Hacked at the language term export job quite a lot, and added a new book_term_export <bookid> <filename> CLI job, e.g.:

flask --app lute.app_factory cli book_term_export 432 sp_terms.csv

This is a bit slower than the old job, b/c it essentially does the calculations for a full page render for each page. It feels like it should be faster, but whatever.

This can't be added to the "actions" dropdown, b/c it doesn't communicate well back to the client. The job just prints to the command line, but when clicked from the web UI the job should really communicate back via a WebSocket, and then download the file at the end. Since the job is slow-ish, the user should be notified of what's happening.
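One possible way to decouple the job from how it reports, sketched with the standard library only. `export_terms`, the `report_progress` callback, and the `term,count` columns are all hypothetical, not the existing CLI job's interface. The CLI could pass a callback that prints, while a web handler could pass one that pushes a message over whatever socket library ends up being used.

```python
import csv
import io


def export_terms(pages, write_row, report_progress):
    """Process pages one at a time, reporting progress after each page.

    pages: list of lists of term strings (stand-in for rendered pages).
    write_row: callable taking one CSV row as a list.
    report_progress: callable taking (pages_done, total_pages). In a web
        UI this is where a message would be sent to the client.
    """
    total = len(pages)
    counts = {}
    for done, page in enumerate(pages, start=1):
        for term in page:
            counts[term] = counts.get(term, 0) + 1
        report_progress(done, total)

    # Hypothetical column layout, for illustration only.
    write_row(["term", "count"])
    for term, count in sorted(counts.items()):
        write_row([term, count])


# Usage: collect progress messages in a list instead of a socket.
buffer = io.StringIO()
messages = []
export_terms(
    [["gato", "perro"], ["gato"]],
    csv.writer(buffer).writerow,
    lambda done, total: messages.append(f"{done}/{total}"),
)
```

Writing to an in-memory buffer here also hints at how the finished file could be handed back as a download once the last progress message has gone out.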