Open Thermospore opened 3 years ago
also, it seems to be using only 1 cpu core.
The validation process is probably not well optimized for lots of structured content. The freeze is due to the validation step, which occurs entirely before import progress begins, and this freeze is also noticeable on large plaintext dictionaries, maybe 5-30 seconds depending on size. It is effectively a JSON schema validation step on a giant input.
With regards to import speed, it would have to be a change on the Yomichan side, unless you can optimize the structure of your content (omitting divs/spans that aren't necessary).
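As a rough illustration of the "omit unnecessary spans" idea on the dictionary side, here is a minimal sketch. It assumes structured content nodes are strings, arrays, or objects with `tag` and `content` keys. `pruneSpans` is a hypothetical helper, not part of Yomichan. It only unwraps `span` nodes that carry no other attributes, since unwrapping a `div` would change block layout.

```javascript
// Hypothetical helper (not a Yomichan API): unwraps <span> nodes that
// have nothing but "tag" and "content", so fewer nodes need validating.
function pruneSpans(node) {
    if (typeof node === 'string') { return node; }
    if (Array.isArray(node)) {
        // Flatten so the children of an unwrapped span merge into the parent array.
        return node.flatMap((child) => {
            const pruned = pruneSpans(child);
            return Array.isArray(pruned) ? pruned : [pruned];
        });
    }
    if (node === null || typeof node !== 'object') { return node; }

    const result = {...node};
    if (result.content !== undefined) {
        result.content = pruneSpans(result.content);
    }
    const isBare = Object.keys(result).every((key) => key === 'tag' || key === 'content');
    if (result.tag === 'span' && isBare && result.content !== undefined) {
        return result.content; // The wrapper adds nothing; keep only its content.
    }
    return result;
}

const sample = [
    {tag: 'div', content: [
        {tag: 'span', content: 'a'},
        {tag: 'span', style: {fontWeight: 'bold'}, content: 'b'}
    ]}
];
const prunedSample = pruneSpans(sample);
```

Styled spans are left alone, so the rendered result should stay the same while the node count drops.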
also, it seems to be using only 1 cpu core.
This is expected.
Additional comment: importing can take significantly longer on mobile browsers, and I could easily see a single dictionary taking ~40 minutes to import if the data is massive. Part of this is due to the speed of the database operations (slow), and part of this is the validation step (CPU will in general be slower than desktop).
The validation step is generalized to use a generic JSON schema, and it is also therefore slower than a highly optimized rewrite. The tradeoff here is that while it may be slower, updating the JSON schema does not require any updates to the codebase.
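To make the tradeoff concrete, here is a minimal sketch of a schema-driven validator. It is not Yomichan's actual validator and only handles a tiny subset of JSON schema (`type`, `items`, `properties`, `required`). The `entrySchema` shape is a made-up example, not the real dictionary format. The point is that the rules live in data, so changing them means editing the schema rather than the code, at the cost of walking the schema for every value.

```javascript
// Illustrative only: a tiny, generic schema interpreter.
function validate(schema, value, path = '$') {
    const errors = [];
    const type = Array.isArray(value) ? 'array' : (value === null ? 'null' : typeof value);
    if (schema.type !== undefined && schema.type !== type) {
        errors.push(`${path}: expected ${schema.type}, got ${type}`);
        return errors;
    }
    if (type === 'array' && schema.items) {
        // Every element is checked against the same sub-schema.
        value.forEach((item, i) => {
            errors.push(...validate(schema.items, item, `${path}[${i}]`));
        });
    }
    if (type === 'object') {
        for (const key of schema.required || []) {
            if (!(key in value)) { errors.push(`${path}: missing "${key}"`); }
        }
        for (const [key, subSchema] of Object.entries(schema.properties || {})) {
            if (key in value) {
                errors.push(...validate(subSchema, value[key], `${path}.${key}`));
            }
        }
    }
    return errors;
}

// Hypothetical schema, for illustration only.
const entrySchema = {
    type: 'object',
    required: ['term', 'reading'],
    properties: {term: {type: 'string'}, reading: {type: 'string'}}
};
const validErrors = validate(entrySchema, {term: 'a', reading: 'b'});
const invalidErrors = validate(entrySchema, {term: 1});
```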
One other point of consideration: in addition to JSON validation, structured content must be parsed for images during import.
You can also test the validation process outside of the browser using node and one of the dev scripts in this repository:
node dev/dictionary-validate.js path/to/dictionary.zip
For reference, validating the dictionary in #1854 took about 7 minutes.
Thanks for taking a look and thanks for the info!
During validation, is it possible to keep the tab responsive and/or give some indication that progress is being made? Otherwise users might think the import has failed/crashed
In regards to speed, for reference: for complete validation + import, it takes about 3.5min on my nice pc (Ryzen 9 5900X) and about 5min on my laptop (i5-9400H). I suppose I'm fine with that order of magnitude, especially since importing the dict is generally something you do just once
I can probably shave some of that off by doing some clean up, as you mention. I could also try rendering the divs to plain text `\n`s, which would remove a large amount of structured content. Could also give splitting up the term bank a shot
Hopefully that 40 minute import was just a fluke, but I guess it's a wait and see
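For the idea of splitting up the term bank, a minimal sketch could look like the following. `splitTermBank` is a hypothetical helper, and the `term_bank_N.json` naming is assumed to match the numbered bank files in the dictionary archive. Whether smaller banks actually shorten the freeze is untested here.

```javascript
// Hypothetical helper: splits one large list of entries into numbered banks.
function splitTermBank(entries, bankSize) {
    if (!Number.isInteger(bankSize) || bankSize <= 0) {
        throw new RangeError('bankSize must be a positive integer');
    }
    const banks = [];
    for (let start = 0; start < entries.length; start += bankSize) {
        banks.push({
            // Bank files are numbered from 1 (assumed naming convention).
            fileName: `term_bank_${banks.length + 1}.json`,
            entries: entries.slice(start, start + bankSize)
        });
    }
    return banks;
}

const banks = splitTermBank([1, 2, 3, 4, 5], 2);
```

Each bank's `entries` array would then be written out with `JSON.stringify` under its `fileName`.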
During validation, is it possible to keep the tab responsive and/or give some indication that progress is being made? Otherwise users might think the import has failed/crashed
I will probably multithread some parts of the import process, as that should be the easiest way to provide non-blocking progress updates without having to `async`'ify the entire validation process (which would make it even slower).
I tested on the dictionary you provided in the other issue and it only took around 5 minutes.
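As a sketch of the non-blocking progress idea mentioned above (not the actual implementation), validation can be written as a generator that checks entries in chunks and yields a progress record after each one. Run inside a Web Worker, each record could be forwarded to the page with `self.postMessage`, so the tab stays responsive. `validateInChunks` and the `isValid` callback are hypothetical names.

```javascript
// Hypothetical sketch: validate in chunks and report progress after each chunk.
function* validateInChunks(entries, isValid, chunkSize) {
    for (let start = 0; start < entries.length; start += chunkSize) {
        const end = Math.min(start + chunkSize, entries.length);
        for (let i = start; i < end; ++i) {
            if (!isValid(entries[i])) {
                throw new Error(`Invalid entry at index ${i}`);
            }
        }
        yield {done: end, total: entries.length};
    }
}

// Inside a Web Worker, the loop below would call self.postMessage(progress)
// instead of collecting the records into an array.
const progressLog = [];
for (const progress of validateInChunks([1, 2, 3, 4, 5], (x) => x > 0, 2)) {
    progressLog.push(progress);
}
```

The validation itself stays synchronous, which avoids the overhead of making every step asynchronous.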
Just tried it out; looks quite nice. Thanks!
I'll see if that changed anything for the person it took 40 mins for
Hello, I am working on converting a dictionary to yomichan which makes heavy use of structured content. The source data for the dictionary is in HTML and uses lots of `<div>` and `<sub>` tags in particular, which I have more or less maintained.
When you try to import the dictionary, the tab freezes for a few minutes at 0% (presumably while checking the validity of the structured content?) before the import starts. The browser often gives a message saying the tab is unresponsive/crashed and asks if you want to close it, but if you keep waiting the dictionary will eventually continue importing as normal.
I'm using chrome, but someone using firefox reported the dictionary was stuck importing at 0% for 40 minutes before continuing lol
As a test, I tried forcing the dictionary to plain text. That version doesn't freeze at 0% at all, leading me to believe the issue is due to the structured content
Here is the dictionary
and the plaintext test version
Is there anything that can be done to improve the import speed/experience? Whether that be on my end or on yomichan's end
Thanks for taking a look!!