Open M-Biggles opened 7 months ago
This might be tough to manage, processing-wise.
Do you feel it adds value to you as a learner, or would it become "yet another distraction", in a way? (E.g., when I was a kid reading a difficult book, I didn't know that it had 9700 unique words, I just knew it was a big book :-P ).
(Recognizing that adult learner needs are different than a kid's reading needs, but I still like that mindset when considering or designing things) Cheers!
Definitely adds value. I'd say the unique word count is more useful than the total for working out the difficulty level of a text. Easier texts tend to have fewer unique words, and difficult ones tend to have more (which is why we restrict vocabulary when creating graded materials). It really helps many learners choose which text to work on next when they can't decide and are trying to set up learning goals (read some A1-A2 materials, later read the B1-B2 stuff, etc.).
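To make the total vs. unique distinction concrete, here is a minimal sketch. It assumes a naive regex tokenizer and case-insensitive matching; the `word_counts` name is made up for illustration, and real parsing would be language-specific (no inflection handling, no support for languages without spaces).

```python
import re

def word_counts(text: str) -> tuple:
    """Return (total_words, unique_words) for a text.

    Assumption: a "word" is a run of letters/digits, optionally joined by
    apostrophes (so "don't" is one word). Case is ignored.
    """
    tokens = [t.casefold() for t in re.findall(r"[^\W_]+(?:'[^\W_]+)*", text)]
    return len(tokens), len(set(tokens))

total, unique = word_counts("The cat sat on the mat. The cat slept.")
print(total, unique)  # 9 total words, 6 unique words
```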
Processing-wise, allow it to be turned on or off in the options. Merely hiding the number would not switch off the processing, so an on/off option is probably the better path.
"Allow Processing of Unique Words" / "Display number of Unique Words"
An additional note: with the number of unique words combined with a word frequency list, it would be possible to work on a tool to automatically tag texts for level, such as*:
A1 = 0 - 600 words
A2 = 601 - 1,200 words
B1 = 1,201 - 2,500 words
B2 = 2,501 - 5,000 words
C1 = 5,001 - 10,000 words
C2 = 10,001 - 20,000 words
* Numbers from here, but we could look into it more; they would vary per language.
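A rough sketch of what that lookup could look like, using only the unique word count and the thresholds quoted above. The `suggest_level` name is hypothetical, the bounds are just the numbers from this comment (not validated per language), and it ignores the word frequency list entirely.

```python
from bisect import bisect_left
from typing import Optional

# Upper bounds (inclusive) taken from the ranges in this comment.
# Assumption: these are placeholders; real values would differ per language.
LEVEL_UPPER_BOUNDS = [600, 1200, 2500, 5000, 10000, 20000]
LEVEL_NAMES = ["A1", "A2", "B1", "B2", "C1", "C2"]

def suggest_level(unique_words: int) -> Optional[str]:
    """Map a unique word count to a level, or None if it is above the table."""
    if unique_words < 0:
        raise ValueError("unique_words must be non-negative")
    index = bisect_left(LEVEL_UPPER_BOUNDS, unique_words)
    return LEVEL_NAMES[index] if index < len(LEVEL_NAMES) else None

print(suggest_level(9700))
```

Returning `None` above 20,000 leaves the decision to the user rather than guessing a level.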
That's another feature for another day, but it would be made more possible by a calculation of unique words.
I'd leave the tagging to the users; this could get error-prone with different languages. But someone else could implement it if they'd like :-)
Yeah, I wouldn't implement without having some decent data on the levels. HSK numbers and such.
Maybe it could be a per-language thing, only being operative for preloaded languages where we have good counts for the levels and allowing users to set their own.
Holding off until #250 is done.
Now that #250 is done I was looking into this.
I have a "sampled text unique words" count, but it's only for the same sampled text used to calculate the book stats, i.e., it's only for the next 5 pages. I can add that quickly.
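For anyone following along, the sampled count amounts to something like the sketch below. This is not the actual implementation: `sampled_unique_count`, the list-of-page-strings input, and the regex tokenizer are all assumptions made for illustration, and "next 5 pages" is assumed to start at the current page.

```python
import re

def sampled_unique_count(pages, current_page=0, sample_size=5):
    """Count unique words across a window of pages.

    Assumptions: `pages` is a list of page texts, the window starts at
    `current_page`, and words are case-insensitive runs of word characters.
    """
    sample = pages[current_page:current_page + sample_size]
    seen = set()
    for page in sample:
        seen.update(token.casefold() for token in re.findall(r"\w+", page))
    return len(seen)

pages = ["a b", "b c", "c d", "d e", "e f", "zzz"]
print(sampled_unique_count(pages))     # window covers pages 0-4
print(sampled_unique_count(pages, 3))  # window covers pages 3-5
```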
Adding a full unique word count for the whole book will be tougher as it requires a full book parse/fake render.
All sounds good to me.
A thought: would it be possible to store the unique word count of the whole-book parsing as a value somewhere? Do a full-parse at book creation and don't reparse except in case of edits, since that's the only way the value would change. With a little toggle to enable or disable full-parse at book loading.
Yes, it's def possible, it's just a more involved request than adding the sampled uniques count. :-)
The full parse is needed at book load anyway to do pagination, so it's more a question about where to calc and store the value, and when to update it.
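One possible shape for the "store it, only recalc on edit" idea is sketched below. Everything here is hypothetical: the `Book` class, its fields, and the content hash used to detect edits are illustrative assumptions, not the project's actual model or storage.

```python
import hashlib
import re

class Book:
    """Hypothetical book model that caches its unique word count."""

    def __init__(self, text):
        self.text = text
        self._unique_count = None
        self._digest = None
        self.full_parses = 0  # demo counter: how many times we reparsed

    def unique_word_count(self):
        # Assumption: a content hash is a cheap way to detect that the text
        # was edited; a real app might set a dirty flag on save instead.
        digest = hashlib.sha256(self.text.encode("utf-8")).hexdigest()
        if digest != self._digest:
            self.full_parses += 1
            tokens = (t.casefold() for t in re.findall(r"\w+", self.text))
            self._unique_count = len(set(tokens))
            self._digest = digest
        return self._unique_count

book = Book("a b a")
book.unique_word_count()   # parses once
book.unique_word_count()   # served from the stored value
book.text = "a b c"        # an edit invalidates the stored value
book.unique_word_count()   # parses again
print(book.full_parses)
```

In a real app the stored count and digest would presumably live next to the other book stats, so a reload doesn't trigger a reparse either.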
Yeah, it's a further step beyond the 5-page sample, which is a great feature itself, but I figured it would be good to do at some point. Grabbing it from the initial parse sounds right.
As the title says, it would be good to have the number of unique words alongside the total words on the home page under Word Count.