Word numbering (splitting, indexing) should be stored in file

hftf commented 9 months ago

YAPP/MODAQ should output word numbers as data in the JSON/QBJ files themselves so that downstream apps can refer to the exact same word indexes without having to reimplement or port the same word-splitting algorithm. If word indexes are not stored in the packet/stats files themselves, then any change to YAPP/MODAQ's word-splitting algorithm (or to upstream apps generally) would require downstream apps to maintain a database of every version of every word-splitting algorithm over time in order to represent buzzpoints accurately.
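To illustrate the versioning problem, here is a minimal sketch. Everything in it is an assumption for the sake of the example: `split_v1` and `split_v2` are hypothetical splitters (not YAPP's or MODAQ's actual algorithms), and the `tossups`/`words` field names are made up rather than taken from the real packet format. It shows the same buzz landing on different indexes under two splitter versions, while an index read against the stored word list stays stable.

```python
import json


def split_v1(text):
    # Hypothetical splitter: every whitespace-separated token is one word.
    return text.split()


def split_v2(text):
    # Hypothetical later version: hyphens also separate words.
    return text.replace("-", " ").split()


question = "This well-known author wrote Hamlet"
buzz_word = "Hamlet"

# The same buzz gets a different index depending on the splitter version.
index_v1 = split_v1(question).index(buzz_word)
index_v2 = split_v2(question).index(buzz_word)

# If the packet file stores the split words (hypothetical field names),
# a downstream app reads the list instead of re-running any splitter.
packet = {"tossups": [{"words": split_v1(question)}]}
stored = json.loads(json.dumps(packet))
stored_index = stored["tossups"][0]["words"].index(buzz_word)

print(index_v1, index_v2, stored_index)
```

With the word list in the file, a downstream app only needs the file itself to interpret a buzzpoint, regardless of which splitter version produced it.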

Ideally, it should primarily be the responsibility of YAPP, not MODAQ, to split words, parse pronunciation guides, power marks, etc. and produce a packet data file that is the single source of truth. Putting this responsibility on YAPP and its JSON output would also eliminate the redundant word-splitting, power-marking, and pronunciation-guide parsing that happens on the fly on every moderator's computer whenever MODAQ loads a packet. It would also reduce file size compared to storing the data in the QBJ files, where the same parsed data would be repeated for every game played on the same packet. Finally, it would allow editors to modify the JSON file to fix bad or unintended parses if desired.

alopezlago commented 9 months ago

I largely agree that the packets should do most of this work, though I don't think you really reduce file size without compression (a space is exchanged for at least 3 characters (quote, comma, quote) inside each question).
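As a rough check of that arithmetic, here is a small sketch using an invented example sentence and compact JSON separators. Each space between words becomes `","` (one character exchanged for three), and the array brackets add two more, so an uncompressed array of n words comes out 2n characters longer than the single string.

```python
import json

# Invented example text; any ASCII sentence behaves the same way.
text = "For ten points name this author of Hamlet"
words = text.split()

# One JSON string versus a compact JSON array of per-word strings.
as_string = json.dumps(text)
as_array = json.dumps(words, separators=(",", ":"))

# Each of the n - 1 spaces grows by 2 characters, plus 2 for the brackets.
extra = len(as_array) - len(as_string)
print(len(words), extra)
```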

YAPP didn't do this because it was originally a command-line program, and later an API that MODAQ called with no arguments, which meant that the information needed to parse a packet properly (power markers, pronunciation guide markers) wasn't available. Now that it has a (basic) website, these arguments could be passed in, and such a packet could be produced.

That said, we should ideally move away from YAPP and have packets generated directly by the packetizing software. It doesn't make sense to create packets in a system that has all of this context (what's in power, what the pronunciation guides are, what the answers/prompts/anti-prompts are), lose all of it when the packet is written to a docx file, and then try to recover that context when parsing it.

There's also the question of what to do to make MODAQ accept such a new format, and whether both formats should be supported or if MODAQ should just make a breaking change and only accept the new format.

The final issue is that this will take a lot of work, as will other improvements that will make MODAQ much better (e.g. splitting answers into accept/prompt/do not prompt), but these days I have a lot less time and energy to devote to this.

hftf commented 9 months ago

> though I don't think you really reduce file size without compression (a space is exchanged for at least 3 characters (quote, comma, quote) inside each question).

Ah yeah, I just meant compared to a solution in which parsed packet data is duplicated in each QBJ stat file instead of once in each JSON packet file.

alopezlago commented 9 months ago

The other challenge is coming up with the file format. You can split all the words, but how do we store formatting? Where do pronunciation guides go? There are a few different approaches, but I need to sit down and think about them.

alopezlago / MODAQ

Word numbering (splitting, indexing) should be stored in file #281