arianneorpilla / jidoujisho

A full-featured immersion language learning suite for mobile.
GNU General Public License v3.0

Improve Dictionary Grouping for English #390

Open ghost opened 3 months ago

ghost commented 3 months ago

First of all, thank you for your hard work and for adding English support. Jidoujisho has been a great help to me.

The problem I'm facing is that some of my dictionary definitions are being detected as separate results, which pretty much messes up my workflow since the frequencies only appear with certain dictionaries. I guess this is due to some dictionaries having the word's spelling also as the reading, whereas some have the reading field empty.

[image]

Yomitan has a feature for choosing the result grouping mode (group term-reading pairs, group related terms, or no grouping). I reckon it would be useful to add something like that to Jidoujisho, or at least some way to customize or improve how the dictionary entries are displayed.

Edit: this seems to be related to #369
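
To illustrate the guess above, here is a minimal Python sketch of one way results could be merged when one dictionary repeats the spelling as the reading and another leaves the reading empty. This is not Jidoujisho's actual code; `grouping_key`, `group_entries`, and the `(term, reading, dictionary)` tuples are hypothetical names chosen for the example.

```python
from collections import defaultdict


def grouping_key(term, reading):
    """Build a key that treats an empty reading and a reading equal to the
    term as the same thing (hypothetical normalization rule)."""
    normalized = reading if reading and reading != term else ""
    return (term, normalized)


def group_entries(entries):
    """Group (term, reading, dictionary) tuples by the normalized key."""
    groups = defaultdict(list)
    for term, reading, dictionary in entries:
        groups[grouping_key(term, reading)].append(dictionary)
    return dict(groups)
```

With this rule, an entry for "run" with reading "run" and another with an empty reading would end up in a single group, so a frequency dictionary would show up next to all definitions instead of only some of them.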

avc1657 commented 3 months ago

See if things work better at version 2.8.9. Version 2.9.0 is buggy.

ghost commented 3 months ago

They do, but 2.8.9 doesn't support Yomitan structured content, which is what most of my dictionary collection uses. I guess I'll just have to wait.

avc1657 commented 3 months ago

You say most of your English dictionaries use structured content? All my English dicts are in plain text. As for Japanese, I also have a good selection in plain text, including 3 dicts I converted from structured to plain text.

ghost commented 3 months ago

Yeah, I'm talking about that. How exactly did you convert the dictionaries from structured content to plain text? Do you mind sharing the script or whatever you used?

avc1657 commented 3 months ago

To convert them you need a script, and the script can vary depending on the dictionary. I just asked ChatGPT to write Python scripts for me.

I basically prompted something like:

Write me a Python script that runs in ./ and modifies all .json files. The script is for converting a dictionary from structured content to plain text only.

For example [paste a block of the JSON showing its structured content]:

But I want it to look like this [convert the block yourself to plain text so ChatGPT knows what you're talking about, then paste it here]:

All blocks of the JSON files should look pretty much the same. I want the script to be generic, which means I want all blocks converted to plain text, bla bla bla...

That's pretty much it, you just need to tell ChatGPT in detail what you need. Or you can just write your own code if you want.
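
For reference, a script of the kind described could look roughly like the sketch below. It is not the script used in this thread. It assumes the Yomitan term bank layout where each entry is a list whose sixth item (index 5) holds the definitions, and where a structured definition is an object with `"type": "structured-content"` and nested `tag`/`content` nodes. The function names and the `BLOCK_TAGS` set are hypothetical, and a given dictionary may need different rules.

```python
import json
from pathlib import Path

# Tags after which a line break is added (a hypothetical choice, adjust per dictionary).
BLOCK_TAGS = {"div", "li", "p", "tr", "details", "summary"}


def flatten(node):
    """Recursively collect the text inside a structured-content node."""
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        return "".join(flatten(child) for child in node)
    if isinstance(node, dict):
        tag = node.get("tag")
        if tag == "br":
            return "\n"
        if tag in ("img", "rt", "rp"):
            return ""  # drop images and ruby annotations
        text = flatten(node.get("content", ""))
        return text + "\n" if tag in BLOCK_TAGS else text
    return ""


def convert_definition(definition):
    """Turn one definition into plain text; return None to drop it."""
    if isinstance(definition, dict):
        kind = definition.get("type")
        if kind == "structured-content":
            return flatten(definition.get("content", "")).strip()
        if kind == "text":
            return definition.get("text", "")
        return None  # e.g. image-only definitions
    return definition  # already plain text (or another form left untouched)


def convert_entry(entry):
    """Rewrite the definitions of one term entry in place."""
    converted = [convert_definition(d) for d in entry[5]]
    entry[5] = [d for d in converted if d]
    return entry


def convert_folder(folder="."):
    """Convert every term_bank_*.json file in the folder, overwriting it."""
    for path in Path(folder).glob("term_bank_*.json"):
        entries = json.loads(path.read_text(encoding="utf-8"))
        entries = [convert_entry(e) for e in entries]
        path.write_text(json.dumps(entries, ensure_ascii=False), encoding="utf-8")
```

Calling `convert_folder(".")` inside an unzipped copy of the dictionary rewrites the term banks; the folder can then be zipped again for import. Since the files are overwritten, it is safer to work on a copy.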

I'll share my list of dicts here, just so you can see that you can get very decent coverage with plain text stuff alone.

[image: SmartSelect_20240712-084024]

ghost commented 3 months ago

You were right. ChatGPT made a script that also deletes the unnecessary entries, and I just converted a couple of dictionaries. Thank you! The import speed is also better in 2.8.9.

avc1657 commented 3 months ago

Which dictionaries did you manage to convert to plain text?

nacho00112 commented 3 months ago

Could you please send the script or the complete prompt? I wasn't able to get it working.

avc1657 commented 3 months ago

What dictionaries are you trying to convert?

nacho00112 commented 3 months ago

https://github.com/themoeway/kaikki-to-yomitan/blob/master/downloads.md The en-en and ja-en dictionaries from here. The ja-en one loads, but not completely. With the en-en one the app crashes, probably because the dictionary is too big for the RAM.

nacho00112 commented 3 months ago

Using 2.9.0 preview 2 solved the problem. I hope I don't run into the bugs you talked about.

avc1657 commented 3 months ago

2.9.0 preview 2 solves some problems and comes bundled with a few new ones, haha. As the name says, 2.9.0 is still in a pre-release state, so it is expected to contain bugs.