Predelnik / DSpellCheck

Notepad++ Spell-checking Plug-in
GNU General Public License v2.0

Optimize the list of misspelled words #114

Closed azjio closed 6 years ago

azjio commented 6 years ago

When misspelled words are exported, the list is 5-10 times longer than the set of relevant entries. Suggestions:

  1. Limit the word length, that is, ignore 2- or 3-letter words.
  2. Remove duplicates.
  3. Sort the list.

That way, if only 100 words out of 1000 remain, it is really much easier to run through the errors.

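The three requests above can be sketched as a small post-processing step. This is only an illustration of the idea, not the plugin's code; the function name `clean_misspellings` and the parameter `min_length` are made up for this example.

```python
def clean_misspellings(words, min_length=3):
    """Drop very short words, remove duplicates, and sort the rest.

    `min_length` is a hypothetical parameter: words shorter than it
    are dropped from the exported list.
    """
    kept = {w for w in words if len(w) >= min_length}  # a set removes duplicates
    return sorted(kept)  # plain code-point order, case-sensitive


exported = ["teh", "ab", "teh", "recieve", "и"]
print(clean_misspellings(exported))
```

A list of 1000 exported entries with many repeats shrinks to its unique entries this way, which is the effect asked for in the request.
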
Predelnik commented 6 years ago

Hmm, and I thought this functionality was hidden 😅 I can definitely do 2 and 3; they make sense. As for 1, I think it would be strange without some option to tune it, but I don't know where to put that option currently. However, since it's hidden functionality anyway, I can make it an option that you write directly into the .ini file, if that's fine with you.

azjio commented 6 years ago

I have 1000 pages of content. I fill and correct the texts. From time to time I perform a spelling check by exporting Russian words to a file. I open the file and get a list of misspelled words. I clean it with scripts, as I said. Then I use the search and replace functions in the files to fix it. Checking spelling directly while typing is distracting. So I do this once a month.

DmFedorov commented 6 years ago

As I understand it, you have accepted sorting and removing duplicates in the list of misspelled words, and have amended the code. By selecting the entries I want from the list (using bookmarks), I can create a list to add to UserDic.dic.

Two disadvantages remain:

  1. I don't have a keyboard shortcut to copy the error list to the clipboard.
  2. I cannot apply a modified UserDic.dic file instantly (on the fly).

Please consider adding some undocumented item or items to the dspellcheck.ini file that would let me get rid of inconveniences 1 and 2.

Right now, to work around the first disadvantage I have to open the settings dialog, right-click, click again to select the context menu item, and click once more to close the window: 4 clicks in total instead of one. To work around the second, I have to close Np++ and open it again.

If you agree with this too, it would be good to have a beta version for testing the modified code.

Predelnik commented 6 years ago

@DmFedorov The first issue is easily solvable just by moving this function out of obscurity into the main plugin menu. The second issue could probably be solved with a "secret" action to reload user dictionaries without saving them, or an action to add all words from the clipboard to the dictionary. I hope to make a beta version in the near future.

DmFedorov commented 6 years ago

===== The first issue is easily solvable just by moving this function out of obscurity to main plugin menu.

Awesome! I think that in this case I can assign a keyboard shortcut to this action.

===== The second issue could be solved probably with "secret" action to reload user dictionaries without saving them

I don't fully understand this. I think my steps would be as follows: get the list of misspelled words from the clipboard; choose what I need from it and put it into UserDic.dic; save UserDic.dic; trigger the "secret" action, after which the changes I made in UserDic.dic take effect. (I assume you will explain later what the "secret" action means.)

(The action that adds all words from the clipboard to the dictionary is not suitable: I must first choose what I need from the list and put it into UserDic.dic. In that case I will most likely write a comment for the list, or comment out the previous list entirely. As a result, the "secret" action will be much more productive.)

azjio commented 6 years ago

More details here, if you are interested: http://forum.ru-board.com/topic.cgi?forum=5&topic=48204&start=1180#lt

Predelnik commented 6 years ago

Here you can try the alpha version of 1.4.0 with these features implemented: https://github.com/Predelnik/DSpellCheck/releases/tag/v1.4.0-alpha1

Writing Word_Minimum_Length=n in %appdata%\config\DSpellCheck.ini should ignore words of length n or less. Copying All Misspellings / Reload Hunspell Dictionaries should both be available through the menu and the shortcut mapper.

But also a disclaimer: since it's an alpha version, and especially because it went through several major refactorings, even basic functions may have been broken without my noticing. Please report any problems you stumble upon.

DmFedorov commented 6 years ago

I checked the version 1.4.0 alpha

"Multiple Languages" does not work. If you do not use "Multiple Languages", then UserDic.dic does not work.

Of course, in this case it is difficult to determine what does not work, and yet:

===== Sort and delete duplicates.

Sorting is incorrect (it is case-sensitive) and effectively happens twice: first comes a list of words in lowercase, then (for example) a list of the same words with a capital letter. Correct sorting is done, for example, by the very old TextFx plugin, which correctly sorts not only the English alphabet but also the Russian one. Npp itself sorts incorrectly (case-sensitively).

===== Word_Minimum_Length = n

Works only after restarting Npp. This is basically fine.

===== Keyboard shortcuts for the commands "Copy All Misspelling to Clipboard" and "Reload Hunspell Dictionaries" work in Npp.

The keyboard shortcut for the "Additional actions" submenu does not work; the submenu is not opened. I would remove this submenu and move the "Copy All Misspelling to Clipboard" and "Reload Hunspell Dictionaries" commands into the group of the first four commands. The submenu only complicates access to these two commands when I use the menu.


In addition:

  1. I don't see the "About" menu item.
  2. I cannot understand why the context menu item "Copy All Misspelled Words in Current Document to Clipboard" remained in the settings. It does the same thing as the "Copy All Misspelling to Clipboard" command.
  3. Settings/Simple: the "Suggestions Control:" item is almost invisible.

This does not concern the plugin, but it is probably important for you: Npp versions 7.5.2 and 7.5.3 cannot display files in the KOI8-R, KOI8-U, or Macintosh encodings. KOI8-R is often used for dictionaries. The reason: the Npp author previously did not react to many flaws, and now he is not even informed about errors.

http://rgho.st/download/7zYjnrgpX/663636ae718661cae0e8920556e83361be04f3f6/DSpellCheck_1.4.0_dll_Ru.7z

Predelnik commented 6 years ago

Thanks for the report!

"Multiple Languages" does not work. If you do not use "Multiple Languages", then UserDic.dic does not work.

I will look into this.

Sort and delete duplicates

Doing a case-insensitive sort and duplicate removal is not hard; I will do that.
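
A case-insensitive sort with duplicate removal might look like the sketch below. It is an illustration only, not the plugin's implementation; the function name is made up, and keeping the first spelling seen for each word is an assumption about the desired behaviour.

```python
def sort_unique_ignore_case(words):
    """Remove duplicates that differ only by case, then sort ignoring case.

    Assumption for this sketch: the first spelling encountered is kept.
    """
    first_seen = {}
    for word in words:
        # casefold() lowercases Latin and Cyrillic letters alike
        first_seen.setdefault(word.casefold(), word)
    return sorted(first_seen.values(), key=str.casefold)


print(sort_unique_ignore_case(["Word", "apple", "word", "Apple", "яблоко", "Яблоко"]))
```

Note that this orders words by code point after case folding, which is not the same as proper locale-aware collation (for example, the Russian letter "ё" will not land next to "е").
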

Keyboard shortcut for the sub-menu "Additional actions" does not work

Well, hotkeys for menus aren't very common; I implemented the one for "Change Language" for convenience. Sadly, "Additional Actions" appears in the shortcut mapper only because the Notepad++ system for adding actions is not convenient. I would rather remove it from there than implement opening a menu on a hotkey, since that's not easy.

would remove this submenu and move the "Copy All Misspelling to Clipboard", "Reload Hunspell Dictionaries" commands in the group of first four commands.

I'm against that, especially for "Reload Hunspell", because its meaning is not obvious to the average user and it may result in the loss of words added manually during the current session. So you'd better stick to hotkeys.

In addition, I don't see menu item "About".

Yes, that's due to a typo and the inconvenience of adding actions in N++.

I can not understand why in settings remained context menu item "Copy All Misspelled Words in Current Document to Clipboard"

Yes this will be removed.

Settings/Simple: item "Suggestions Control:" is almost not visible

I possibly already fixed that in the master branch.

DmFedorov commented 6 years ago

Thanks for the reply.

Predelnik commented 6 years ago

I tried to fix most of the issues mentioned: https://github.com/Predelnik/DSpellCheck/releases/tag/v1.4.0-alpha2

DmFedorov commented 6 years ago

v1.4.0-alpha2: at first glance everything works, but not as expected. I cannot edit the UserDic.dic dictionary in Npp; the dictionary can now only grow. In addition, the words in it are sorted automatically. I had assumed I could write comments there: this list of words is for one purpose, this one for another.

The Reload Hunspell Dictionaries command seems to take effect, but it also behaves strangely. If I delete all words from UserDic.dic, save it, and then run Reload Hunspell Dictionaries, the words that are no longer in UserDic.dic are still not underlined. And after restarting Npp, UserDic.dic again contains all the old words plus the added ones, sorted.

Predelnik commented 6 years ago

OK, I think I understood your second paragraph, and that's really an error. I keep a list of words that I use for saving to UserDic.dic, and I don't reset it on Reload Hunspell..., so they always end up being saved.
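
The behaviour described here can be reproduced with a toy model. Everything in it is hypothetical (the class, its attributes, and the `reset` flag are invented for illustration and are not the plugin's actual code); it only shows why a cached word list that survives a reload ends up being written back.

```python
class UserDictionary:
    """Toy model: `disk` stands in for the contents of UserDic.dic."""

    def __init__(self, disk):
        self.disk = list(disk)
        self.to_save = list(disk)  # in-memory list written back on save

    def add_word(self, word):
        self.to_save.append(word)

    def reload(self, reset=True):
        # Hypothetical fix: rebuild the in-memory list from the file,
        # discarding whatever was cached before the reload.
        if reset:
            self.to_save = list(self.disk)

    def save(self):
        self.disk = list(self.to_save)


def edit_reload_save(reset):
    d = UserDictionary(["foo", "bar"])
    d.disk = []            # the user deletes every word in an editor
    d.reload(reset=reset)
    d.save()               # e.g. when the editor exits
    return d.disk


print(edit_reload_save(reset=False))  # deleted words come back
print(edit_reload_save(reset=True))   # the file stays empty
```

Resetting the cached list on reload also discards words added during the current session but not yet saved, which matches the earlier warning about the reload action.
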

As for the first I'm not entirely sure. Weren't words always sorted in UserDic.dic? I also don't entirely get how exactly you write comments in it.

Btw, I don't know whether it affected your workflow, but I made the number at the beginning of the dictionary ignored on read, with all subsequent lines treated as words. The correct number will still be written when the dictionary is written back.
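
A minimal sketch of that read/write behaviour follows, assuming the file starts with a numeric count line followed by one word per line. The function names are made up, and skipping blank lines is an extra assumption of this sketch.

```python
def read_user_dic(text):
    """Return the words, ignoring the count on the first line."""
    lines = text.splitlines()
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]  # the stored count is not trusted
    return [line for line in lines if line.strip()]


def write_user_dic(words):
    """Serialize the words with a freshly computed count on the first line."""
    return "\n".join([str(len(words)), *words]) + "\n"


words = read_user_dic("99\nfoo\nbar\n")  # the wrong count (99) is ignored
print(write_user_dic(words))
```
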

DmFedorov commented 6 years ago

Weren't words always sorted in UserDic.dic?

I'm sorry, but I really did not check that. I initially assumed that the words in the UserDic.dic dictionary should not be sorted, for exactly the same reason that they are not sorted in any other real dictionary with the .dic extension.

Words in UserDic.dic should not be sorted because this is a dictionary people can actually work with. The other dictionaries have an additional affix file with pretty complicated rules. In the UserDic.dic file the rules are simple but effective.

If the words are not sorted, then I know (for example) which word I added last. That is important. In addition, I can (now) add a list of words, and in front of this list I can write a comment explaining why I added it.

If necessary, I can comment out all the words in this list and the list will not be taken into account. It's convenient.

But maybe I'm wrong? I am going by my experience of working with ordinary .dic files (I have really worked with them a lot). I know that comments are acceptable there. I did not check UserDic.dic thoroughly, but the description said that it differs only in not having an .aff file, so I must list every word form separately for each word.

..but I made the number at the beginning of the dictionary

As for the automatic counting of words in UserDic.dic: it is certainly good for the program, although from my experience of working with .dic dictionaries I know that this number can be, so to speak, approximate.

azjio commented 6 years ago

How long does it take to sort the dictionary? If it is sorted on every start, the program will end up taking a long time to launch. Perhaps make an add-on dictionary for each original dictionary, e.g. "Added_ru_RU.dic"? Or would they still all have to be sorted together?

DmFedorov commented 6 years ago

Azjio, for some strange reason the plugin now sorts the small unified user dictionary (UserDic.dic), while (thank God) big dictionaries such as ru_RU.dic are not sorted. The time spent on sorting is, of course, small, but the sorting itself makes it impossible to manage the dictionary: everything gets mixed up. This inconvenience is what I am trying to draw attention to.

azjio commented 6 years ago

Using AutoIt, I sorted the ru_RU.dic file in reverse order in 10 seconds.

Predelnik commented 6 years ago

After putting some thought into this, I think I can make it work if I use the Hunspell "load additional dictionary" function, which I had been neglecting. I can just append words added via "Add to User Dictionary" to the end and update the word number; the rest would be preserved. Additionally, you will be able to add comments and use whatever other syntax .dic files support. So I'll look into it.
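
The append-only update could look roughly like this. It is a sketch under assumptions, not the plugin's code: the function name is made up, the count is assumed to sit on the first line, and the fallback for a file without a count line is this sketch's own choice.

```python
def append_word(dic_text, word):
    """Append `word` at the end and bump the count on the first line.

    All existing lines keep their order and content, so hand-written
    groupings or comments in the file are left untouched.
    """
    lines = dic_text.splitlines()
    if lines and lines[0].strip().isdigit():
        lines[0] = str(int(lines[0]) + 1)
    else:
        # No count line found: add an approximate one (assumption).
        lines.insert(0, str(len(lines) + 1))
    lines.append(word)
    return "\n".join(lines) + "\n"


print(append_word("2\nzeta\nalpha\n", "mid"))
```

Because nothing is re-sorted, the most recently added word is always the last line of the file.
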

DmFedorov commented 6 years ago

@Predelnik Thank you. That is exactly what is required: do not mix up the words in dictionaries. When real misspellings appear in the user dictionary, I can move them to the real dictionary, and pseudo-misspellings (abbreviations and other debris from technical texts) can remain in UserDic.dic.

Predelnik commented 6 years ago

I've tried to implement that logic in a new alpha version: https://github.com/Predelnik/DSpellCheck/releases/tag/v1.4.0-alpha3 The only problem I've encountered is that, since I store UserDic.dic as UTF-8, I have to convert it to the dictionary encoding when that is not UTF-8, and I do so using a temporary file. Hopefully it will not be slow on large dictionaries (but technically this was already being done word by word beforehand, so it shouldn't be a big deal).
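
The conversion step can be illustrated as follows. This is a sketch of the general idea only; the function name is made up, and replacing characters that the target encoding cannot represent is an assumption of this example, not something stated above.

```python
import os
import tempfile


def convert_to_temp(utf8_path, target_encoding):
    """Re-encode a UTF-8 dictionary into a temporary file and return its path."""
    with open(utf8_path, encoding="utf-8", newline="") as src:
        text = src.read()
    fd, tmp_path = tempfile.mkstemp(suffix=".dic")
    # errors="replace" is an assumption: unrepresentable characters become "?"
    with os.fdopen(fd, "w", encoding=target_encoding,
                   errors="replace", newline="") as dst:
        dst.write(text)
    return tmp_path
```

For example, `convert_to_temp("UserDic.dic", "koi8-r")` would produce a KOI8-R copy that could then be loaded alongside a KOI8-R main dictionary; the caller is responsible for deleting the temporary file afterwards.
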

DmFedorov commented 6 years ago

I have not tested the plugin yet. I found a bug and described it here: https://github.com/Predelnik/DSpellCheck/issues/116

Predelnik commented 6 years ago

Looks like everything from this discussion was implemented in 1.4.0 which is now available through plugin manager. I'm closing it, feel free to reopen if I've missed something.