Open rotemdan opened 3 weeks ago
`windows-spellchecker`, since I realized, given the work already done, it would not be difficult to cover more features in the future - in particular being able to check spelling for entire text segments, not just single words (since that's likely more efficient and better reflects the native API)
`node-gyp` because there's some complexity in Windows to add a particular DLL loading hook that makes it work in Electron.js and other alternative runtimes (very hard to do manually since it apparently requires precise compiler arguments to work). Once I made the switch, the same build works in Electron.js without any problems.
`node` addon works in multiple versions of Node.js (18, 20, 22, 23) and Electron.js (32, 33)
`vscode-spellright`
`testSpelling` in the Windows binding is roughly 7 to 10 times slower than `hunspell-wasm` (I'll run more accurate benchmarks in the future). The approaches I see to make it faster are of course caching (for already-seen words) - which I've already implemented in the VSCode extension, or passing "batches" of words at once, separated by line-breaks, like `word1 \n word2 \n word3 \n word4 \n ...`, or passing an entire segment of the text (like a paragraph) and extracting the word ranges from the results (that's the way the Windows API processes it). That should reduce the overhead of the individual calls for each word, which is now likely a bit high
`audio-io` package. I'm getting the 3 addons there (Windows, macOS, Linux) working in Electron.js, which requires some modifications of the C++ code due to Electron.js (20+) N-API restrictions on allocating ArrayBuffers with external memory, which I didn't know about before
`macos-spellchecker` and the repository is here. It includes addons for both x64 and arm64.
`NSSpellChecker` rather than using an instance shared for the current process (which the Atom addon uses). This prevents potential collisions between different instances running in the same process - for example, when each one selects a different language.
`arm64` addon built for the Windows spell-checker (for Windows 11 arm64 versions), which turned out not to be that difficult (it just required installing the arm64 build tools through the Visual Studio installer)
Now that we've got the addons redone, I think the best way to use them is not to integrate them directly into the VSCode extension, but to write a "backend" library that acts as a kind of "spellcheck provider" or "spellcheck server" (though it wouldn't necessarily run as an actual server).
My general idea is that this provider would be a reusable npm package that creates a background worker, like a new Node.js worker thread (or Web Worker), and loads the addons (and the WASM one as well) in that worker. It could also potentially be multithreaded and create multiple workers, but that's for the future. The backend would be fully independent from any UI, and could also be tested and used as a plain library, or from a CLI app.
The backend would provide higher-level tools, I'm imagining a kind of "diff-based" approach (inspired by React), where:
In the backend I can also perform more accurate word segmentation, like with the `cldr-segmentation` library, which I heavily use in Echogarden (a speech toolset I'm developing, which also includes several natural language processing features). `cldr-segmentation` has language-dependent datasets of abbreviations that can more accurately distinguish between `.` characters used as part of a word and ones that signify a sentence ending or other punctuation, as well as various numeric patterns that may include `.`.
Better word segmentation can also help reduce the number of false positives caused by things like special characters and separators, as seen in the "Alice in Wonderland" example with Hunspell. That likely happens because Hunspell doesn't properly trim the non-word characters sent to it, and it can be fixed by using more accurate word segmentation, which would naturally "trim" the unwanted characters from the word.
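To illustrate how segmentation "trims" non-word characters, here is a minimal TypeScript sketch. It uses the built-in `Intl.Segmenter` as a stand-in for `cldr-segmentation`, which is an assumption made only to keep the example dependency-free: the two won't produce identical results, and `Intl.Segmenter` has no abbreviation datasets. The names `WordRange` and `extractWords` are hypothetical.

```typescript
// Assumption: Intl.Segmenter stands in for cldr-segmentation in this sketch.
interface WordRange {
  word: string
  start: number // UTF-16 offset of the word in the original text
  end: number   // offset just past the last character of the word
}

// Hypothetical helper: returns only the word-like segments of a text,
// together with their positions.
function extractWords(text: string, locale: string = 'en'): WordRange[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' })
  const words: WordRange[] = []

  for (const { segment, index, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation segments,
    // so quotes, dashes and brackets are never sent to the spell-checker.
    if (isWordLike) {
      words.push({ word: segment, start: index, end: index + segment.length })
    }
  }

  return words
}

// Usage: surrounding quotes, dashes and parentheses are dropped.
const alice = extractWords('"Alice!" she said -- (quietly).')
console.log(alice.map((w) => w.word))
```

Only the word-like segments would then be passed to Hunspell or a native checker, and the offsets make it possible to map a misspelling back to its position in the document.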
The word segmentation itself can also be cached in the backend. We'll see.
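Caching of this kind, whether for segmentation results or for already-seen words, can be sketched as a small memoizing wrapper. This assumes a synchronous per-word check of the shape `(word: string) => boolean`, which may not match the actual signature of `testSpelling` in the bindings. `SpellCheckFn`, `withCache` and `maxEntries` are hypothetical names.

```typescript
// Hypothetical shape of a per-word check; the real bindings may differ.
type SpellCheckFn = (word: string) => boolean

// Wraps a slow check in an in-memory cache keyed by the exact word.
function withCache(check: SpellCheckFn, maxEntries: number = 10000): SpellCheckFn {
  const cache = new Map<string, boolean>()

  return (word: string): boolean => {
    const cached = cache.get(word)
    if (cached !== undefined) {
      return cached
    }

    const result = check(word)

    // Crude memory bound: a Map iterates in insertion order,
    // so the first key is the oldest entry.
    if (cache.size >= maxEntries) {
      const oldest = cache.keys().next().value
      if (oldest !== undefined) {
        cache.delete(oldest)
      }
    }

    cache.set(word, result)
    return result
  }
}

// Usage with a stand-in checker that counts how often it is really called.
let nativeCalls = 0
const slowCheck: SpellCheckFn = (word) => {
  nativeCalls++
  return word !== 'teh'
}
const cachedCheck = withCache(slowCheck)
cachedCheck('the')
cachedCheck('the') // served from the cache, slowCheck is not called again
```

A cache like this would need to be kept per language (or per dictionary) and cleared whenever a word is added to or removed from the user dictionary, otherwise it could return stale results.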
I don't see the amount of work to implement this to be very large. I think actually redoing the native addons was generally more difficult.
Edit: forgot to mention. Very soon I'm publishing a new language detection library called `echo-ld` (Echogarden Language Detection library). It is very accurate (for example, it can differentiate between different Italian dialects and even between Norwegian Nynorsk and Bokmål, to some limited degree). I'll likely integrate that into the backend as well - it could technically be used to detect spelling errors even in a document that contains different languages (by detecting the language of different segments independently and applying the correct dictionary for each language).
Earlier today I looked at the Windows code in the deprecated `atom/node-spellchecker` package that's used in the extension.
Given I'm a Windows user, and I have a working example in `atom/node-spellchecker`, and some recent experience with N-API development, I thought I could write a new binding, but make it simpler and more maintainable, since the current one is using outdated and unstable APIs.
It's now published as the `windows-spellchecker` npm package (MIT license). Repository is here.
The core C++ addon is this single `.cpp` file.
Main differences to the approach in `node-spellchecker`:
- `napi.h` C++ API (and tried to ensure only basic ones). With `NAPI_VERSION = 8`, which is stable across Node.js versions starting at Node `v12.0.0` up to now and future (Edit: the headers currently used support Node `v18.0.0` and upwards). It should also work across different `Electron.js` versions (not tested yet - will make any changes needed for that)
- `testSpelling` and `getSpellingSuggestions`).
- `addWord` and `removeWord` (Win10+) are implemented but will add and remove from the user-level system dictionary!
- `node-gyp`, `MAKE`, `CMAKE` or anything of that sort. It's built using plain `cl.exe` (MSVC compiler) and `dlltool` (part of the `binutils` package in msys2) to produce `node_api.lib`~. Edit: Although it was working fine in multiple Node.js versions with the basic approach, it was very hard to get it working in Electron.js without `node-gyp`, so I eventually switched to `node-gyp`
- `removeWord`, which is only supported in Windows 10 or newer
I'll integrate this one as well, and test it along with the WebAssembly one (which I'm currently continuously testing).
Once it's integrated and working, it'll reduce the need to rebuild the binding over and over again.
The macOS binding can also be rewritten this way, but that's for the future (I don't have a macOS machine, so I can only test it in a VM - by repeatedly syncing source files using SSH - trying to use a development UI in the VM is a completely unusable experience).