bab2min / Kiwi

Kiwi (Intelligent Korean Morphological Analyzer)
https://lab.bab2min.pe.kr/kiwi

Added JavaScript/wasm target via emscripten #171

Closed RicBent closed 1 week ago

RicBent commented 3 weeks ago

This pull request adds a build target for a JavaScript library that is usable in any modern browser. It is built with the emscripten toolchain.

Solves #169

You can check out a running demo here: https://ricbent.github.io/Kiwi/demo/

Things that can be improved:

Because I haven't set up GitHub Actions for this target yet, you need to build it manually. This requires the emscripten toolchain to be installed:

mkdir build
cd build
emcmake cmake -DCMAKE_BUILD_TYPE=Release -DKIWI_USE_CPUINFO=OFF -DKIWI_BUILD_TEST=OFF -DKIWI_USE_MIMALLOC=OFF -DKIWI_BUILD_CLI=OFF -DKIWI_BUILD_EVALUATOR=OFF -DKIWI_BUILD_MODEL_BUILDER=OFF ../
make

This will generate kiwi-wasm.js and kiwi-wasm.wasm. The former can be directly used in any modern browser.
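
As a quick sanity check after the build, something like the following could confirm that both files were produced and that the .wasm file really is a WebAssembly binary. This is only a sketch: the helper name check_artifacts is made up for illustration and is not part of the PR; the only facts it relies on are the two file names above and the standard 4-byte WebAssembly magic number.

```shell
# Hypothetical helper (not part of the PR): sanity-check the build outputs
# in a directory (defaults to the current directory).
check_artifacts() {
    dir="${1:-.}"
    for f in kiwi-wasm.js kiwi-wasm.wasm; do
        # -s: the file exists and is not empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    # Every WebAssembly binary starts with the bytes 00 61 73 6d ("\0asm").
    magic=$(head -c 4 "$dir/kiwi-wasm.wasm" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" != "0061736d" ]; then
        echo "not a wasm binary: $dir/kiwi-wasm.wasm" >&2
        return 1
    fi
    echo "ok"
}
```

Running `check_artifacts .` from inside the build directory prints "ok" when both files look sane and returns a non-zero status otherwise.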

Sorry if I didn't follow some contribution guidelines correctly as I don't speak much Korean (yet).


Things changed since the creation of the PR:

RicBent commented 3 weeks ago

Is this change correct? https://github.com/RicBent/Kiwi/commit/3f0eb0cbe6167078680e0c6a2d6577c1ab8af08a

Passing 0 for numThreads to prevent the ThreadPool from being created in KiwiBuilder did not work, because it triggered an assertion.

RicBent commented 3 weeks ago

Attempted to implement the release workflow. Not sure how to test it properly without triggering a release.

I guess a workflow that triggers on new pull requests still needs to be added.

bab2min commented 3 weeks ago

@RicBent Wow, it's amazing! Thank you for your contribution! The demo seems to work very well. I'll review it as soon as possible.

RicBent commented 2 weeks ago

Great!

The latest commit adds a wrapper package that can be imported directly into any npm project, along with proper types for the entire API exposed so far (https://github.com/bab2min/Kiwi/blob/2161cf137b996471383e7c7b65370e9f28981f14/bindings/wasm/package/src/kiwi.ts).

I'd be happy to implement the remaining API functionality to bring it to the same level as the Python library once you have had a look at the PR.

bab2min commented 2 weeks ago

@RicBent Oh, it looks good to me. 👍 I would really appreciate it if the rest of the functionality was completed as well. If it is difficult for you to implement all the functionality, I think it is okay to implement only the core, merge and release them first, and then supplement the rest later.

RicBent commented 2 weeks ago

I had a look at the Java bindings and it seems like the only missing API is the following:

KiwiBuilder:

Kiwi:

I would also like to add documentation to the bindings. However, I can only write it in English, not in Korean. Is that a problem, @bab2min?

Oh and we would also need an npm package name as 'kiwi' is already taken. I chose 'kiwi-nlp' for now, but I can change it if you have a better suggestion.

bab2min commented 2 weeks ago

@RicBent It's okay to write the documentation in English. If you write it in English, I can translate it for the Korean documentation. I think kiwi-nlp is a good name for the package, since it shows that it is an NLP library. Thank you for your contribution!

RicBent commented 1 week ago

Alright, I am almost there then. I already wrote pretty much all the required documentation and just the 4 points from KiwiBuilder are missing to match the Java API functionality.

I also amended the release workflow to automatically build and publish the resulting package to npm:

https://github.com/bab2min/Kiwi/blob/c77e15fab7ec758d465e53ef81b96f1ae2699fce/.github/workflows/release.yml#L330-L346

The workflow requires NPM_TOKEN to be added to the repo's secrets on your end when this gets merged.

To generate an npm token you need to do the following:

  1. Register an account on npm if you haven't already
  2. Follow this to create a token (the token will need write access to the kiwi-nlp package)

To add it to the repo's secrets you can follow this: https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions
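
If you prefer the command line over the web UI, the secret could also be set with the GitHub CLI. The following is a rough sketch, assuming `gh` is installed and authenticated with sufficient rights on the repo, and that the token is available in an NPM_TOKEN environment variable. The function name set_npm_secret and the GH override variable are made up for illustration.

```shell
# Hypothetical helper: store the npm token as the NPM_TOKEN repository secret.
# Assumptions: the GitHub CLI (gh) is installed and logged in, and the token
# value is exported as $NPM_TOKEN in the current shell.
set_npm_secret() {
    repo="$1"
    if [ -z "${NPM_TOKEN:-}" ]; then
        echo "NPM_TOKEN is not set" >&2
        return 1
    fi
    # gh reads the secret value from stdin when no --body is given, so the
    # token does not appear in the command line or process list.
    # GH is an illustrative override for dry runs; it defaults to the real gh.
    printf '%s' "$NPM_TOKEN" | "${GH:-gh}" secret set NPM_TOKEN --repo "$repo"
}
```

For this repo the call would be `set_npm_secret bab2min/Kiwi`.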

RicBent commented 1 week ago

@bab2min everything on my end should be done now:

Let me know if anything needs to be improved/changed!

bab2min commented 1 week ago

@RicBent Great!!! Thank you for your contribution. I'll add the Korean documentation and the tokens for the release workflow.