MihaiValentin / lunr-languages

A collection of language stemmers and stopwords for the Lunr JavaScript library
430 stars, 163 forks

Error when using lunr.zh.js 'nodejieba.cut is not a function' #91

Open · RogerBlasco opened this issue 1 year ago

RogerBlasco commented 1 year ago

The error is:

```
Uncaught TypeError: nodejieba.cut is not a function
    at lunr.zh.tokenizer (lunr.zh.js:98:1)
    at lunr.Builder.add (lunr.js:2479:1)
    at lunr.Builder.<anonymous> (xxx)
    at lunr (lunr.js:53:1)
    at XMLHttpRequest.<anonymous> (xxx)
```

The failing line in `lunr.zh.tokenizer` is:

```js
nodejieba.cut(str, true).forEach(function(seg) {
  tokens = tokens.concat(seg.split(' '))
})
```

I'm afraid I'm not experienced enough yet to dig in and fix this myself, but if someone could help review it, or tell me exactly what I would need to do to handle it, I would much appreciate it.

knubie commented 1 year ago

Are you trying to run this in a browser? The zh tokenizer requires Node.js, because it depends on nodejieba, which is a C++ addon. I opened an issue (#90) where I describe how you can use the built-in Intl.Segmenter instead to segment Chinese (and other languages) quite easily. Here is a fork where I switched the zh module to use Intl.Segmenter.
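The fork itself isn't reproduced here. As a rough illustration of the Intl.Segmenter approach, here is a minimal sketch, assuming a runtime that ships `Intl.Segmenter` (modern browsers, Node.js 16+ with full ICU), of splitting Chinese text into word tokens without nodejieba. The function name `segmentChinese` is hypothetical; this is not the code from the fork or from lunr-languages. To plug it into lunr 2.x as a tokenizer, each returned string would still need to be wrapped in a `lunr.Token`.

```javascript
// Hypothetical helper (not part of lunr-languages): word-segment a string
// with the built-in Intl.Segmenter instead of the nodejieba C++ addon.
// Assumes Intl.Segmenter is available (modern browsers, Node.js 16+).
function segmentChinese(str) {
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    throw new Error('Intl.Segmenter is not available in this runtime');
  }
  // 'word' granularity yields word-level segments; each segment object
  // carries isWordLike, which is false for whitespace and punctuation.
  const segmenter = new Intl.Segmenter('zh', { granularity: 'word' });
  const tokens = [];
  for (const { segment, isWordLike } of segmenter.segment(String(str))) {
    if (isWordLike) {
      // Lowercase so mixed Chinese/Latin text matches case-insensitively.
      tokens.push(segment.toLowerCase());
    }
  }
  return tokens;
}
```

For example, `segmentChinese('Hello, world!')` returns `['hello', 'world']`. Filtering on `isWordLike` keeps only word segments; how Chinese runs are split depends on the ICU dictionary bundled with the runtime, so results may differ from jieba's.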

greylantern commented 10 months ago

@knubie this is awesome and solved the issue!