Closed liushuyu closed 4 years ago
Sadly, I actually had to remove Chinese & Japanese support in the next branch for the search index generation, as it was causing the binary size to inflate a lot (or not build at all because it needed too much RAM).
I'm happy to turn them back on if we find a way to not end up with an 80 MB+ binary though.
Sorry for the late reply, I have done some testing just now. It seems that the crate lindera used by elasticlunr-rs embeds a ~70 MB data file for some reason (I guess it's either trained data or a vocabulary), while the Chinese segmentation library Jieba only takes up ~5 MB in a release build.
I think in this case, you can gate them behind a feature switch, so that anyone who wants to use search indexing for those languages can easily build a version with that support enabled.
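A minimal sketch of what such a feature switch could look like on the Rust side. The feature names `indexing-zh` and `indexing-ja` and the helper function are assumptions made for illustration, not something taken from the Zola or elasticlunr-rs sources. The features would also need to be declared in the `[features]` table of `Cargo.toml`.

```rust
/// Hypothetical helper: reports whether support for indexing `lang`
/// was compiled into this build.
/// The feature names below are assumptions for illustration only.
fn is_cjk_indexing_compiled_in(lang: &str) -> bool {
    match lang {
        // `cfg!` evaluates to a compile-time boolean, so a default build
        // (without the features) returns false here.
        "zh" => cfg!(feature = "indexing-zh"),
        "ja" => cfg!(feature = "indexing-ja"),
        // This sketch only covers the two gated languages.
        _ => false,
    }
}
```

With something like this, a default build stays small, and users who need those languages could opt in at build time, for example with `cargo build --release --features indexing-zh`.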
Can you do a PR by any chance? I'll take one
Bug Report
Environment
Zola version: 0.11.0 (from crates.io)
Expected Behavior
The search index builds correctly with no error, since the underlying crate (elasticlunr-rs) supports Chinese through the de facto Jieba segmentation library.
Current Behavior
It does not work when the website has multiple Chinese languages (Simplified and Traditional) and fails with
The root cause is that the upstream implementation treats Chinese as a single language (its implementation works for the common variants of Chinese), and the only language code it accepts is zh.
Step to reproduce
Put the following into config.toml:
Extra notes
I didn't open a PR since this may need some discussion, as I can see Zola strives for cleaner implementations for everything.
My suggested solution would be having extra handling in https://github.com/getzola/zola/blob/97e772868d8892874cfb825c1acd789d9ad725f3/components/search/src/lib.rs#L32-L48.
The extra handling could simply strip the variant suffix, e.g. zh-cn => zh.
If you think the extra string manipulation would hurt performance, it could be gated behind a feature switch; and if you think this would be better resolved upstream, I can open an issue on the upstream repository as well.
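The suffix stripping suggested above could be sketched roughly as follows. The function name `primary_language` is hypothetical and this is not the code in components/search/src/lib.rs. It assumes that the variant is separated from the base language code by `-` or `_`.

```rust
/// Hypothetical helper: reduce a language code such as "zh-cn" or "zh_TW"
/// to its primary part ("zh"). Codes without a suffix are returned unchanged.
fn primary_language(code: &str) -> &str {
    // `split` always yields at least one item, so `unwrap_or` is only a
    // defensive fallback here.
    code.split(|c: char| c == '-' || c == '_')
        .next()
        .unwrap_or(code)
}
```

Because the function returns a slice of its input, it does not allocate, so the cost of the extra string handling should be small compared to building the index itself.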