-
I made some changes to the model (3D convs) and trained the small one with 128 tokens and MSE loss on 128p, 16-frame videos pre-compressed with CogVideoX's VAE.
Turned out better than I expected consi…
-
**Describe the bug**
yq's `sortKeys()` may output YAML with `unknown anchor 'c' referenced`.
Note that any how-to questions should be posted in the discussion board and not raised as an issue.
Vers…
-
lang-fin/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
The "hyphen minus" is sometimes separate and other times retained in +Cmp/SplitR situations
Here are five separate instances:
(1a)
Ruo…
-
Because C modules can choose to release the GIL when they aren't using Python objects. If a CPU-heavy function is implemented in pure C, it can release the GIL using Python's C API. This allows the in…
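A minimal sketch of the effect described above, assuming `hashlib` as the C-implemented example: CPython documents that it releases the GIL while hashing buffers larger than 2047 bytes, so the worker threads below can hash in parallel. The helper names `sha256_hex` and `hash_all_threaded` are hypothetical, and the sketch only checks that the threaded results match the sequential ones; it does not measure any speed-up.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(data: bytes) -> str:
    # hashlib is implemented in C and releases the GIL for large inputs,
    # so several of these calls can run at the same time in threads.
    return hashlib.sha256(data).hexdigest()


def hash_all_threaded(buffers, workers=4):
    # pool.map preserves input order, so results line up with `buffers`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_hex, buffers))


# Four 1 MiB buffers, each filled with a different byte value.
buffers = [bytes([i]) * (1 << 20) for i in range(4)]
sequential = [sha256_hex(b) for b in buffers]
threaded = hash_all_threaded(buffers)
```

A pure-Python loop doing the same amount of work would not benefit from threads in this way, because it holds the GIL for the whole computation.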
-
The heuristic in `split_tokenised_text_into_sentences.py` is too simplistic:
- Full stops in quoted text such as in `' Is cuid den searmanas é . ' ar sise . ` should not count as a split point.
- 3570…
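One possible way to handle the quoted-text case is sketched below. This is not the code in `split_tokenised_text_into_sentences.py`; the function `split_sentences` is hypothetical. It assumes that tokens are whitespace-separated, that quotes appear as standalone `'` tokens, and that quotes are balanced. The second sentence in the sample input is invented for illustration.

```python
def split_sentences(tokenised_text, enders=(".", "?", "!"), quote="'"):
    """Split whitespace-tokenised text at sentence enders outside quotes."""
    sentences, current, in_quote = [], [], False
    for token in tokenised_text.split():
        current.append(token)
        if token == quote:
            # Assumption: every standalone quote token toggles quoted state.
            in_quote = not in_quote
        elif token in enders and not in_quote:
            # Only an ender outside quoted text closes the sentence.
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep trailing tokens that were not closed by an ender.
        sentences.append(" ".join(current))
    return sentences


text = "' Is cuid den searmanas é . ' ar sise . Tá sé anseo ."
result = split_sentences(text)
```

Here the full stop after `é` is inside the quotation and is skipped, so the text is split only after `sise .` and after `anseo .`. Apostrophes used inside words or unbalanced quotes would need extra handling.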
-
Bangle emulator 2v10.187
In the following, we introduce an error in order to see the (unexpected) text of the error message. Note that ‘°’ has printed as ‘throw’.
```
>y=(_=>'°'.length);y()
=1
…
-
Is there a way for the tokeniser to keep the stress symbols in the IPA transcription?
-
On current Arch Linux, `xkbcommon` fails to compile.
I mainly care about my fork of it, https://github.com/ongy/haskell-xkbcommon, which should be easier to compile.
The error:
```
/home/s…
-
When we specify a composed character like `ǩ` as a single arc in an `@bin "foo.hfst"`, hfst-tokenise doesn't analyse it. When it's specified as two arcs `k` and then U+030C (COMBINING CARON), hfst-tokeni…
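The two forms in question can be inspected with Python's standard `unicodedata` module. This is only an illustration of the composed (NFC) and decomposed (NFD) representations of `ǩ`; it says nothing about how hfst-tokenise handles its input internally.

```python
import unicodedata

# Precomposed form: one code point, LATIN SMALL LETTER K WITH CARON.
composed = "\u01e9"

# Canonical decomposition: "k" followed by COMBINING CARON.
decomposed = unicodedata.normalize("NFD", composed)
codepoints = [f"U+{ord(ch):04X}" for ch in decomposed]

# Recomposing gives back the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)
```

Both strings render identically but compare as unequal, so a transducer built with one form will not match input in the other unless one side is normalised.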
-
What changes should be made in the config file for Chinese data, and how do we generate a truecase-model for Chinese data? Do we use the same method that we use for other languages, or some other way? PLEA…