tokenise Search Results

1000+ results for tokenise

bytedance/1d-tokenizer #37

Experiments with video tokenization.

I made some changes to the model (3D convs) and trained the small one with 128 tokens on 128p 16-frame videos pre-compressed with CogvideoX's VAE and MSE loss. Turned out better than I expected consi…

NilanEkanayake updated 4 weeks ago
2
mikefarah/yq #943

sortKeys() needs to consider yaml anchor

**Describe the bug** yq sortKeys() may output YAML with `unknown anchor 'c' referenced`. Note that any how-to questions should be posted in the discussion board and not raised as an issue. Vers…

williamjoy updated 4 months ago
2
giellalt/giella-core #28

pmhfst tokeniser inconsistently tokenises hyphen minus

lang-fin/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst The "hyphen minus" is sometimes separate and other times retained in +Cmp/SplitR situations Here are five separate instances: (1a) Ruo…

rueter updated 1 year ago
13
earwig/mwparserfromhell #181

Release the GIL while tokenising in C

Because C modules can choose to release the GIL when they aren't using Python objects. If a CPU-heavy function is implemented in pure C, it can release the GIL using Python's C API. This allows the in…

ghost updated 7 years ago
2
jbrry/Irish-BERT #45

Improve sentence splitter for tokenised text

The heuristic in `split_tokenised_text_into_sentences.py` is too simplistic: - Full-stops in quoted text such as in `' Is cuid den searmanas é . ' ar sise . ` should not count as split point. - 3570…

jowagner updated 3 years ago
6
espruino/Espruino #2089

String contents made tokenised in error messages

Bangle emulator 2v10.187 In the following, we introduce an error in order to see the (unexpected) text of the error message. Note that ‘°’ has printed as ‘throw’. ``` >y=(_=>'°'.length);y() =1 …

stephenPspackman updated 2 years ago
1
pavelsof/ipatok #6

Keep stress symbols?

Is there a way for the tokeniser to keep the stress symbols in the IPA transcription?

dreamk73 updated 2 months ago
2
malcolmwallace/cpphs #16

compile error in xkbcommon on current arch

On current archlinux `xkbcommon` fails to compile. I mainly care about my fork of it: https://github.com/ongy/haskell-xkbcommon it should be easier to compile. The error: ``` /home/s…

Ongy updated 5 years ago
2
hfst/hfst #367

Composed chars on single arcs work in hfst-lookup but not hf…

When we specify a composed character like `ǩ` as a single arc in an `@bin "foo.hfst"`, hfst-tokenise doesn't analyse it. When it's specified as two arcs `k` and then ` ` (COMBINING CARON), hfst-tokeni…

unhammer updated 3 years ago
10
ghpaetzold/questplusplus #50

Can Chinese language use tokeniser.perl for tokenising?

What changes should be made in the config file for Chinese data, and how do we generate a truecase-model for Chinese data? Do we use the same method that we use for other languages or some other way? PLEA…

Shireen35 updated 3 years ago
1
