-
As the title says, the `context_len` parameter also controls the token length in the `stream_generate_answer` function:
```python
@torch.inference_mode()
def stream_generate_answer(
    self,
    max_new_tokens=512,
    temperatur…
```
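For context, here is a minimal sketch (not this repo's actual code; the helper name and the 8-token slack are assumptions) of how a single `context_len` budget commonly ends up capping both the prompt window and the answer length:

```python
def truncate_for_generation(input_ids, context_len=2048, max_new_tokens=512):
    """Hypothetical helper: reserve room for the answer inside context_len."""
    # One budget, two consumers: whatever max_new_tokens takes is
    # subtracted from the prompt window, so the two lengths are coupled.
    max_src_len = context_len - max_new_tokens - 8  # 8: slack for special tokens (assumption)
    return input_ids[-max_src_len:]

print(len(truncate_for_generation(list(range(3000)))))  # 1528
```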
-
Thanks for your work!!
As shown in example.py, the caption is in tensor format. So, do I need to create my own transformer-like model to convert a text caption into tensor format?
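You typically don't need a whole transformer for this step; a tokenizer maps text to an ID tensor. A minimal sketch assuming a Hugging Face tokenizer (`bert-base-uncased` is only an illustration; the project may expect its own vocabulary):

```python
from transformers import AutoTokenizer

# Any pretrained tokenizer works for the illustration; the project may
# require the specific vocabulary its captioning model was trained with.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
caption = "a dog running on the beach"
tensor = tokenizer(caption, return_tensors="pt").input_ids  # shape: (1, seq_len)
print(tensor)
```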
-
Hi @benoitgaudou! :wave:
Since https://github.com/gama-platform/gama/commit/7f328e8d1183cfa9098187dd9a1bc1090ee1b011, @AlexisDrogoul removed the `tokenize` function from StringUtils.
Therefore…
-
I write a lot with apostrophes and I can't figure out how to tokenize words like "don't".
I tried the following:
- `doesn\'t` appears as `doesn\`
- `doesn\t` appears as `doesn\t`
- `"doesn't"` appears as …
-
`word_tokenize` keeps opening single quotes and doesn't pad them with a space; this is to make sure that clitics get tokenized as `'ll`, `'ve`, etc.
The original Treebank tokenizer has the sam…
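A quick demo of the clitic behavior (requires `nltk` with the Punkt models downloaded via `nltk.download('punkt')`):

```python
from nltk.tokenize import word_tokenize

print(word_tokenize("I don't think they'll mind."))
# ['I', 'do', "n't", 'think', 'they', "'ll", 'mind', '.']
```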
-
The step that tokenizes the audio (into `input_ids`) seems to be missing from both finetune.py and finetune_low_resource.py in the LTU repo. Where is the detailed code for audio tokenization? I saw the 'load_…
-
This FTP diffing problem made me realize we should probably be splitting tokens in the HTML diff on periods (and maybe other punctuation?), not just on whitespace:
(screenshot: screen shot 2018-11-21 at 9 04 …)
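One possible approach is to split on a punctuation class while capturing the separators, so the diff can re-join tokens losslessly; a sketch (the exact character class is an assumption to tune):

```python
import re

def split_tokens(text):
    # The capturing group keeps separators as their own tokens.
    return [t for t in re.split(r"(\s+|[.,;:!?])", text) if t]

print(split_tokens("ftp.example.com/pub, updated 2018-11-21"))
# ['ftp', '.', 'example', '.', 'com/pub', ',', ' ', 'updated', ' ', '2018-11-21']
```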
-
### Overview
Sometimes a [ReadOnly]Span needs to be tokenized using more than one separator.
### API breakdown
```csharp
namespace CommunityToolkit.HighPerformance;
public static class SpanExte…
-
It would be good to have `__dask_tokenize__` methods added to the `Array` and possibly the `Group` classes. These methods are used to generate unique identifiers for different Dask objects. By default, they will…
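For reference, a minimal sketch of the protocol (the `Array` class below is a stand-in, not the library's actual implementation):

```python
from dask.base import tokenize

class Array:
    def __init__(self, store_path, shape, dtype):
        self.store_path = store_path
        self.shape = shape
        self.dtype = dtype

    def __dask_tokenize__(self):
        # Return a cheap, deterministic description; Dask hashes this
        # instead of inspecting (or reading) the underlying data.
        return (type(self).__name__, self.store_path, self.shape, self.dtype)

a = Array("data.zarr/x", (1000, 1000), "f8")
b = Array("data.zarr/x", (1000, 1000), "f8")
assert tokenize(a) == tokenize(b)  # same identity -> same token
```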
-
**Describe the bug**
Supplying a relative path to the data downloader lays a trap for `tokenize_and_cache.py`.
**To Reproduce**
Call `jiant/scripts/download_data/runscript.py` to download some t…
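A generic way to defuse this kind of trap (a sketch, not jiant's code) is to resolve user-supplied paths to absolute form before they are written anywhere:

```python
import os

data_dir = os.path.abspath("tasks/data")  # relative input from the user
# Now safe to embed in configs later read by tokenize_and_cache.py,
# regardless of which directory that script is launched from.
print(data_dir)
```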