-
English treebanks annotate hyphenated compounds in several ways, sometimes inconsistently within the same treebank and across treebanks.
I'm basing this on https://uni…
-
Annotations of contractions (mainly *au*, *aux*, *du* and *des*) are not consistent among French treebanks.
Whereas *au* and *aux* are easy to manage as multiword tokens ([Tokenization and Word Seg…
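To make the multiword-token treatment concrete, here is a minimal plain-Python sketch (the expansion table is illustrative, not a complete list, and `expand` is a hypothetical helper) of mapping these contracted surface forms to their underlying syntactic words:

```python
# Illustrative table: French contracted forms and the syntactic
# words behind them. "des" is listed as de + les, but note that
# surface "des" can also be a plain indefinite article, which is
# exactly why treebanks disagree on how to annotate it.
CONTRACTIONS = {
    "au": ["à", "le"],
    "aux": ["à", "les"],
    "du": ["de", "le"],
    "des": ["de", "les"],
}

def expand(token):
    """Return the syntactic words behind a surface token."""
    return CONTRACTIONS.get(token.lower(), [token])

print(expand("au"))    # ['à', 'le']
print(expand("chat"))  # ['chat']
```

A real treebank would of course also need context to decide whether a given *des* is the contraction or the article; the lookup above cannot make that distinction.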
-
tokenize this text:
"+Noster+poēta+,+nisi+cīvis+Rōmānus+esset+,+ā+populō+nunc+cīvitāte+dōnārētur+.+"
(e.g. http://services.perseids.org/llt/segtok?xml=false&shifting=false&newline_boundary=1&inline=tr…
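Assuming the `+` signs mark token boundaries, a minimal Python sketch of recovering the tokens is just a split-and-filter:

```python
text = "+Noster+poēta+,+nisi+cīvis+Rōmānus+esset+,+ā+populō+nunc+cīvitāte+dōnārētur+.+"

# Split on the '+' delimiters and drop the empty strings produced
# by the leading and trailing markers.
tokens = [t for t in text.split("+") if t]
print(tokens)
```

This yields the fourteen tokens `['Noster', 'poēta', ',', …, '.']`, with punctuation already separated from the word forms.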
-
Hi @tomerm @semion1956,
It seems that today I need to run the tokenization step on the raw data and then load the output into the models.
The problem, as I see it, is that we are going to run many tes…
-
For large inputs we want to be able to process one line at a time, so we don't have to read the entire input into memory.
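One common pattern for this in Python (a sketch, with a hypothetical `tokenize` stand-in for the real tokenizer) is to iterate over the file object itself, which yields one line at a time:

```python
def tokenize(line):
    # Placeholder tokenizer; the real tokenizer would go here.
    return line.split()

def process_file(path):
    # Iterating over the file object reads one line at a time,
    # so the whole input is never held in memory at once.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield tokenize(line)
```

Because `process_file` is a generator, downstream code can consume tokenized lines lazily as well, keeping memory use constant regardless of file size.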
-
First of all thank you very much for your work.
I am working on a long-text classification task, and given the spectacular results of MEGA on long-sequence modelling, I wanted to use it for this…
-
"30minutes" is tokenized as "30m inutes";
"Search for comedy movies that are rated R." is tokenized as "search for comedy movies that are rated r." (no space between r and period)
"4-5 rating" is t…
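For what it's worth, a minimal regex sketch (a hypothetical illustration, not the project's actual tokenizer) that keeps digit runs, letter runs, and punctuation marks as separate tokens handles the cases above:

```python
import re

# Match digit runs, letter runs, or single punctuation characters,
# so "30minutes" splits into ["30", "minutes"] rather than "30m inutes".
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|[^\w\s]")

def simple_tokenize(text):
    return TOKEN_RE.findall(text)

print(simple_tokenize("30minutes"))   # ['30', 'minutes']
print(simple_tokenize("rated R."))    # ['rated', 'R', '.']
print(simple_tokenize("4-5 rating"))  # ['4', '-', '5', 'rating']
```

This is obviously far cruder than a trained tokenizer, but it shows the boundary behaviour one would expect in these examples.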
-
Stanza's Tamil tokenizer needs the `mwt` model. For example, the word குதிரையும் is split into two words:
```py
>>> import stanza
>>> nlp = stanza.Pipeline(lang="ta", processors="tokenize,mwt")
>>> …
-
@irina060981 Irina, I created a few errors in the tokenization, and the error messages always point to line 9 in the text. Could you please explain what line 9 refers to? See the sample error messages below:
![Screen Sh…
-
Hi All,
I am trying to get some very basic tokenization to work. I think I am not using the API properly, because the `Tokenize` method is throwing a `System.NullReferenceException`. Any suggestions?
…