corpus-processing Search Results

1000+ results for corpus-processing

iLanguage/ilanguagelab #10

Inuktitut Corpus

Purpose of addition of this task: acquire Inuktitut corpus which is large enough to test multilingual open source text processing tools for Inuktitut apps. When reviewing task, please focus on: s…

GoogleCodeExporter updated 9 years ago
5
blprnt/wordplay #2

Where's wordplay.js?

I forked to poke around and see how some of the natural language processing stuff was working. The API endpoints are requiring a file `wordplay.js` that doesn't exist in the repo that seems to be doin…

andyinabox updated 10 years ago
3
sgsinclair/VoyantServer #29

Better upload / processing feedback?

On large corpus, irrespective of what Voyant Server is doing, the status is 'Uploading'. Perhaps a bar graph, or status bar of the upload, and then a status change to 'processing', or similar, would …

PeterTonoli updated 6 years ago
2
RAP-group/guide_to_open_science #54

R2: preregistration, be more inclusive (again)

> - “Linguistic research is multifaceted and spans diverse areas such as corpus analysis, conversation/discourse analysis, experimental research, and more.” --> This is a very reduced list of linguis…

jvcasillas updated 2 weeks ago
1
Progressive-Learning-Platform/PLPTool6 #124

Data Preprocessing and Curation

This part involves 1. researching and extracting stop words for example ```[“document”, “story”, “machine translation”, “translation”, “figure”] -> [“machine translation”]```, performing NLP analy…

satishbhambri91 updated 6 years ago
1
huggingface/tokenizers #1546

"Solution" to memory hogging in train_new_from_iterator with…

Hi So I was training a new tokenizer from Llama Tokenizer (meta-llama/Llama-2-7b-hf), on a medium sized corpus (Fineweb-10BT sample : 15 million documents with average length of 2300 characters). A…

morphpiece updated 3 months ago
7
amirdeljouyi/UTGen #1

Understandable inputs are not kept after minimizing test sui…

Hi, I am using this fantastic tool to generate tests that are better understandable than the original EvoSuite. However, I encountered an issue when using it. I also created this issue on [UTGen/UTGen…

zzctmac updated 2 weeks ago
3
computationalstylistics/stylo #36

Performance in load.corpus

I am investigating performance problems in `load.corpus`. I think that performance could be improved significantly by replacing `scan` with another approach to loading files. This flame graph from …

adunmore updated 4 years ago
2
suriyadeepan/easy_seq2seq #6

Replies with _UNK.

A lot of the time when I talk to the model it replies with simply _UNK. Especially for quite short queries. When I train with my own larger corpus it does this a lot more than with the movie corpus al…

minimumnz updated 7 years ago
1
KBNLresearch/ochre #3

Working without aligned file

Hi, I’m conducting research regarding OCR corpora, and I would like to use this project to evaluate how differences in the training corpus affect the quality of the post-processing. But, I ha…

omrishsu updated 6 years ago
2
