Helsinki-NLP OpusFilter issues

OpusFilter - Parallel corpus processing toolkit

MIT License

101 stars, 18 forks

issues

Opusfilter fails to compress data when it is downloaded via moses

#75 thfrkielikone closed 1 month ago
3
Cache behaviour

#74 thfrkielikone closed 1 month ago
3
Make some older libraries optional

#73 svirpioj closed 2 months ago
0
Installing on fedora 40

#72 thfrkielikone closed 1 month ago
5
fix score method in SentenceEmbeddingFilter

#71 svirpioj closed 5 months ago
0
SentenceEmbeddingFilter chunksize clashes with general chunksize

#70 miau1 closed 5 months ago
1
Issue with opus-fast-mosestokenizer dep for ARM-macs

#69 rggdmonk opened 7 months ago
3
LMclassify always score 1

#68 wuyangjian closed 7 months ago
3
Add lingua-py support for language identification

#67 svirpioj closed 8 months ago
0
Add support for fastspell for language identification

#66 marco-c opened 10 months ago
0
Add lingua-py support for language identification

#65 marco-c closed 8 months ago
1
Refactor autogen code

#64 svirpioj closed 1 year ago
0
eflomal crashes during filtering

#63 yvesscherrer opened 1 year ago
1
Issue during installation

#61 evramnarouz closed 11 months ago
3
Add pyyaml to requirements

#60 yvesscherrer closed 1 year ago
1
insufficient documentation

#59 jairosg closed 1 year ago
1
Install eflomal from PyPI and use the new interface in WordAlignFilter

#58 svirpioj closed 1 year ago
0
switch to opus-fast-mosestokenizer

#57 svirpioj closed 1 year ago
0
Bump setuptools from 58.0.0 to 65.5.1

#56 dependabot[bot] closed 1 year ago
1
build documentation with sphinx

#55 svirpioj closed 1 year ago
0
migrate docs to sphinx

#54 BrightXiaoHan closed 1 year ago
2
Integration with MTData

#53 svirpioj opened 2 years ago
0
Better word alignment filter

#52 svirpioj opened 2 years ago
1
Automatic configuration generation

#51 svirpioj closed 1 year ago
0
Improve handling whitespace in Jieba and MeCab tokenization

#50 svirpioj closed 2 years ago
0
feature: add parallel decorator for functions preprocess, score, and filter

#49 BrightXiaoHan closed 2 years ago
6
fix jieba tokenize and detokenize funcs.

#48 BrightXiaoHan closed 2 years ago
2
fix: missing the checker for param

#47 BrightXiaoHan closed 2 years ago
1
Process Killed

#46 bayesrule closed 2 years ago
2
Add subword segmentation support

#45 svirpioj closed 2 years ago
0
add SentenceEmbeddingFilter and ParallelNearestNeighbors model

#44 svirpioj closed 2 years ago
0
Add support for Japanese tokenization

#43 svirpioj closed 2 years ago
0
add SimilarityFilter

#42 svirpioj closed 2 years ago
0
Debug the configuration by export filtered corpus.

#41 BrightXiaoHan closed 2 years ago
2
allow per-language parameters for length filters

#40 svirpioj closed 2 years ago
1
fix bug in classifier training and improve unit tests

#39 svirpioj closed 2 years ago
0
Specify different "unit" types in filters.

#38 BrightXiaoHan closed 2 years ago
2
Version 2.3.0 breaks train_classifier function

#37 wujameszj closed 2 years ago
1
add option to save scores in train_alignment

#36 svirpioj closed 2 years ago
0
add RepetitionFilter

#35 svirpioj closed 2 years ago
0
Is it possible to generate score file during training alignment model?

#34 BrightXiaoHan closed 2 years ago
6
Add LMClassifierFilter

#33 svirpioj closed 2 years ago
0
add MonolingualSentenceSplitter

#32 svirpioj closed 2 years ago
0
Possible bug in word_alignment accept function

#31 tomsbergmanis closed 2 years ago
5
tokenizer ignored when creating align.priors

#30 tomsbergmanis closed 2 years ago
2
Add method-specific options for LanguageIDFilter

#29 svirpioj closed 2 years ago
0
Use multicore to accelerate score, filter and tokenize processes.

#28 BrightXiaoHan closed 2 years ago
5
add jieba tokenizer for Chinese

#27 svirpioj closed 2 years ago
1
opusfilter : command not found

#26 Pkscode closed 2 years ago
2
pandas<1.0.0 not supported in opusfilter>=2.0.0

#25 svirpioj closed 2 years ago
1