Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
MIT License
101 stars, 18 forks
issues
Opusfilter fails to compress data when it is downloaded via moses
#75
thfrkielikone
closed
1 month ago
3
Cache behaviour
#74
thfrkielikone
closed
1 month ago
3
Make some older libraries optional
#73
svirpioj
closed
2 months ago
0
Installing on fedora 40
#72
thfrkielikone
closed
1 month ago
5
fix score method in SentenceEmbeddingFilter
#71
svirpioj
closed
5 months ago
0
SentenceEmbeddingFilter chunksize clashes with general chunksize
#70
miau1
closed
5 months ago
1
Issue with opus-fast-mosestokenizer dep for ARM-macs
#69
rggdmonk
opened
7 months ago
3
LMclassify always score 1
#68
wuyangjian
closed
7 months ago
3
Add lingua-py support for language identification
#67
svirpioj
closed
8 months ago
0
Add support for fastspell for language identification
#66
marco-c
opened
10 months ago
0
Add lingua-py support for language identification
#65
marco-c
closed
8 months ago
1
Refactor autogen code
#64
svirpioj
closed
1 year ago
0
eflomal crashes during filtering
#63
yvesscherrer
opened
1 year ago
1
Issue during installation
#61
evramnarouz
closed
11 months ago
3
Add pyyaml to requirements
#60
yvesscherrer
closed
1 year ago
1
insufficient documentation
#59
jairosg
closed
1 year ago
1
Install eflomal from PyPI and use the new interface in WordAlignFilter
#58
svirpioj
closed
1 year ago
0
switch to opus-fast-mosestokenizer
#57
svirpioj
closed
1 year ago
0
Bump setuptools from 58.0.0 to 65.5.1
#56
dependabot[bot]
closed
1 year ago
1
build documentation with sphinx
#55
svirpioj
closed
1 year ago
0
migrate docs to sphinx
#54
BrightXiaoHan
closed
1 year ago
2
Integration with MTData
#53
svirpioj
opened
2 years ago
0
Better word alignment filter
#52
svirpioj
opened
2 years ago
1
Automatic configuration generation
#51
svirpioj
closed
1 year ago
0
Improve handling whitespace in Jieba and MeCab tokenization
#50
svirpioj
closed
2 years ago
0
feature: add parallel decorator for functions preprocess, score, and filter
#49
BrightXiaoHan
closed
2 years ago
6
fix jieba tokenize and detokenize funcs.
#48
BrightXiaoHan
closed
2 years ago
2
fix: missing the checker for param
#47
BrightXiaoHan
closed
2 years ago
1
Process Killed
#46
bayesrule
closed
2 years ago
2
Add subword segmentation support
#45
svirpioj
closed
2 years ago
0
add SentenceEmbeddingFilter and ParallelNearestNeighbors model
#44
svirpioj
closed
2 years ago
0
Add support for Japanese tokenization
#43
svirpioj
closed
2 years ago
0
add SimilarityFilter
#42
svirpioj
closed
2 years ago
0
Debug the configuration by export filtered corpus.
#41
BrightXiaoHan
closed
2 years ago
2
allow per-language parameters for length filters
#40
svirpioj
closed
2 years ago
1
fix bug in classifier training and improve unit tests
#39
svirpioj
closed
2 years ago
0
Specify different "unit" types in filters.
#38
BrightXiaoHan
closed
2 years ago
2
Version 2.3.0 breaks train_classifier function
#37
wujameszj
closed
2 years ago
1
add option to save scores in train_alignment
#36
svirpioj
closed
2 years ago
0
add RepetitionFilter
#35
svirpioj
closed
2 years ago
0
Is it possible to generate score file during training alignment model?
#34
BrightXiaoHan
closed
2 years ago
6
Add LMClassifierFilter
#33
svirpioj
closed
2 years ago
0
add MonolingualSentenceSplitter
#32
svirpioj
closed
2 years ago
0
Possible bug in word_alignment accept function
#31
tomsbergmanis
closed
2 years ago
5
tokenizer ignored when creating align.priors
#30
tomsbergmanis
closed
2 years ago
2
Add method-specific options for LanguageIDFilter
#29
svirpioj
closed
2 years ago
0
Use multicore to accelerate score, filter and tokenize processes.
#28
BrightXiaoHan
closed
2 years ago
5
add jieba tokenizer for Chinese
#27
svirpioj
closed
2 years ago
1
opusfilter : command not found
#26
Pkscode
closed
2 years ago
2
pandas<1.0.0 not supported in opusfilter>=2.0.0
#25
svirpioj
closed
2 years ago
1