PetrochukM / PyTorch-NLP

Basic Utilities for PyTorch Natural Language Processing (NLP)
https://pytorchnlp.readthedocs.io
BSD 3-Clause "New" or "Revised" License
2.21k stars, 258 forks

Add BPE encoder #100

Open: Columbine21 opened 4 years ago

Columbine21 commented 4 years ago

Add the byte-pair encoding (#7).

codecov-commenter commented 4 years ago

Codecov Report

Merging #100 into master will decrease coverage by 0.10%. The diff coverage is 92.55%.


@@            Coverage Diff             @@
##           master     #100      +/-   ##
==========================================
- Coverage   94.41%   94.31%   -0.11%     
==========================================
  Files          64       66       +2     
  Lines        1611     1705      +94     
==========================================
+ Hits         1521     1608      +87     
- Misses         90       97       +7     
Impacted Files                                 Coverage Δ
torchnlp/encoders/text/bpe_text_tokenizer.py   90.16% <90.16%> (ø)
torchnlp/encoders/text/bytepair_encoder.py     96.87% <96.87%> (ø)
torchnlp/encoders/text/__init__.py             100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cde86ba...63460d0.

PetrochukM commented 4 years ago

Hey! Thank you for your contribution.

Do you have an opinion on subword_nmt vs. tokenizers by HuggingFace?