Open 645709712 opened 4 years ago
W@@ ait for more money and then fill it up . Not sure . Un@@ comfortable . F@@ ail . Wr@@ ong push . It 's been up@@ load .
I don't think the words were joined back together after being split, so what is going wrong here?
Simply run (s + ' ').replace('@@ ', '').rstrip() on the output string s.
I think replace('@@ ', '') is the correct way. After all, a ' .' at the end looks ugly.
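For reference, the replace('@@ ', '') approach can be wrapped in a small helper. This is only a minimal sketch: the name remove_bpe and the marker parameter are made up for illustration and are not part of any library. It assumes the subword marker is '@@' followed by a space, as in the sample output.

```python
def remove_bpe(s, marker="@@"):
    """Join BPE subword units by dropping 'marker + space' and a trailing marker.

    remove_bpe and its marker parameter are hypothetical names used for
    illustration only.
    """
    # Padding with one space lets a marker at the very end of the string
    # match the same 'marker + space' pattern; rstrip() removes the padding.
    return (s + " ").replace(marker + " ", "").rstrip()


print(remove_bpe("W@@ ait for more money and then fill it up . Un@@ comfortable ."))
# prints: Wait for more money and then fill it up . Uncomfortable .
```

Note that this only undoes the BPE split. The space before the final '.' comes from tokenization and needs a separate detokenization step.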
What is the conclusion here, @645709712?
Use this function, like so:
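import os
import subprocess

def restore_segmentation(path):
    """
    Take a file segmented with BPE and restore it to its original segmentation.
    """
    assert os.path.isfile(path)
    # Remove "@@ " between subword units and a trailing "@@" at the end of a line, in place.
    restore_cmd = "sed -i -r 's/(@@ )|(@@ ?$)//g' %s"
    # Escape spaces in the path because the command is run through the shell.
    subprocess.Popen(restore_cmd % path.replace(' ', '\\ '), shell=True).wait()

# output_path is the directory that contains the translated files.
for f in os.listdir(output_path):
    restore_segmentation(os.path.join(output_path, f))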
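If sed is not available (for example on Windows), the same cleanup can be done in pure Python. The following is a rough sketch rather than a drop-in replacement: restore_segmentation_py is a hypothetical name, the file is assumed to be UTF-8, and the regular expression simply mirrors the sed pattern used in the function.

```python
import os
import re
import tempfile

# Mirrors the sed expression: "@@ " anywhere in the line, or "@@"
# (optionally followed by one space) at the end of the line.
BPE_PATTERN = re.compile(r"(@@ )|(@@ ?$)")


def restore_segmentation_py(path):
    """Hypothetical pure-Python variant: strip BPE markers from a file in place."""
    assert os.path.isfile(path)
    # Assumption: the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        # Drop the newline first so that "$" means the real end of the line.
        lines = [BPE_PATTERN.sub("", line.rstrip("\n")) for line in f]
    # Every line is written back with a trailing newline.
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in lines))


if __name__ == "__main__":
    # Small demonstration on a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "hyp.txt")
        with open(demo, "w", encoding="utf-8") as f:
            f.write("W@@ ait for more money .\nIt 's been up@@ load .\n")
        restore_segmentation_py(demo)
        with open(demo, encoding="utf-8") as f:
            print(f.read())
```

This avoids the shell entirely, so paths containing spaces need no escaping.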
I am doing th-en translation, and after running translate.py I got some results like this. As you can see, there are many '@' in the translated sentences. I think it is largely related to the BPE algorithm (there is no '@' in the valid/test results). What should I do to solve or improve this problem? Thank you.