YerevaNN / translit-rnn

Automatic transliteration with LSTM

Add new language #1

Open essamgoda opened 7 years ago

essamgoda commented 7 years ago

How do I add romanized Arabic and Arabic?

TigranGalstyan commented 7 years ago

Tips are written in the README and in the blog post.

essamgoda commented 7 years ago

We have created a probabilistic mapping, so that each Armenian letter is romanized according to the given probabilities. For example, ծ is replaced by ts in 60% of cases, c in 30% of cases, and & in 10% of cases. The full set of rules are here and can be browsed here.
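The probabilistic mapping described in the quote can be illustrated with a short sketch (Python 3.6+ for `random.choices`). It assumes the mapping is a plain dict from a letter to `{romanization: probability}`, the same shape as an entry in transliteration.json; the function name `romanize` is hypothetical and this is not the repository's own code.

```python
import random

# Hypothetical mapping in the same shape as an entry of transliteration.json:
# letter -> {romanization: probability}
MAPPING = {
    "ծ": {"ts": 0.6, "c": 0.3, "&": 0.1},
}

def romanize(text, mapping, rng=random):
    """Replace each mapped letter by a romanization drawn with the given probabilities."""
    out = []
    for ch in text:
        options = mapping.get(ch)
        if options is None:
            out.append(ch)  # letters without a rule are copied unchanged
            continue
        variants = list(options.keys())
        weights = list(options.values())
        out.append(rng.choices(variants, weights=weights, k=1)[0])
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sample
    counts = {}
    for _ in range(10000):
        r = romanize("ծ", MAPPING, rng)
        counts[r] = counts.get(r, 0) + 1
    print(counts)  # expect roughly 60% "ts", 30% "c", 10% "&"
```

Each occurrence of a letter is sampled independently, so the same word can come out with different romanizations, which is the point of generating varied training data.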

How do I create a probabilistic mapping for romanized Arabic? I have a comparison table like this: خ is kh / 7' / 5.

TigranGalstyan commented 7 years ago

If I understand you correctly, you need to have a line like this in your transliteration.json:

"خ": {"kh": 0.5, "7'": 0.3, "5": 0.2},
             ^          ^         ^

with the probabilities written in the places marked with ^. You can see it here; I have beautified the JSON there.
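A comparison-table row such as "خ is kh / 7' / 5" can be turned into a transliteration.json entry mechanically. The sketch below is a hypothetical helper, not part of the repository; when no probabilities are given it assumes an even split, which is only a placeholder, since the real values should reflect how often each spelling is actually used.

```python
import json

def entry_from_table(letter, variants, weights=None):
    """Build one transliteration.json entry from a comparison-table row.

    variants: slash-separated spellings, e.g. "kh/7'/5".
    weights: optional probabilities, one per spelling; an even split is
    assumed when omitted (a placeholder, not a measured distribution).
    """
    options = [v.strip() for v in variants.split("/") if v.strip()]
    if weights is None:
        weights = [1.0 / len(options)] * len(options)
    if len(weights) != len(options):
        raise ValueError("need one probability per spelling")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("probabilities for %r must sum to 1" % letter)
    return {letter: dict(zip(options, weights))}

entry = entry_from_table("خ", "kh/7'/5", [0.5, 0.3, 0.2])
# ensure_ascii=False keeps the Arabic letter readable in the output file
print(json.dumps(entry, ensure_ascii=False))
```

When editing the file by hand, note that `\'` is not a valid JSON escape: a plain `'` inside a double-quoted JSON string needs no escaping, and `json.loads` rejects `"7\'"`.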

essamgoda commented 7 years ago

Thanks. I created the transliteration.json file and started training the network, but it takes a very long time: more than two hours so far, and it has not stopped yet.

Log file:

Loading Files
Building Network ...
Compiling Functions ...
Computing Updates ...
WARNING (theano.configdefaults): install mkl with conda install mkl-service: No module named mkl
Training ...

Hrant-Khachatrian commented 7 years ago

@essamgoda Do you use a GPU for training?
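One way to answer this is to print Theano's configuration from the same environment that runs train.py. A minimal sketch, assuming Theano is installed there; the helper name is hypothetical. With Theano of that era the device is usually chosen through the `THEANO_FLAGS` environment variable (for example `device=cuda,floatX=float32` for the newer gpuarray backend, or `device=gpu` for the old one), so check the Theano documentation for the exact values your version accepts.

```python
def theano_device_summary():
    """Return a short description of Theano's configured device, or why it is unavailable."""
    try:
        import theano
    except Exception as exc:  # broad on purpose: old Theano can fail to import in several ways
        return "Theano could not be imported: %s" % exc
    device = theano.config.device  # e.g. 'cpu', 'cuda' or 'gpu', depending on version and THEANO_FLAGS
    floatx = theano.config.floatX  # GPU training generally expects 'float32'
    return "device=%s floatX=%s" % (device, floatx)

print(theano_device_summary())
```

If this reports `device=cpu`, training runs on the CPU, which would explain runs that take many hours.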

essamgoda commented 7 years ago

@Hrant-Khachatrian When I use this command:

python -u train.py --hdim 1024 --depth 2 --batch_size 200 --seq_len 30 --language hy-AM &> log.txt

my laptop freezes as soon as the log file shows that training has started,

so I used this instead:

python -u train.py --hdim 512 --depth 1 --batch_size 50 --seq_len 10 --language hy-AM &> log.txt
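Why the smaller settings help can be estimated with the standard parameter count of an LSTM layer: each of the four gates has an input weight matrix, a recurrent weight matrix and a bias, which gives 4·(h·(x+h)+h) parameters for hidden size h and input size x. The sketch below assumes unidirectional layers without peephole connections and a made-up input size of 100; the network built by train.py may differ (for example by being bidirectional), so treat the results as relative sizes, not exact figures. Memory for activations additionally grows with batch_size × seq_len × hdim.

```python
def lstm_layer_params(hidden, inputs):
    """Parameters of one LSTM layer: 4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (hidden * (inputs + hidden) + hidden)

def stacked_lstm_params(hidden, depth, inputs):
    """The first layer reads the input; every later layer reads the previous layer's hidden state."""
    total = lstm_layer_params(hidden, inputs)
    for _ in range(depth - 1):
        total += lstm_layer_params(hidden, hidden)
    return total

def activation_units(batch_size, seq_len, hidden, depth):
    """Rough count of hidden-state values kept per batch (per-gate factors ignored)."""
    return batch_size * seq_len * hidden * depth

VOCAB = 100  # hypothetical input size, not taken from the repository

big = stacked_lstm_params(1024, 2, VOCAB)
small = stacked_lstm_params(512, 1, VOCAB)
print(big, small, round(big / small, 1))
print(activation_units(200, 30, 1024, 2) / activation_units(50, 10, 512, 1))
```

Under these assumptions the first command's model has about 10 times as many parameters (13,000,704 vs 1,255,424) and keeps 48 times as many activations per batch, which fits a laptop running out of memory.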

This works, but when I run the test with this:

python -u test.py --hdim 512 --depth 1 --model {MODEL} --language hy-AM

the output is:


Loading Files
Building network ...
Compiling Functions ...
Traceback (most recent call last):
  File "test.py", line 140, in <module>
    main()
  File "test.py", line 127, in main
    f = np.load(args.model)
  File "/home/essam/.local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 370, in load
    fid = open(file, "rb")
IOError: [Errno 2] No such file or directory: '{MODEL}'

TigranGalstyan commented 7 years ago

@essamgoda You need to specify the model you want to test. In place of {MODEL}, write the path to the saved model; it should be 'languages/hy-AM/models/...'.

essamgoda commented 7 years ago

@TigranGalstyan Okay, thanks, it works. But when I test on a specific file with

python -u test.py --hdim 512 --depth 1 --model 'languages/hy-AM/models/model.hdim512.depth1.seq_len10.bs50.epoch10.0043668122.loss1.71054670304.npz.npy' --language hy-AM --translit_path 't.txt'

it shows:

Loading Files
Building network ...
Compiling Functions ...
Testing ...
0.0% done 

essamgoda commented 7 years ago

The last lines of the log after training are:

skipped 0
computing validation loss...
validation loss is 2.94090302211
saving to -> languages/hy-AM/models/model.hdim256.depth2.seq_len30.bs100.epoch10.0363901019.loss2.72517724795.npz
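The log reports saving to a path ending in .npz, while the file passed to test.py ends in .npz.npy. That is consistent with `numpy.save`, which appends `.npy` to a string file name that does not already end with it; whether train.py actually uses `numpy.save` is an assumption here, not something stated in the thread. A small demonstration in a temporary directory:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model.hdim256.depth2.npz")  # shortened, illustrative name
    np.save(target, np.arange(3))  # numpy.save adds ".npy" because the name does not end with it
    saved = sorted(os.listdir(tmp))
    print(saved)  # ['model.hdim256.depth2.npz.npy']
    loaded = np.load(target + ".npy")  # the on-disk name is what must be passed when loading
    print(loaded)
```

So the path given to `--model` should be the name that actually exists on disk, including the trailing .npy.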

When I test with this command:

python -u test.py --hdim 256 --depth 2 --model '/media/essam/New Volume/translitration v1/LSTM/translit-rnn-master/languages/hy-AM/models/model.hdim256.depth2.seq_len30.bs100.epoch10.0363901019.loss2.72517724795.npz.npy' --language hy-AM

the result is:

Loading Files
Building network ...
Compiling Functions ...
Testing ...
Computing editdistance and writing to -> languages/hy-AM/results.model.hdim256.depth2.seq_len30.bs100.epoch10.0363901019.loss2.72517724795.npz.npy
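test.py evaluates the output with an edit distance. For reference, a plain-Python Levenshtein distance (each insertion, deletion or substitution costs 1) looks like the sketch below. It illustrates the metric only: the thread does not show how test.py computes or normalizes it, and the log line's wording merely suggests a library such as the `editdistance` package.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row over the shorter string
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

It works on Unicode strings character by character, so Armenian or Arabic text is compared letter by letter.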

When I use:

python -u test.py --hdim 256 --depth 2 --model '/media/essam/New Volume/translitration v1/LSTM/translit-rnn-master/languages/hy-AM/models/model.hdim256.depth2.seq_len30.bs100.epoch10.0363901019.loss2.72517724795.npz.npy' --language hy-AM --translit_path 't.txt'

the result is:

Loading Files
Building network ...
Compiling Functions ...
Testing ...
0.0% done

@TigranGalstyan @Hrant-Khachatrian