FrancisGregoire / parSentExtract

A BiRNN framework implemented in Python and TensorFlow to extract parallel sentences from aligned comparable corpora.
MIT License

problem in executing #5

Closed: imrrahul closed this issue 6 years ago

imrrahul commented 6 years ago

Traceback (most recent call last):
  File "eval.py", line 155, in <module>
    tf.app.run()
  File "/home/rahul/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "eval.py", line 114, in main
    source_vocab, _ = utils.initialize_vocabulary(FLAGS.source_vocab_path)
  File "/home/rahul/Desktop/pse/pse-master/utils.py", line 190, in initialize_vocabulary
    raise ValueError("Vocabulary file {} not found.".format(vocabulary_path))
ValueError: Vocabulary file ../data/vocabulary.source not found.
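The error is raised by utils.initialize_vocabulary when the file passed as --source_vocab_path does not exist. The following Python 3 sketch is only a guess at what such a loader might look like, not the project's actual code: it assumes a plain-text vocabulary with one token per line, where the line index is the token id.

```python
import os
from typing import Dict, List, Tuple


def initialize_vocabulary(vocabulary_path: str) -> Tuple[Dict[str, int], List[str]]:
    """Load a vocabulary file, ASSUMING one token per line (line index = token id).

    Returns (token -> id, id -> token). Raises ValueError if the file is absent,
    which is the failure shown in the traceback above.
    """
    if not os.path.isfile(vocabulary_path):
        raise ValueError("Vocabulary file {} not found.".format(vocabulary_path))
    with open(vocabulary_path, encoding="utf-8") as f:
        # Strip the trailing newline (and any stray whitespace) from each token.
        rev_vocab = [line.strip() for line in f]
    vocab = {token: idx for idx, token in enumerate(rev_vocab)}
    return vocab, rev_vocab
```

In other words, eval.py can only load a vocabulary that some earlier step (here, train.py) has already written to that exact path.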


First, I ran the training command:

python train.py --source_train_path ../data/train.en --target_train_path ../data/train.fr --source_valid_path ../data/valid.en --target_valid_path ../data/valid.fr --checkpoint_dir ../tflogs

It printed no output and returned to the prompt within 2 seconds. When I then ran

python eval.py --checkpoint_dir ../tflogs --source_test_path ../data/test.en --target_test_path ../data/test.fr --reference_test_path ../data/test.ref --source_vocab_path ../data/vocabulary.source --target_vocab_path ../data/vocabulary.target

I got the error above. Please help.

FrancisGregoire commented 6 years ago

You didn't run train.py successfully, so the vocabulary files (vocabulary.source and vocabulary.target) were never created. Adjust your training and validation paths accordingly, e.g. ../data/train.en.
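Because train.py exited silently, it can help to confirm that its input files exist before training, and that the vocabulary files exist before evaluation. Here is a minimal Python 3 check using the paths from the commands in this issue. missing_files is a hypothetical helper, not part of parSentExtract, and the vocabulary location is assumed from the eval.py command. Note that Python resolves relative paths such as ../data/train.en against the current working directory, not against the script's location.

```python
import os
from typing import List


def missing_files(paths: List[str]) -> List[str]:
    """Return the entries of `paths` that are not existing regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# Inputs passed to train.py in this issue.
TRAIN_INPUTS = ["../data/train.en", "../data/train.fr",
                "../data/valid.en", "../data/valid.fr"]
# Vocabulary files eval.py was told to load (ASSUMED to be written by train.py).
EVAL_VOCAB = ["../data/vocabulary.source", "../data/vocabulary.target"]

if __name__ == "__main__":
    print("Working directory:", os.getcwd())
    for label, paths in (("train.py input", TRAIN_INPUTS),
                         ("eval.py vocabulary", EVAL_VOCAB)):
        for path in missing_files(paths):
            print("Missing {}: {}".format(label, path))
```

Run it from the same directory you launch train.py from. Any "Missing train.py input" line means the paths need adjusting before training can produce the vocabulary files.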

imrrahul commented 6 years ago

Hello, I am not able to complete the step "Prepare the training data". Where should the files be located? Please help, sir. (Screenshot from 2018-06-02 12-49-10)

FrancisGregoire commented 6 years ago

You didn't preprocess your data. First, you need to clone Moses: https://github.com/moses-smt/mosesdecoder
Then, from the parSentExtract directory, run ./scripts/preprocessing.sh with the following arguments:

1. path to mosesdecoder (example: ~/moses/mosesdecoder)
2. path to your training data without the filename extensions (example: ./data/train)
3. source language filename extension (example: en, since my file is train.en)
4. target language filename extension (example: fr, since my file is train.fr)
5. minimum sentence length (example: 3)
6. maximum sentence length (example: 80)
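The last two arguments bound sentence length. This thread doesn't say exactly how preprocessing.sh applies them. A common approach in Moses pipelines is to drop any sentence pair in which either side is shorter or longer than the limits; Moses' clean-corpus-n.perl does this, along with other checks such as a length ratio. The following is a simplified sketch of that kind of filter. It assumes whitespace tokenization and limits counted in tokens, and filter_by_length is an illustrative name only.

```python
from typing import Iterable, Iterator, Tuple


def filter_by_length(pairs: Iterable[Tuple[str, str]],
                     min_len: int, max_len: int) -> Iterator[Tuple[str, str]]:
    """Yield pairs whose source AND target token counts lie in [min_len, max_len].

    ASSUMPTION: sentences are already tokenized, so str.split() counts tokens.
    """
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            yield src, tgt
```

With the example limits 3 and 80, a pair like ("Hello .", "Bonjour .") would be dropped, because each side has only 2 tokens.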

With train.en and train.fr as training files in ./data and Moses in my home directory, the preprocessing command would be:

./scripts/preprocessing.sh ~/moses/mosesdecoder ./data/train en fr 3 80
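If you prefer to drive this step from Python, the argument order above maps onto a list like the one built below. This is a sketch only: preprocessing_command and run_preprocessing are hypothetical helpers, and the relative script path assumes the command runs from the parSentExtract directory, as described above.

```python
import os
import shlex
import subprocess
from typing import List


def preprocessing_command(moses_dir: str, corpus_prefix: str, src_ext: str,
                          tgt_ext: str, min_len: int, max_len: int) -> List[str]:
    """Build the argv for ./scripts/preprocessing.sh in the order listed above."""
    if not 0 < min_len <= max_len:
        raise ValueError("expected 0 < min_len <= max_len")
    return [
        "./scripts/preprocessing.sh",
        os.path.expanduser(moses_dir),  # no shell runs, so expand "~" ourselves
        corpus_prefix,                  # e.g. ./data/train (no extension)
        src_ext,                        # e.g. en -> ./data/train.en
        tgt_ext,                        # e.g. fr -> ./data/train.fr
        str(min_len),
        str(max_len),
    ]


def run_preprocessing(argv: List[str], repo_dir: str) -> None:
    """Run the script with repo_dir as working directory; raises on non-zero exit."""
    subprocess.run(argv, cwd=repo_dir, check=True)


if __name__ == "__main__":
    argv = preprocessing_command("~/moses/mosesdecoder", "./data/train",
                                 "en", "fr", 3, 80)
    print(shlex.join(argv))  # shlex.join needs Python 3.8+
```

The list is passed to subprocess without a shell, so "~" is not expanded automatically; that is why the sketch calls os.path.expanduser. Relative paths such as ./data/train are then resolved against repo_dir.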

Hope it helps.