Floating point exception

sanjaymeena commented 7 years ago

Docker image : docker.paddlepaddle.org/book:0.10.0rc3

I am training my own Semantic role labeler model using modified 06.semantic role code. My training data size is 24000 sentences. If i train using sample data size ~1000 sentences, i can successfully train a model without errors and save all parameters properly. However for 25k sentences i get this following error.

I have been stuck here since many days now : ( . I verified that this may not be due to wrong data as all sentences have been iterated over in previous passes.

lcy-seso commented 7 years ago

I can reproduce this error. I am checking it now and will provide the solution as soon as possible.

lcy-seso commented 7 years ago

From the last two lines of the log file, the cost is increased from about 590 to 2974, I think you encounter the gradient explosion problem. One possible solution is to use gradient clipping, but I am really sorry, currently, PaddlePaddle has a problem with it (it may not be activated even if you specify the clipping threshold), we are working on it.

Gradient explosion can be avoided by carefully tuning the training parameters, maybe it is affected by the weight initialization, the learning algorithm, the learning rate, the length of the training sample, batch size and so on. All these parameters can be tried.

sanjaymeena commented 7 years ago

Hi Icy-seso ,

i had tried changing batch size, amount of training data . Still had gradient explosion. With smaller batch size it takes more passes but still happens .
other parameters are default ones which came in demo train.py.
I will try changing other params.

Gradient clipping seems like an important problem to resolve. Thanks !

lcy-seso commented 7 years ago

Thank you and very sorry for the inconvenience, we will fix this as soon as possible. Maybe you can try to reduce the learning rate, or change to another optimization algorithm, that is what I always do.

sanjaymeena commented 7 years ago

Thanks Icy-seso, i reduced the learning rate to 1e-2 . what other optimization algorithm would you suggest?

lcy-seso commented 7 years ago

Roughly speaking, I like to use Adam and Adadelta when training a complicated RNN model.
In PaddlePaddle, the gradient is not divided by batch size, perhaps, 1e-2 is a large learning rate.
If you use momentum, and the learning rate is 1e-2, it is recommended to explicitly divide the batch size like this:
```
learning_rate = 1e-2 / batch_size
```

lcy-seso commented 7 years ago

I will check whether gradient clipping is problematic in PaddlePaddle and give you a reply. Thanks very much for your issue.

sanjaymeena commented 7 years ago

Thanks, I am trying now with momentum and learning rate as you mentioned.

sanjaymeena commented 7 years ago

Hi lcy-seso , Is there a way to get precision, recall, f1 from paddle.v2 .trainer.test or is it possible to get using some another function?

PS : I changed the learning parameter and training so far has seems to be going on. Thanks.

lcy-seso commented 7 years ago

You can use the chunk_evalutor to get the precision and recall. But I am sorry to find that evaluators are not included in our doc and I create the issue https://github.com/PaddlePaddle/Paddle/issues/2009, we are fixing it.

Maybe you can try to read the explanation of the source codes. I will also try it and give an example later.

Thanks for your question.

sanjaymeena commented 7 years ago

Hi Icy-Seso, one another question. I am trying to run inference on about 1000 sentences in using paddle.infer.

for element in test_datax:
        #print element
        probs = paddle.infer(
                output_layer=predict,
                parameters=parameters,
                input=element,
                field='id')
        print probs
        #assert len(probs) == len(element[0][0])
        labels_reverse = {}
        for (k, v) in label_dict.items():
            labels_reverse[v] = k
        pre_lab = [labels_reverse[i] for i in probs]

I can get the results . However inference seems very slow. I have following questions :

Is there some tuning required from my side?
How to do inference on batch data?

For reference, the word embedding i use has dimension 150 compared to 32 as mentioned in the demo semantic role labeling program.

sanjaymeena commented 7 years ago

I have already used the shared link and done inference before. However, it works for only one input element . For more than one i am looping through all the elements and the performance is very slow . I am trying (~3000) sentences . Regards, Sanjay

On May 9, 2017, at 11:43 AM, Cao Ying notifications@github.com wrote:

As you mentioned the SRL demo, are you running a sequence labeling task and want to get the labeling results? Maybe the labeling results is a little different in reference. Will this be helpful? https://github.com/lcy-seso/book/blob/develop/07.label_semantic_roles/train.py#L205 https://github.com/lcy-seso/book/blob/develop/07.label_semantic_roles/train.py#L205 — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/PaddlePaddle/Paddle/issues/1961#issuecomment-300053362, or mute the thread https://github.com/notifications/unsubscribe-auth/AE6EVYc-0i_ilHrJIiXz32rUc4QOJmXOks5r3-DkgaJpZM4NN7PA.

lcy-seso commented 7 years ago

Sorry to delete the last reply because I don't find any problem in your codes. I am asking others for help. @reyoung

sanjaymeena commented 7 years ago

Hi Cao, Also it will be very helpful if you can share an example for using the chunk evaluation to calculate precision recall in result metrics :)

lcy-seso commented 7 years ago

Sorry for the late reply. I add chunk_evaluator for SRL. https://github.com/lcy-seso/book/blob/add_chunk_evalator/07.label_semantic_roles/train.py#L140

lcy-seso commented 7 years ago

hi, @sanjaymeena I really sorry, I find chunk evaluator has bugs in V2 API. https://github.com/PaddlePaddle/Paddle/issues/2078 This will cause trouble for you...

sanjaymeena commented 7 years ago

@lcy-seso Yeah, I found that it always print 0 in the results . Thanks for proactive measure. Also on the other note, if i want to use a web server with paddle model, do i need to build docker container from scratch?

lcy-seso commented 7 years ago

There is a documentation about PaddlePaddle in docker. and we provide built docker images. You can check this page and tell us what we do not do well.

P.S. I also find the documentation of chunk evaluator is awful https://github.com/PaddlePaddle/Paddle/issues/2079.

We are now working on making a time schedule to fix bugs found in V2 API, and I promise to keep replying to this issue to tell you the progress. Thanks so much for your problems and sorry for the inconvenience.

I also create this issue https://github.com/PaddlePaddle/Paddle/issues/2080. It will be assigned to an owner to fix this bug.

sanjaymeena commented 7 years ago

thanks @lcy-seso looking forward to using paddle for more nlp related projects :)

sanjaymeena commented 7 years ago

@lcy-seso Hi, is there any update on inference speed from model? It will be great if this checklist point is in a higher priority. Thanks for your help!

lcy-seso commented 7 years ago

@sanjaymeena I will fix this.

lcy-seso commented 7 years ago

hi @sanjaymeena , sorry for the late reply. I have updated the SRL demo. You can check this.

One reason that the inferring speed is much slower than before is that if paddle.infer is called, the model is loaded again and again at every test batch.

You can initialize the model by https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L242
```
inferer = paddle.inference.Inference(
             output_layer=predict, parameters=parameters)
```
then test batch by batch https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L206
```
probs = inferer.infer(input=test_data, field='id')
```
I use model saved by the first pass and test the entire test set which contains 5267 samples. To avoid loading the parameters in every tested batch, the inferring time is reduced from 96.5s to 78.19s.

There may still exist some other things should be optimized. I will keep on profiling. Thanks very much for your problem.

sanjaymeena commented 7 years ago

@lcy-seso thank you so much!! the speed improvement for me is almost ~100 times as my data batch contains one sentence each :)

sanjaymeena commented 7 years ago

@lcy-seso is it possible to calculate precision,recall,f1 given gold_labels in IOB and predicted labels in IOB? This is after model has been created.

lcy-seso commented 7 years ago

@sanjaymeena This PR https://github.com/PaddlePaddle/Paddle/pull/2165 fix the bug of chunk evaluator, but we haven't merged it yet.

After it is merged, precision, recall and f1 can be easily printed. We will try our best to merge it as soon as possible, and gives an example on how to use it.

sanjaymeena commented 6 years ago

@lcy-seso Thank you for your help . I am also using demo/sequence_labelling project. Docker image : paddle:0.10.0

The demo program can run and save the model just following the code. However i am unable to locate how to load the saved model and use it on test data :

File : srl_linear_crf.py

inputs(word, pos, chunk, features) outputs(crf)

In other words, i am facing problem on how to do prediction using the saved model.

lcy-seso commented 6 years ago

Hi, @sanjaymeena I am not quite sure about how do you use srl_linear_crf.py. It seems that this file does not in PaddlePaddle‘s repo.

To predict by using the saved model, there are usually three steps (you can check this):

define the topology for inference, (line 233 ~ 234) https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L233
load the saved model (line 240 ~ 243) https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L240
infer a batch (line 206) https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L206

But, as I known, you have already known this script, I think I don't unstandstand your question well...

Is it a problem with the docker environment or a problem about how to use the paddle.infer interface?

lcy-seso commented 6 years ago

In the directory demo/semantic_role_labeling, there exists two versions of SRL demo written in V1 and V2 PaddlePaddle API interface. (And I think we should clean the old version. )

the v1 API version
- dataprovider.py, a data provider used in training
- db_lstm.py, define the neural network for training and inferring
- predict.py, a python script to load the saved model and then predict the test sample
- test.sh, a script to run training, becaues in V1 version, Paddle is a standalone executable, not a python module.
- train.sh, a script to run inferring, becaues in V1 version, Paddle is a standalone executable, not a python module.
the v2 API version
- api_train_v2.py, a python script, including training and inferring

sanjaymeena commented 6 years ago

@lcy-seso I am sorry. What i meant is i am using sequence tagging : https://github.com/PaddlePaddle/Paddle/tree/develop/demo/sequence_tagging

I can use the provided linear_crf.py and train_linear.sh to train my own sequence labeler model. However i cannot find any code for prediction or loading the model :(

I am planning on using the model created by sequence_tagging for semantic role labeler project. So it was confusing in my explanation.

lcy-seso commented 6 years ago

Very sorry for the late reply. Sorry I did notice this demo before. I am checking it.

lcy-seso commented 6 years ago

P.S. some updates:

chunk evaluator has been merged, this is an example: https://github.com/PaddlePaddle/models/blob/develop/sequence_tagging_for_ner/ner.py#L178
error clipping is also fixed, my parterner write this guide. https://github.com/PaddlePaddle/Paddle/issues/2262

sanjaymeena commented 6 years ago

@lcy-seso thanks for your help :) I will try the sequence tagging for ner for my work :) It is also sequence tagging which i need for identifying predicates in a sentence.

lcy-seso commented 6 years ago

hi @sanjaymeena sorry for the inconvenience that there actually exists two versions of PaddlePaddle demos in our repo (the demo directory). I try to make some explanations.

the demo/sequence_tagging directory only provides a V1 version demo. demo/semantic_role_labeling is a SRL demo which provides both V1 and V2 version.
In the V1 version, PaddlePaddle is a standalone executable, while in V2 version, we make it a Python module.
Because, the V2 APIs is still under enhancement, currently we do not delete demos in V1 version. Sorry for the inconvenience.
Indeed, there is no testing or decoding process provided in demo/sequence_tagging directory.

To test or to decode (get the tagging result), there are two solutions (I recommend the second one):

rewrite the demo in V2 APIs as SRL demo does, and use the V2 paddle.infer interface. This causes extra works.
If you already use this demo by running sh train.sh to train the model. There is a very simple way to test which requires almost no extract work, but a simple postprocessing. You can check this: https://github.com/PaddlePaddle/Paddle/compare/develop...lcy-seso:add_infer_for_sequence_tagging

There are some explanations of the second way:

In the V1 version, PaddlePaddle is a standalone executable, you can use the following script to test, which goes as follows:
```
paddle train \
       --job=test \
       --config=rnn_crf.py \
       --parallel_nn=1 \
       --model_list="model.list" \
       --use_gpu=true \
       --predict_output_dir="./" \
       --config_args="is_predict=1" \
       --trainer_count=1 \
2>&1 | tee 'test.log'
```
- explanations for the parameters:
  1. job=test: start a testing process which only executes the forwards
  2. config: path of the network configuration file
  3. parallel_nn=1: this is optional, currently, the CRF layer is not implemented in GPU, parallel_nn means some layers can run in CUP while others run in GPU.
  4. model_list: a plain text file, in which is the path of the trained model. It may be like this: ./output/model/pass-00000
  5. use_gpu: whether to use gpu or not
  6. predict_output_dir=A: specify this parameter, the predict results will be saved to the file A/rank-00000
  7. trainer_count: using how many threads to test.

lcy-seso commented 6 years ago

Also about predicting for demo/sequence_tagging:

by calling the following function, you can print output of any layer defined in the network:
```
outputs(layer1_in_the_network, layer2_in_the_network, ...)
```
Please check this line: https://github.com/lcy-seso/Paddle/blob/add_infer_for_sequence_tagging/demo/sequence_tagging/rnn_crf.py#L105
the prediction results will be printed to the file A/rank-00000 (A is a directory specified by predict_output_dir )
A/rank-00000 looks like this:
```
10;
11;
11;
10;
11;
11;
20;
10;
20;
10;
```
There is a problem that the prediction results in A/rank-00000 are flattened. Each line of A/rank-00000 is the index of tagging label in one time step. To get the final tagging results, you still need some simple post-processing.
format of file A/rank-00000
- each line of A/rank-00000 is the outputs of all the layers specified in outputs. Multiple outputs are separated by ";". If output of a layer is a vector, each element of the vector is separated by " " (a space).
- If you put two layers in outputs, like this:
```
outputs(crf_decoding, crf_input)
```
  A/rank-00000 looks like this:
```
10;1.66071 0.441494 1.11911 -0.0893781 -0.99124 -1.61581 -1.05307 -1.56547 -1.21166 -1.29977 7.85641 4.47389 -1.51868 -1.88707 -1.66359 -1.6472 -0.0668767 -0.473394 -1.64615 -1.28447 1.11682 -0.0599329 2.5225;
```

sanjaymeena commented 6 years ago

Hi @lcy-seso thank you for the explained answer. My goal was to do real time prediction on each user input. I was interested in this as this sequence tagger demo (Chunker) uses Part of Speech (POS) as features. Thanks for your help :) I will try to use the NER Sequence Tagger for now :)

wanghaoshuang commented 6 years ago

Closing this issue due to inactivity, feel free to reopen it.

PaddlePaddle / Paddle

Floating point exception #1961