antoine77340 / Youtube-8M-WILLOW

Kaggle Youtube 8M WILLOW approach
Apache License 2.0

About the validation #4

Open junfengluo opened 7 years ago

junfengluo commented 7 years ago

Hi, can you tell me about the validation part? There is no description of validation in your code. Can I just use eval.py to evaluate the trained model on the validation data, the same way you use inference.py, as follows? Or can you give me an example?

python eval.py --eval_data_pattern="$path_to_features/validatea*.tfrecord" --model=NetVLADModelLF --train_dir=gatedlightvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=1024 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=1024 --moe_l2=1e-6 --iterations=300 --learning_rate_decay=0.8 --netvlad_relu=False --gating=True --moe_prob_gating=True --lightvlad=True --run_once=True --top_k=50

wincle commented 7 years ago

The eval command is almost the same as the inference command. I have tried it and got a result like this: INFO:tensorflow:epoch/eval number 266878 | Avg_Hit@1: 0.902 | Avg_PERR: 0.795 | MAP: 0.168 | GAP: 0.8706 | Avg_Loss: 3.936493
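To compare these numbers across checkpoints, a log line of this shape can be split into a dictionary. The following is only a minimal Python sketch: it assumes the line format is exactly as pasted above (fields separated by `|`, each written as `name: value`), and `parse_eval_line` is a hypothetical helper, not something provided by the repository.

```python
import re


def parse_eval_line(line):
    """Parse one eval summary line into a dict (hypothetical helper)."""
    metrics = {}
    # The first field looks like "epoch/eval number 266878"; keep the number.
    step = re.search(r"epoch/eval number (\d+)", line)
    if step:
        metrics["step"] = int(step.group(1))
    # The remaining fields look like "Avg_Hit@1: 0.902".
    for field in line.split("|")[1:]:
        name, _, value = field.partition(":")
        if value.strip():
            metrics[name.strip()] = float(value)
    return metrics


line = ("INFO:tensorflow:epoch/eval number 266878 | Avg_Hit@1: 0.902 | "
        "Avg_PERR: 0.795 | MAP: 0.168 | GAP: 0.8706 | Avg_Loss: 3.936493")
metrics = parse_eval_line(line)
print(metrics["GAP"])
```

Keeping the values as floats keyed by metric name makes it straightforward to sort or plot runs by GAP, the metric quoted in the log.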

antoine77340 commented 7 years ago

Yes, your example is correct. Is it working?

junfengluo commented 7 years ago

I am still in the training process; I just wanted to know the eval command in advance. OK, thanks very much anyway @antoine77340.

junfengluo commented 7 years ago

OK, thanks, I will also try it based on the inference command. @wincle

junfengluo commented 7 years ago

@antoine77340 Hi, here is another question: if I use all the train and validation data to train these models, with no eval process, are the test results reasonable? Have you tried this in your experiments?

junfengluo commented 7 years ago

Hi, when I ran your inference code on the test data, I got the following error: "tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value train_input/input_producer/limit_epochs/epochs". I did not modify the code anywhere. @antoine77340 @wincle

antoine77340 commented 7 years ago

Hi @junfengluo, if you use the train and validation data to train the models without an eval process, the results would be very similar.

junfengluo commented 7 years ago

The inference command is the same as yours; I just copied it, and I don't know where the problem is. For example, in "inference-GRU", my command is:

python ../inference.py --output_file=test-GRU-0002-1200.csv --input_data_pattern="/data/test/test*.tfrecord" --model=GruModel --train_dir=GRU-0002-1200 --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=1024 --base_learning_rate=0.0002 --gru_cells=1200 --learning_rate_decay=0.9 --moe_l2=1e-6 --run_once=True --top_k=20

The error message: [image]

antoine77340 commented 7 years ago

Hmm, it is strange. I do not understand this error (I am still not very good at understanding TensorFlow error messages, ahah). I tried to re-run this inference command with the latest TF version and it seems to work on my side. Are you sure you trained the model correctly and that at least one model was correctly exported?

junfengluo commented 7 years ago

Yeah, I also trained the models with TF 1.3.0, and I am sure the GRU model was trained correctly for 300000 steps. Two models are still training, each on a single GPU. I don't know how to solve this problem. Do these 7 models affect each other when the inference command is executed?

wincle commented 7 years ago

I haven't met that problem; both inference and evaluation work fine for me.

junfengluo commented 6 years ago

Hello, can you tell me how to transform a video id such as "-1VnJGJ6c2U" into the integer that is shown in the result *.csv file?

junfengluo commented 6 years ago

I found that the code which transforms the video id into an integer is mainly these two statements in inference.py:

1. video_id_batch, video_batch, num_frames_batch = get_input_data_tensors(reader, data_pattern, batch_size)
2. video_id_batch_val, video_batch_val, num_frames_batch_val = sess.run([video_id_batch, video_batch, num_frames_batch])

Here video_id_batch_val is the integer. But I don't know the details. Can you tell me? @wincle @antoine77340