Open huangmiumang opened 2 years ago
I have a problem with WorkQueue. When I run WorkQueue in a single process, it takes all filenames first and only then trains. I expect it to take one filename, train on it, then take the next filename and train again.
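To make the expected behavior concrete, here is a minimal pure-Python sketch (not DeepRec's actual WorkQueue implementation; the queue, filenames, and train_one_file callback are all illustrative) of "take one work item, train on it, then take the next":

```python
from queue import Queue, Empty

def make_work_queue(filenames, num_epochs):
    """Simulate a shared work queue: each epoch re-adds every filename."""
    q = Queue()
    for _ in range(num_epochs):
        for name in filenames:
            q.put(name)
    return q

def train_loop(q, train_one_file):
    """Expected behavior: take ONE filename, train on it, then take the next.
    The queue only reports 'all works are taken' after every item was consumed."""
    processed = []
    while True:
        try:
            work = q.get_nowait()
        except Empty:
            break  # analogous to "All works in work queue are taken"
        train_one_file(work)  # one full pass over this file before the next take
        processed.append(work)
    return processed

order = []
q = make_work_queue(["./data/train.csv", "./data/eval.csv"], num_epochs=2)
done = train_loop(q, lambda f: order.append(("train", f)))
```

In the logs below, by contrast, all four "Take work" lines appear before the first training step, suggesting the input pipeline drains the queue up front (e.g. via prefetching) rather than taking work items lazily.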
Logs: before training starts, it reports "Out of range: All works in work queue work_queue are taken":
Add epoch of 2 elements: ["./data/eval.csv" "./data/eval.csv"]
Add epoch of 2 elements: ["./data/train.csv" "./data/train.csv"]
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Take work: "./data/train.csv"
Take work: "./data/eval.csv"
Take work: "./data/train.csv"
Take work: "./data/eval.csv"
INFO:tensorflow:Saving checkpoints for 0 into ./result/model_WIDE_AND_DEEP_1672921107/model.ckpt.
INFO:tensorflow:Create incremental timer, incremental_save:False, incremental_save_secs:None
2023-01-05 12:19:03.144568: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at work_queue_ops.cc:320 : Out of range: All works in work queue work_queue are taken.
INFO:tensorflow:loss = 0.6893319, steps = 1
2023-01-05 12:19:06.699829: I tensorflow/core/common_runtime/tensorpool_allocator.cc:146] TensorPoolAllocator enabled
INFO:tensorflow:global_step/sec: 19.8375
INFO:tensorflow:loss = 0.5995521, steps = 101 (5.042 sec)
tf.data.experimental.parallel_interleave is deprecated. Please use tf.data.Dataset.interleave instead.
When using WorkQueue with 1 ps / 2 workers and 2 files in the queue, a coredump occurs. Source code is below:
coredump