Closed wayne-weiwei closed 1 month ago
Hey there @wayne-weiwei!
What kind of results do you expect to be saved? There is a great deal of metrics being logged with tensorboard for find_and_avoid_v2, take a look at the relevant README section.
Let me know if this covers your questions or you have any additional ones.
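Since the metrics are logged with tensorboard, they are already saved on disk as event files inside the log directory. A minimal sketch of how those scalar metrics could be exported to CSV for later analysis, assuming the `tensorboard` Python package is installed and the metrics were logged as regular scalars. The function names and the CSV layout are hypothetical and not part of deepworlds.

```python
import csv
from pathlib import Path


def find_run_dirs(logdir):
    """Return the sorted directories under logdir that contain event files."""
    root = Path(logdir)
    return sorted({p.parent for p in root.rglob("events.out.tfevents.*")})


def export_scalars(run_dir, out_csv):
    """Write every scalar series found in run_dir to a CSV file."""
    # Imported lazily so find_run_dirs works without tensorboard installed.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    # size_guidance of 0 keeps every logged point instead of a sample.
    acc = EventAccumulator(str(run_dir), size_guidance={"scalars": 0})
    acc.Reload()  # parse the event files on disk
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["tag", "step", "wall_time", "value"])
        for tag in acc.Tags()["scalars"]:
            for event in acc.Scalars(tag):
                writer.writerow([tag, event.step, event.wall_time, event.value])
```

For example, calling `find_run_dirs("./experiments/")` from the controller directory should list one directory per logged run, and each of those can then be passed to `export_scalars` together with an output file name of your choice.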
Thanks for the reply. When I followed this part of the README:
Tensorboard is used for logging various aspects of the training procedure. To watch the tensorboard logs, navigate to /deepworlds/examples/find_and_avoid_v2/controllers/robot_supervisor_manager and run tensorboard --logdir ./experiments/.
I got this error:
~/webots/projects/deepworlds-dev/examples/find_and_avoid_v2/controllers/robot_supervisor_manager$ tensorboard --logdir ./experiments/
2024-10-03 15:06:50.490971: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-03 15:06:50.975236: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-10-03 15:06:51.511356: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-10-03 15:06:51.542694: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
W1003 15:06:51.564800 135249606198336 server_ingester.py:187] Failed to communicate with data server at localhost:40979: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:130.209.6.40:8080: HTTP proxy returned response code 503"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:130.209.6.40:8080: HTTP proxy returned response code 503", grpc_status:14, created_time:"2024-10-03T15:06:51.564631055+01:00"}"
>
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.16.2 at http://localhost:6006/ (Press CTRL+C to quit)
Thank you again for your help.
Hmm, it seems that you have several warnings that look system-related. The last line indicates that tensorboard is indeed running at http://localhost:6006/. What happens when you visit that URL while tensorboard is running?
The last warning shows an error related to your network. If visiting the URL doesn't work, try using tensorboard --logdir ./experiments/ --host localhost --port 8088.
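The "HTTP proxy returned response code 503" line in the log suggests that TensorBoard's internal connection to its data server on localhost was routed through an HTTP proxy. A minimal sketch of one possible workaround, assuming the proxy is configured through the usual proxy environment variables: launch TensorBoard with localhost added to `no_proxy`. The helper names are hypothetical, and this is not confirmed to be the change that resolved the problem in this thread.

```python
import os
import subprocess

LOCAL_HOSTS = ("localhost", "127.0.0.1")


def env_with_localhost_bypass(base_env=None):
    """Return a copy of the environment with localhost added to no_proxy."""
    env = dict(os.environ if base_env is None else base_env)
    for key in ("no_proxy", "NO_PROXY"):
        # Keep any existing entries and append the local hosts once.
        entries = [e.strip() for e in env.get(key, "").split(",") if e.strip()]
        for host in LOCAL_HOSTS:
            if host not in entries:
                entries.append(host)
        env[key] = ",".join(entries)
    return env


def tensorboard_command(logdir="./experiments/", host="localhost", port=8088):
    """Build the command suggested in the thread as an argument list."""
    return ["tensorboard", "--logdir", logdir, "--host", host, "--port", str(port)]


def launch_tensorboard(logdir="./experiments/"):
    """Start TensorBoard with the proxy bypass applied; blocks until it exits."""
    return subprocess.run(tensorboard_command(logdir), env=env_with_localhost_bypass())
```

Calling `launch_tensorboard()` from the controller directory would then serve the logs on port 8088 without changing the proxy settings for the rest of the system.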
Thank you very much for your help. I was able to obtain the final results of the model after modifying the system settings. Could you please let me know how I can record a video of the better cases during the model's training and testing?
Hi there, thank you for providing such a great tool! I successfully trained the model find_and_avoid_v2 and observed multiple results during the process. However, I noticed that none of these results seem to be saved into a file, or I might be looking in the wrong place and cannot access the generated files. I would really appreciate your help with the following:
Thank you so much in advance for your assistance, and I’m looking forward to your guidance!