Jan 30 18:58:37 v100-benchmark systemd[1]: Stopped GPU Utilization Metric Agent.
Jan 30 18:58:37 v100-benchmark systemd[1]: Started GPU Utilization Metric Agent.
Jan 30 18:58:37 v100-benchmark bash[23388]: mesg: ttyname failed: Inappropriate ioctl for device
Jan 30 18:58:38 v100-benchmark bash[23388]: Traceback (most recent call last):
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/root/report_gpu_metrics.py", line 116, in <module>
Jan 30 18:58:38 v100-benchmark bash[23388]: main()
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/root/report_gpu_metrics.py", line 108, in main
Jan 30 18:58:38 v100-benchmark bash[23388]: instance_id, zone, project_id)
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/root/report_gpu_metrics.py", line 58, in report_metric
Jan 30 18:58:38 v100-benchmark bash[23388]: client.create_time_series(project_name, [series])
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/usr/local/lib/python2.7/dist-packages/google/cloud/monitoring_v3/gapic/metric_service_client.py", line 897, in create_time_series
Jan 30 18:58:38 v100-benchmark bash[23388]: request, retry=retry, timeout=timeout, metadata=metadata
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/usr/local/lib/python2.7/dist-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
Jan 30 18:58:38 v100-benchmark bash[23388]: return wrapped_func(*args, **kwargs)
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/usr/local/lib/python2.7/dist-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
Jan 30 18:58:38 v100-benchmark bash[23388]: on_error=on_error,
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/usr/local/lib/python2.7/dist-packages/google/api_core/retry.py", line 179, in retry_target
Jan 30 18:58:38 v100-benchmark bash[23388]: return target()
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/usr/local/lib/python2.7/dist-packages/google/api_core/timeout.py", line 214, in func_with_timeout
Jan 30 18:58:38 v100-benchmark bash[23388]: return func(*args, **kwargs)
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/usr/local/lib/python2.7/dist-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
Jan 30 18:58:38 v100-benchmark bash[23388]: six.raise_from(exceptions.from_grpc_error(exc), exc)
Jan 30 18:58:38 v100-benchmark bash[23388]: File "/usr/local/lib/python2.7/dist-packages/six.py", line 737, in raise_from
Jan 30 18:58:38 v100-benchmark bash[23388]: raise value
Jan 30 18:58:38 v100-benchmark bash[23388]: google.api_core.exceptions.InvalidArgument: 400 One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric. {Metric: custom.googleapis.com/gpu_utilization, Timestamps: {Youngest Existing: '2019/01/30-10:58:33.791', New: '2019/01/30-10:58:38.189'}}: timeSeries[0]
Failed after ~1 hour on an n1-standard-16 with 4 V100 GPUs while training planespotting:
python -m trainer_yolo.main --hp-layers 17 --tiledata "gs://planespotting-data-public/tiles_from_USGS_photos" --hp-evaluations 4 --hp-iterations 9400 --hp-batch-size 32 && date
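
Going by the log, the agent was restarted at 18:58:37 and its first point landed about 4.4 seconds after the previous one, which the API rejected. Below is a minimal sketch of one possible workaround, not the actual contents of report_gpu_metrics.py: space the writes out and skip a rejected point instead of crashing. The names `gap_seconds`, `MinIntervalGate` and `report_safely` are hypothetical, and the 10-second interval in the usage note is an assumption; the real minimum depends on the sampling period configured for the metric.

```python
import time
from datetime import datetime

# Timestamp layout as it appears in the error message above.
TS_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def gap_seconds(youngest_existing, new):
    """Seconds between the two timestamps quoted in the error message."""
    delta = (datetime.strptime(new, TS_FORMAT)
             - datetime.strptime(youngest_existing, TS_FORMAT))
    return delta.total_seconds()


class MinIntervalGate(object):
    """Lets an action through at most once per min_interval seconds."""

    def __init__(self, min_interval, clock=time.time):
        self.min_interval = min_interval
        self.clock = clock
        self.last = None  # time of the last allowed action, if any

    def ready(self):
        now = self.clock()
        if self.last is not None and now - self.last < self.min_interval:
            return False
        self.last = now
        return True


def report_safely(gate, write_point, rejected=(Exception,)):
    """Call write_point() if the gate allows it.

    A fresh gate knows nothing about points written before a restart,
    so exceptions listed in `rejected` are swallowed rather than
    propagated. Returns True only if the point was written.
    """
    if not gate.ready():
        return False
    try:
        write_point()
    except rejected:
        return False
    return True
```

In the agent this might be wired up as `report_safely(MinIntervalGate(10), lambda: client.create_time_series(project_name, [series]), rejected=(InvalidArgument,))`, with `InvalidArgument` imported from `google.api_core.exceptions` as seen in the traceback. For the timestamps in the error, `gap_seconds('2019/01/30-10:58:33.791', '2019/01/30-10:58:38.189')` comes out at roughly 4.4 seconds.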