NVIDIA / tao_tutorials

Quick start scripts and tutorial notebooks to get started with TAO Toolkit
Apache License 2.0

Using visual_changenet_segmentation_MVTec.ipynb with google colab #1

Closed: Siwakonrome closed this issue 10 months ago

Siwakonrome commented 11 months ago

Can you give me some configuration information for using visual_changenet_segmentation_MVTec.ipynb with Google Colab?

My code is below. Docker failed at step 5.1, "Train Visual ChangeNet-Segmentation model".

%env DATA_DIR = /data/changenet/formatted_bottle_dataset
%env MODEL_DIR = /model
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results
%env NUM_EPOCHS = 30
%env BACKBONE_PATH = /results/pretrained/pretrained_fan_classification_nvimagenet_vfan_base_hybrid_nvimagenet/fan_base_hybrid_nvimagenet.pth

spec = """
encryption_key: tlt_encode
task: segment
train:
  resume_training_checkpoint_path: null
  pretrained_model_path: null
  segment:
    loss: "ce"
    weights: [0.5, 0.5, 0.5, 0.8, 1.0]
  num_epochs: 30
  num_nodes: 1
  val_interval: 1
  checkpoint_interval: 1
  optim:
    lr: 0.0002
    optim: "adamw"
    policy: "linear"
    momentum: 0.9
    weight_decay: 0.01
results_dir: "/results"
model:
  backbone:
    type: "fan_base_16_p4_hybrid"
    pretrained_backbone_path: /results/pretrained/pretrained_fan_classification_nvimagenet_vfan_base_hybrid_nvimagenet/fan_base_hybrid_nvimagenet.pth
dataset:
  segment:
    dataset: "CNDataset"
    root_dir: /data/changenet/formatted_bottle_dataset
    data_name: "custom"
    label_transform: "norm"
    batch_size: 8
    workers: 1
    multi_scale_train: True
    multi_scale_infer: False
    num_classes: 2
    img_size: 256
    image_folder_name: "A"
    change_image_folder_name: 'B'
    list_folder_name: 'list'
    annotation_folder_name: "label"
    train_split: "train"
    validation_split: "val"
    test_split: 'test'
    predict_split: 'test'
    label_suffix: .png
    augmentation:
      random_flip:
        vflip_probability: 0.5
        hflip_probability: 0.5
        enable: True
      random_rotate:
        rotate_probability: 0.5
        angle_list: [90, 180, 270]
        enable: True
      random_color:
        brightness: 0.3
        contrast: 0.3
        saturation: 0.3
        hue: 0.3
        enable: True
      with_scale_random_crop:
        enable: True
      with_random_crop: True
      with_random_blur: True
evaluate:
  results_dir: "${results_dir}/evaluate"
  checkpoint: "${results_dir}/train/changenet.pth"
  trt_engine: "${gen_trt_engine.trt_engine}"
  batch_size: ${dataset.segment.batch_size}
  vis_after_n_batches: 1
inference:
  results_dir: "${results_dir}/inference"
  checkpoint: "${results_dir}/train/changenet.pth"
  trt_engine: "${gen_trt_engine.trt_engine}"
  batch_size: ${dataset.segment.batch_size}
  vis_after_n_batches: 1
export:
  results_dir: "${results_dir}/export"
  gpu_id: 0
  checkpoint: "${results_dir}/train/changenet.pth"
  onnx_file: "${export.results_dir}/changenet.onnx"
  input_width: 256
  input_height: 256
  batch_size: ${dataset.segment.batch_size}
gen_trt_engine:
  results_dir: "${results_dir}/gen_trt_engine"
  gpu_id: 0
  onnx_file: "${export.onnx_file}"
  trt_engine: "${gen_trt_engine.results_dir}/changenet.trt"
  batch_size: ${dataset.segment.batch_size}
  input_channel: 3
  input_width: 256
  input_height: 256
  tensorrt:
    data_type: FP32
    workspace_size: 1024
    min_batch_size: 1
    opt_batch_size: 10
    max_batch_size: 10
"""
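The training command reads the spec from $SPECS_DIR/experiment.yaml, but the cells posted here do not show the spec string being written to that file. A minimal sketch of that step might look like the following. The helper name `write_spec` is hypothetical and is not part of the notebook or of TAO, and the demo writes to a temporary directory instead of /specs.

```python
import tempfile
from pathlib import Path


def write_spec(spec_text: str, specs_dir: str, name: str = "experiment.yaml") -> Path:
    """Write a spec string to <specs_dir>/<name> and return the file path.

    Hypothetical helper for illustration; not part of the notebook or of TAO.
    """
    specs_path = Path(specs_dir)
    # Create the specs directory if it does not exist yet.
    specs_path.mkdir(parents=True, exist_ok=True)
    target = specs_path / name
    # Strip the blank lines that a triple-quoted string leaves at both ends.
    target.write_text(spec_text.strip() + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    demo_spec = "\nencryption_key: tlt_encode\ntask: segment\n"
    with tempfile.TemporaryDirectory() as tmp:
        written = write_spec(demo_spec, tmp)
        print(written.name)
        print(written.read_text(encoding="utf-8"))
```

In the notebook this would presumably be called with the `spec` string and the directory that SPECS_DIR refers to.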

print("Train model")
!tao model visual_changenet train \
    -e $SPECS_DIR/experiment.yaml \
    train.num_epochs=$NUM_EPOCHS \
    dataset.segment.root_dir=$DATA_DIR \
    model.backbone.pretrained_backbone_path=$BACKBONE_PATH

Train model
2023-12-12 15:35:38,600 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.10/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.10/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/docker/api/client.py", line 205, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.10/dist-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.10/dist-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/docker/api/client.py", line 228, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_cli/entrypoint/tao_launcher.py", line 134, in main
    instance.launch_command(
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_cli/components/instance_handler/local_instance.py", line 357, in launch_command
    docker_handler = self.handler_map[
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_cli/components/instance_handler/local_instance.py", line 203, in handler_map
    handler_map[handler_key] = DockerHandler(
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_cli/components/docker_handler/docker_handler.py", line 92, in __init__
    self._docker_client = docker.from_env()
  File "/usr/local/lib/python3.10/dist-packages/docker/client.py", line 84, in from_env
    return cls(
  File "/usr/local/lib/python3.10/dist-packages/docker/client.py", line 40, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/docker/api/client.py", line 188, in __init__
    self._version = self._retrieve_server_version()
  File "/usr/local/lib/python3.10/dist-packages/docker/api/client.py", line 212, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
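The chain above starts in docker/transport/unixconn.py, where connecting to the Docker Unix socket raises FileNotFoundError, and ends with docker.from_env() failing inside the TAO launcher. That pattern suggests the launcher found no Docker daemon socket in this runtime. A minimal sketch for checking this before calling `tao` is shown below. It assumes Docker's default socket path, /var/run/docker.sock, which may differ if DOCKER_HOST is set, and the function name `docker_socket_available` is hypothetical.

```python
import os
import stat

# Assumption: Docker's default Unix socket location; DOCKER_HOST can override it.
DEFAULT_DOCKER_SOCKET = "/var/run/docker.sock"


def docker_socket_available(path: str = DEFAULT_DOCKER_SOCKET) -> bool:
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or no permission to stat it: treat as unavailable.
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    if docker_socket_available():
        print("Docker socket found at", DEFAULT_DOCKER_SOCKET)
    else:
        print("No Docker socket at", DEFAULT_DOCKER_SOCKET)
```

This only checks that the socket file exists. It does not confirm that a daemon is listening on it or that the current user may connect.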

Siwakonrome commented 10 months ago

I have fixed this issue now.