azavea / raster-vision-examples

Examples of using Raster Vision on open datasets

Training stage error: Keras validation_steps=None #62

Open lukemckinstry opened 5 years ago

lukemckinstry commented 5 years ago

I encountered this error while trying to run the SpaceNet Rio example in Google Colab:

Ensuring input files exist [####################################] 100%
Checking for existing output [####################################] 100%
Saving command configuration to data/examples/spacenet/rio/remote-output/train/spacenet-rio-chip-classification-test/command-config-0.json...
Saving command configuration to data/examples/spacenet/rio/remote-output/bundle/spacenet-rio-chip-classification-test/command-config-0.json...
Saving command configuration to data/examples/spacenet/rio/remote-output/predict/spacenet-rio-chip-classification-test/command-config-0.json...
Saving command configuration to data/examples/spacenet/rio/remote-output/eval/spacenet-rio-chip-classification-test/command-config-0.json...
python -m rastervision run_command data/examples/spacenet/rio/remote-output/train/spacenet-rio-chip-classification-test/command-config-0.json
Training model...
/usr/local/lib/python3.6/dist-packages/pluginbase.py:439: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  fromlist, level)
Using TensorFlow backend.
2019-06-06 14:27:00:rastervision.utils.files: INFO - Downloading https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5 to /tmp/tmp7n3lkztl/tmpd83op6nh/http/github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-06-06 14:27:03.794498: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-06-06 14:27:03.794813: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x14cf4a0 executing computations on platform Host. Devices:
2019-06-06 14:27:03.794849: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-06-06 14:27:04.065789: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-06 14:27:04.066325: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x14cf080 executing computations on platform CUDA. Devices:
2019-06-06 14:27:04.066371: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-06-06 14:27:04.066753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-06-06 14:27:04.066777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-06 14:27:05.517909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-06 14:27:05.517977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-06 14:27:05.517990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-06 14:27:05.518287: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-06-06 14:27:05.518385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14115 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Found 0 images belonging to 2 classes.
Found 0 images belonging to 2 classes.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
TensorBoard 1.13.1 at http://fea6b3f9897c:6006 (Press CTRL+C to quit)
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/__main__.py", line 17, in <module>
    rv.main()
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/cli/main.py", line 292, in run_command
    rv.runner.CommandRunner.run(command_config_uri)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/runner/command_runner.py", line 11, in run
    CommandRunner.run_from_proto(msg)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/runner/command_runner.py", line 17, in run_from_proto
    command.run()
  File "/usr/local/lib/python3.6/dist-packages/rastervision/command/train_command.py", line 21, in run
    task.train(tmp_dir)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/task/task.py", line 138, in train
    self.backend.train(tmp_dir)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/backend/keras_classification/backend.py", line 263, in train
    _train(backend_config_path, pretrained_model_path, do_monitoring)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/backend/keras_classification/commands/train.py", line 15, in _train
    trainer.train(do_monitoring)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/backend/keras_classification/core/trainer.py", line 150, in train
    callbacks=callbacks)
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 68, in fit_generator
    raise ValueError('validation_steps=None is only valid for a'
ValueError: validation_steps=None is only valid for a generator based on the keras.utils.Sequence class. Please specify validation_steps or use the keras.utils.Sequence class.
TensorBoard caught SIGTERM; exiting...
/tmp/tmpkna9hjv6/tmpfdeg5814/Makefile:6: recipe for target '2' failed
make: *** [2] Error 1
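Two details in the log sit right before the failure: "Found 0 images belonging to 2 classes." is printed twice, and then fit_generator rejects validation_steps=None because the validation data is not a keras.utils.Sequence. The following is a minimal pure-Python sketch (no Keras needed) of two sanity checks one could run on the chip directories. It assumes a one-folder-per-class layout; IMAGE_EXTS, count_images_per_class, steps_for, and check_validation_steps are hypothetical helpers, not Raster Vision or Keras APIs. check_validation_steps is only a simplified stand-in for the condition named in the ValueError: it treats "has __len__" as a proxy for being a Sequence, and it rejects 0 steps as well as None.

```python
import math
import os
import tempfile
from typing import Dict, Optional

# Extensions counted as images in this sketch (an assumption; adjust to your chips).
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def count_images_per_class(directory: str) -> Dict[str, int]:
    """Count image files in each class subdirectory (one folder per class)."""
    counts: Dict[str, int] = {}
    for name in sorted(os.listdir(directory)):
        class_dir = os.path.join(directory, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                1 for f in os.listdir(class_dir) if f.lower().endswith(IMAGE_EXTS)
            )
    return counts


def steps_for(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (0 when there is no data)."""
    return math.ceil(num_samples / batch_size)


def check_validation_steps(val_data, validation_steps: Optional[int]) -> None:
    """Simplified stand-in for the check behind the ValueError above.

    A plain generator has no length, so the number of validation batches
    cannot be inferred and must be passed explicitly (and be non-zero).
    """
    if not hasattr(val_data, "__len__") and not validation_steps:
        raise ValueError(
            "validation_steps=None is only valid for a generator based on "
            "the keras.utils.Sequence class."
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical class folder names, left empty to mimic "Found 0 images".
        for cls in ("building", "background"):
            os.makedirs(os.path.join(root, cls))
        counts = count_images_per_class(root)
        print(counts)  # {'background': 0, 'building': 0}
        steps = steps_for(sum(counts.values()), batch_size=32)
        try:
            check_validation_steps((x for x in []), steps)
        except ValueError as err:
            print("rejected:", err)
```

Running the count before training makes an empty training or validation split visible up front instead of surfacing later as a validation_steps error.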

lewfish commented 5 years ago

Which command did you run and how did you install and run it in Colab? I suspect that you're not using the Docker image, which means you're probably using an incompatible version of TF and/or Keras.
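Since the suspicion is a TF/Keras version mismatch, it helps to confirm what the Colab runtime actually has installed. A minimal sketch, assuming only the standard library (importlib.metadata, Python 3.8+) with a setuptools pkg_resources fallback for older runtimes such as the Python 3.6 shown in the log; installed_versions is a hypothetical helper name.

```python
from typing import Dict, Iterable, Optional

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # older runtimes: fall back to setuptools' pkg_resources
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name: str) -> str:
        return pkg_resources.get_distribution(name).version


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["rastervision", "tensorflow", "keras"]))
```

The output can then be compared against whatever versions the project's Docker image pins.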

lukemckinstry commented 5 years ago

Installed with: pip install rastervision==0.9.0rc1

Ran with: rastervision run local -e rvexamples.examples.spacenet.rio.chip_classification -a raw_uri {RAW_URI} -a processed_uri {PROCESSED_URI} -a root_uri {ROOT_URI} -a test True --splits 2
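When reproducing the run from a notebook, it can help to build that invocation programmatically so each -a KEY VALUE pair is explicit. A minimal sketch: the flags and experiment module are copied from the command above, while build_run_command and the /content/... URIs are hypothetical placeholders, not part of Raster Vision.

```python
import shlex
from typing import List


def build_run_command(raw_uri: str, processed_uri: str, root_uri: str,
                      test: bool = True, splits: int = 2) -> List[str]:
    """Assemble the `rastervision run local` command quoted in this thread."""
    args = ["rastervision", "run", "local",
            "-e", "rvexamples.examples.spacenet.rio.chip_classification"]
    # Each experiment argument is passed as its own "-a KEY VALUE" triple.
    for key, value in (("raw_uri", raw_uri),
                       ("processed_uri", processed_uri),
                       ("root_uri", root_uri),
                       ("test", str(test))):
        args += ["-a", key, value]
    args += ["--splits", str(splits)]
    return args


# Hypothetical local paths; substitute your own RAW_URI / PROCESSED_URI / ROOT_URI.
cmd = build_run_command("/content/raw", "/content/processed", "/content/root")
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) so a failing stage raises in the notebook instead of only printing a make error.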

TensorFlow and Keras versions: Keras==2.2.4, tensorflow==1.14.0rc1