UnicodeDecodeError occurred when training on the Tanks and Temples dataset processed by NeRF++

yzslab commented 2 years ago

Hi, I am trying to train on the Tanks and Temples dataset with this command:

python train.py \
    --gin_configs=configs/tat.gin \
    --gin_bindings="Config.data_dir = '/mnt/x/dataset/nerfplusplus/tanks_and_temples/tat_intermediate_Playground'" \
    --gin_bindings="Config.checkpoint_dir = '/mnt/x/NeRF-Data/multinerf_results/checkpoints/tanks_and_temples/tat_intermediate_Playground'" \
    --gin_bindings="Config.batch_size = 4096" \
    --logtostderr

But it reports a UnicodeDecodeError:

~/src/multinerf$ bash scripts/train_tat.sh
I0805 14:18:43.132598 140224505852096 xla_bridge.py:328] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
I0805 14:18:43.320095 140224505852096 xla_bridge.py:328] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter CUDA Host
I0805 14:18:43.320447 140224505852096 xla_bridge.py:328] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
/home/zhensheng/anaconda3/envs/multinerf/lib/python3.9/site-packages/jax/_src/lib/xla_bridge.py:515: UserWarning: jax.host_id has been renamed to jax.process_index. This alias will eventually be removed; please update your code.
  warnings.warn(
Traceback (most recent call last):
  File "/home/zhensheng/anaconda3/envs/multinerf/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/zhensheng/anaconda3/envs/multinerf/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/zhensheng/src/multinerf/train.py", line 288, in <module>
    app.run(main)
  File "/home/zhensheng/anaconda3/envs/multinerf/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/zhensheng/anaconda3/envs/multinerf/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/zhensheng/src/multinerf/train.py", line 55, in main
    dataset = datasets.load_dataset('train', config.data_dir, config)
  File "/home/zhensheng/src/multinerf/internal/datasets.py", line 52, in load_dataset
    return dataset_dict[config.dataset_loader](split, train_dir, config)
  File "/home/zhensheng/src/multinerf/internal/datasets.py", line 258, in __init__
    self._load_renderings(config)
  File "/home/zhensheng/src/multinerf/internal/datasets.py", line 710, in _load_renderings
    images = load_files('rgb', lambda f: np.array(Image.open(f))) / 255.
  File "/home/zhensheng/src/multinerf/internal/datasets.py", line 697, in load_files
    mats = np.array([load_fn(utils.open_file(f)) for f in files])
  File "/home/zhensheng/src/multinerf/internal/datasets.py", line 697, in <listcomp>
    mats = np.array([load_fn(utils.open_file(f)) for f in files])
  File "/home/zhensheng/src/multinerf/internal/datasets.py", line 710, in <lambda>
    images = load_files('rgb', lambda f: np.array(Image.open(f))) / 255.
  File "/home/zhensheng/anaconda3/envs/multinerf/lib/python3.9/site-packages/PIL/Image.py", line 3101, in open
    prefix = fp.read(16)
  File "/home/zhensheng/anaconda3/envs/multinerf/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

Do you know how to fix it? Thanks.

jonbarron commented 2 years ago

It sounds like your python installation is having trouble reading images? This is most likely unrelated to our codebase. Have you confirmed that our unit tests pass, and have you tried loading an image with PIL outside of this codebase?

yzslab commented 2 years ago

Hi, thanks for your reply.

I tried reading an image with PIL outside of this codebase:

from PIL import Image
Image.open(open("dataset/tat_intermediate_Playground/train/rgb/00001.png"))

It reported the same error, so the problem is indeed related to PIL.

I also found that adding the parameter mode='rb' to open() solves it (source). After adding mode='rb' on line 697 of internal/datasets.py, training now works.
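
For anyone hitting the same traceback, here is a minimal sketch of why mode='rb' matters. It uses only the standard library and a stand-in file that holds the PNG signature, so it is not the multinerf loader itself. The helper name read_prefix and the temporary file are assumptions made for illustration. Every PNG starts with byte 0x89, which is not a valid first byte in UTF-8, so a file object opened in text mode fails on the first read, while one opened in binary mode returns the raw bytes.

```python
import os
import tempfile

# The 8-byte signature that starts every PNG file; its first byte is 0x89.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_prefix(path, mode):
    """Hypothetical helper: read the first 16 units, like the fp.read(16) call in the traceback."""
    # Text mode decodes bytes to str; UTF-8 is forced here to mirror the traceback.
    kwargs = {} if "b" in mode else {"encoding": "utf-8"}
    with open(path, mode, **kwargs) as fp:
        return fp.read(16)


# Write a stand-in file: the PNG signature plus padding (not a complete image).
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(PNG_SIGNATURE + b"\x00" * 8)
    demo_path = tmp.name

# Text mode: decoding byte 0x89 as UTF-8 fails.
try:
    read_prefix(demo_path, "r")
    text_mode_error = None
except UnicodeDecodeError as err:
    text_mode_error = err

# Binary mode: the bytes come back untouched.
binary_prefix = read_prefix(demo_path, "rb")
os.remove(demo_path)

print("text mode:", text_mode_error)
print("binary mode:", binary_prefix[:8])
```

The same reasoning suggests that any file object handed to Image.open should be opened in binary mode.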

Thanks.

jonbarron commented 2 years ago

Could you clarify which line you needed to add mode='rb' to? Or better yet, could you push a CL? Line 697 in the repo doesn't have any code that opens files.

yzslab commented 2 years ago

Hi, it seems the line number changed because of this commit: https://github.com/google-research/multinerf/commit/25d1748612579dcb8c9e1689cfe64a1f2345c2ba

Here is the line 697 I mentioned earlier: https://github.com/google-research/multinerf/blob/4f6ffeca73888a83a53f60d880b21633ac6cf28b/internal/datasets.py#L697

It is now line 737: https://github.com/google-research/multinerf/blob/7c9dc01cef398e35c7b86fb52d9ecb5a979f5b85/internal/datasets.py#L737

google-research/multinerf, issue #7