ketan-lambat opened this issue 2 years ago
For FID, you can refer to this repo: https://github.com/bioinf-jku/TTUR
Hello, have you found solutions to your questions? Can you share the notebook for the custom dataset?
I have provided a link to the notebook above. Posting it again: Link to Colab Notebook
Also, the paper shows 3 different coloured outputs for a single black-and-white input image. How can I get such results?
You can just run the sampling 3 times, and it will give you 3 different results.
The sampling is stochastic by default, so each run should give you different results. See https://github.com/google-research/google-research/blob/master/coltran/models/colorizer.py#L282
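To see why repeated runs differ, here is a toy NumPy illustration (not ColTran's code) of two ways of decoding per-pixel colour logits: argmax decoding is deterministic, while sampling from the softmax distribution picks a possibly different bin per pixel on every run. The function name decode, the 4x4 image size and the 8-bin palette are hypothetical.

```python
import numpy as np

NUM_BINS = 8  # Hypothetical number of colour bins per pixel.


def decode(logits, mode, rng):
  """Chooses one colour bin per pixel from logits of shape [H, W, NUM_BINS].

  mode="argmax" always returns the most likely bin (deterministic);
  mode="sample" draws a bin from the softmax distribution (stochastic).
  """
  if mode == "argmax":
    return np.argmax(logits, axis=-1)
  # Numerically stable softmax over the bin axis.
  probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
  probs /= probs.sum(axis=-1, keepdims=True)
  # Inverse-CDF sampling with one uniform draw per pixel.
  cdf = np.cumsum(probs, axis=-1)
  u = rng.random(logits.shape[:-1] + (1,))
  idx = (u > cdf).sum(axis=-1)
  return np.minimum(idx, logits.shape[-1] - 1)  # Guard against round-off.


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, NUM_BINS))  # Stand-in for model output.
# Three sampling "runs" on the same input give three colourings.
samples = [decode(logits, "sample", rng) for _ in range(3)]
```

Argmax decoding would return the same result on every run, so the variety between outputs comes entirely from the sampling step.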
How long do I need to train on a custom dataset?
The longer you train, the better the results will be. I would use the maximum batch size that fits in memory and train for around 500K steps. There should be a train_summaries subdirectory in /content/drive/MyDrive/Colab_Work/HONORS/ColTran-v2/google-research/coltran/logs/cityscapes_ckpt. As a sanity check, you could point TensorBoard to this directory to see whether the training loss goes down.
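As a rough, hypothetical heuristic for "is the loss still going down?", you could export the training-loss scalar from TensorBoard (for example as CSV) and compare the mean loss of the last two windows of logged values. The function name, window size and threshold below are arbitrary assumptions, not ColTran settings.

```python
def loss_still_improving(losses, window=100, min_rel_drop=0.01):
  """Rough plateau check on a list of training-loss values (oldest first).

  Compares the mean of the last `window` values with the mean of the
  `window` values before them and returns True while the relative drop
  is at least `min_rel_drop`.
  """
  if len(losses) < 2 * window:
    return True  # Not enough history to judge yet.
  prev = sum(losses[-2 * window:-window]) / window
  last = sum(losses[-window:]) / window
  if prev == 0:
    return False
  return (prev - last) / abs(prev) >= min_rel_drop


# Usage with synthetic curves: a steadily falling loss and a flat one.
falling = [5.0 - 0.01 * i for i in range(200)]
flat = [3.0] * 300
print(loss_still_improving(falling), loss_still_improving(flat))  # True False
```

With a noisy loss, a window spanning many logging steps makes the comparison less sensitive to individual spikes.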
How do I get the results/outputs for a custom dataset?
For a custom dataset, as long as the images in your dataset directory are supported by tf.io.decode_image (https://www.tensorflow.org/api_docs/python/tf/io/decode_image), it should work.
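Before training, a quick pure-Python pre-check (not part of ColTran; the names are hypothetical) can flag files whose header matches none of the formats that tf.io.decode_image has long documented: BMP, GIF, JPEG and PNG. Newer TensorFlow releases may accept more formats, and a passing header does not guarantee the file decodes (it could be truncated or corrupt); the check only catches files that would not be recognized at all, such as stray text files.

```python
import pathlib
import tempfile

# Leading "magic" bytes of BMP, GIF, JPEG and PNG files.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def sniff_format(path):
  """Returns the format name matched by the file header, or None."""
  with open(path, "rb") as f:
    head = f.read(8)
  for magic, name in SIGNATURES.items():
    if head.startswith(magic):
      return name
  return None


def unrecognized_files(img_dir):
  """Sorted names of files in img_dir that match none of SIGNATURES."""
  return sorted(p.name for p in pathlib.Path(img_dir).iterdir()
                if p.is_file() and sniff_format(p) is None)


# Usage on a throwaway directory: one PNG header, one text file.
with tempfile.TemporaryDirectory() as demo_dir:
  (pathlib.Path(demo_dir) / "ok.png").write_bytes(b"\x89PNG\r\n\x1a\n")
  (pathlib.Path(demo_dir) / "notes.txt").write_bytes(b"hello")
  flagged = unrecognized_files(demo_dir)

print(flagged)  # ['notes.txt']
```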
until the GPU resources were exhausted
Btw, the GPU should not OOM during training. That seems a bit weird.
Here is the FID script that I used for ImageNet. I hope you can adapt it to your dataset. It uses the TFGAN implementation (https://github.com/tensorflow/gan/blob/99bb93042520040dac401237616c10e54ab80a9f/tensorflow_gan/python/eval/inception_metrics.py#L130).
# TF1-style graph code; under TF2 it runs through tf.compat.v1.
# This fragment runs inside main() after absl flag parsing. FLAGS (dataset,
# samples, batch_size, resolution), config, skip_samples and num_epochs are
# defined elsewhere in the original script.
from absl import flags
from absl import logging
import tensorflow.compat.v1 as tf
import tensorflow_gan as tfgan

from coltran import datasets

tf.disable_v2_behavior()
FLAGS = flags.FLAGS


def normalize(x):
  # The Inception checkpoint expects inputs to be in [-1, 1]; see
  # tensorflow_gan/examples/cifar/eval_lib.py and data_provider.py.
  x = tf.squeeze(x['image'], axis=0)
  logging.info(x.shape)
  x = tf.cast(x, tf.float32)
  # Normalize from [0, 255] to [-1.0, 1.0].
  x = x / 127.5 - 1.0
  return x


# Real dataset.
real_dataset = datasets.get_dataset(
    name=FLAGS.dataset, subset='test', config=config, batch_size=1)
real_dataset = real_dataset.map(normalize, num_parallel_calls=100)
real_dataset = real_dataset.skip(FLAGS.samples)
real_dataset = real_dataset.batch(batch_size=FLAGS.batch_size)
real_dataset = real_dataset.skip(skip_samples // FLAGS.batch_size)
real_iterator = tf.data.make_initializable_iterator(real_dataset)
real_images = real_iterator.get_next()

# "Generated" dataset: here the same test split, for a ground-truth baseline.
gen_dataset = datasets.get_dataset(
    name=FLAGS.dataset, subset='test', config=config, batch_size=1)
gen_dataset = gen_dataset.map(normalize, num_parallel_calls=100)
gen_dataset = gen_dataset.batch(batch_size=FLAGS.batch_size)
gen_iterator = tf.data.make_initializable_iterator(gen_dataset)
gen_images = gen_iterator.get_next()

fid_stream = tfgan.eval.frechet_inception_distance_streaming
distance, update_op = fid_stream(real_images, gen_images)
logging.info(distance)
logging.info(update_op)

batch_size = FLAGS.batch_size
with tf.Session() as sess:
  sess.run([real_iterator.initializer, gen_iterator.initializer,
            tf.local_variables_initializer()])
  # Each run of update_op consumes one batch from each iterator.
  for step in range(1, num_epochs + 1):
    sess.run(update_op)
    if step % 10 == 0:
      dist_np = sess.run(distance)
      logging.info('Number of samples: %d, fid: %s', step * batch_size, dist_np)
  distance_np = sess.run(distance)
  logging.info(distance_np)
@MechCoder Thanks a lot for this. I am about halfway through adapting it to my dataset, but I am running into errors related to TF versions. Hopefully I should be able to resolve them with some more effort.
Sorry to bother you with some more questions.
real_dataset = datasets.get_dataset(name=FLAGS.dataset, subset='test', config=config, batch_size=1)
What is the config file that is passed as input to this and to gen_dataset as well?
Also, is it okay to use these FID implementations? https://github.com/mseitzer/pytorch-fid and https://github.com/toshas/torch-fidelity. I have used them to get results for some other models, and they were pretty straightforward to use.
No problem, happy to help.
Also, is it okay to use these FID implementations?
I think it should be okay as long as you apply the same type of cropping both when generating the images and when evaluating them. We use central cropping to convert the high-res images to 256x256 (https://github.com/google-research/google-research/blob/master/coltran/datasets.py#L37).
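A minimal sketch of applying the same central crop to both the ground-truth and the generated images before running an external FID tool, assuming Pillow is installed. It crops the largest centred square and resizes it to 256x256 with Pillow's BOX (area-averaging) filter; ColTran's own pipeline may resize differently, so the helper names and the filter choice are assumptions.

```python
def center_square_box(width, height):
  """(left, upper, right, lower) box of the largest centred square."""
  side = min(width, height)
  left = (width - side) // 2
  upper = (height - side) // 2
  return (left, upper, left + side, upper + side)


def center_crop_resize(in_path, out_path, size=256):
  """Centre-crops an image to a square, then resizes it to size x size."""
  from PIL import Image  # Imported lazily; requires Pillow.
  with Image.open(in_path) as src:
    square = src.convert("RGB").crop(center_square_box(*src.size))
  square.resize((size, size), Image.BOX).save(out_path)


# A 2048x1024 Cityscapes frame keeps its central 1024x1024 region.
print(center_square_box(2048, 1024))  # (512, 0, 1536, 1024)
```

Running center_crop_resize over both image folders with identical settings keeps the two sets comparable, which is the point made in the reply above.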
What is the config file that is passed as input to this and to gen_dataset as well?
The config in my script is just config = {'resolution': [FLAGS.resolution, FLAGS.resolution]}. The code snippet above computes the baseline FID between two sets of ground-truth images.
I trained the 3 models (Colorizer, Color Upsampler and Spatial Upsampler) on a custom dataset, then used the custom_colorize script to get the results.
The first 2 stages went smoothly, but I got the following error in the 3rd step (Spatial Upsampler).
I used this command:
!python -m coltran.custom_colorize --config=coltran/configs/spatial_upsampler.py \
--logdir=$SPATIAL_UPSMPLR_LOGDIR --img_dir=$IMG_DIR --store_dir=$STORE_DIR \
--gen_data_dir=$STORE_DIR/stage2 --mode=$MODE
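Generalizing this command, a hypothetical helper can build all three custom_colorize invocations in order. Only the flags that appear in the command above are used. The config file names for the first two stages and the stage1 directory are assumptions that mirror the spatial_upsampler.py and $STORE_DIR/stage2 naming here, so check them against the ColTran README.

```python
import os

# (config name, subdirectory of store_dir holding the previous stage's output)
# "colorizer", "color_upsampler" and "stage1" are assumed names.
STAGES = (
    ("colorizer", None),
    ("color_upsampler", "stage1"),
    ("spatial_upsampler", "stage2"),
)


def coltran_commands(img_dir, store_dir, mode, logdirs):
  """Returns one argv list per stage for `python -m coltran.custom_colorize`.

  logdirs maps each stage name in STAGES to that model's checkpoint dir.
  """
  commands = []
  for name, prev_stage in STAGES:
    cmd = ["python", "-m", "coltran.custom_colorize",
           f"--config=coltran/configs/{name}.py",
           f"--logdir={logdirs[name]}",
           f"--img_dir={img_dir}",
           f"--store_dir={store_dir}"]
    if prev_stage is not None:
      # Presumably each upsampler reads the previous stage's outputs.
      cmd.append(f"--gen_data_dir={os.path.join(store_dir, prev_stage)}")
    cmd.append(f"--mode={mode}")
    commands.append(cmd)
  return commands
```

Each list can then be run in order with subprocess.run(cmd, check=True), so that a failure in one stage stops the chain before the next stage reads incomplete outputs; mode is whatever value you pass as $MODE.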
I am using Google Colab. Is this because of Colab's GPU limits? Any help on how to solve this issue would be appreciated.
2022-02-16 08:20:45.489115: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/moving_averages.py:548: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0216 08:20:51.389240 140616530913152 deprecation.py:343] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/moving_averages.py:548: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
I0216 08:20:52.797949 140616530913152 train_utils.py:91] Built with exponential moving average.
I0216 08:20:52.813167 140616530913152 train_utils.py:185] Restoring from /content/drive/MyDrive/Colab_Work/HONORS/coltran-v3/coltran-cityscapes-v2-finetune-3/google-research/coltran/logs/cityscapes_ft_spatial_upsampler.
I0216 08:20:56.561913 140616530913152 custom_colorize.py:207] Producing sample after 37600 training steps.
I0216 08:20:56.562508 140616530913152 custom_colorize.py:210] 100
2022-02-16 08:21:08.242997: W tensorflow/core/common_runtime/bfc_allocator.cc:462] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.25GiB (rounded to 1342177280)requested by op Softmax
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2022-02-16 08:21:08.243377: W tensorflow/core/common_runtime/bfc_allocator.cc:474] *__**_****_*******________********************************************_______*************__________
2022-02-16 08:21:08.246940: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at softmax_op_gpu.cu.cc:219 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[5,256,256,4,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/MyDrive/Colab_Work/HONORS/coltran-v3/coltran-cityscapes-v2-finetune-3/google-research/coltran/custom_colorize.py", line 244, in <module>
app.run(main)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/content/drive/MyDrive/Colab_Work/HONORS/coltran-v3/coltran-cityscapes-v2-finetune-3/google-research/coltran/custom_colorize.py", line 227, in main
out = model.sample(gray_cond=gray, inputs=prev_gen, mode='argmax')
File "/content/drive/MyDrive/Colab_Work/HONORS/coltran-v3/coltran-cityscapes-v2-finetune-3/google-research/coltran/models/upsampler.py", line 254, in sample
logits = self.upsampler(inputs, gray_cond, training=False)
File "/content/drive/MyDrive/Colab_Work/HONORS/coltran-v3/coltran-cityscapes-v2-finetune-3/google-research/coltran/models/upsampler.py", line 245, in upsampler
context = self.encoder(channel, training=training)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/content/drive/MyDrive/Colab_Work/HONORS/coltran-v3/coltran-cityscapes-v2-finetune-3/google-research/coltran/models/layers.py", line 668, in call
output = layer(inputs)
File "/content/drive/MyDrive/Colab_Work/HONORS/coltran-v3/coltran-cityscapes-v2-finetune-3/google-research/coltran/models/layers.py", line 611, in call
weights = tf.nn.softmax(alphas)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Exception encountered when calling layer "self_attention_nd" (type SelfAttentionND).
OOM when allocating tensor with shape[5,256,256,4,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Softmax]
Call arguments received:
• inputs=tf.Tensor(shape=(5, 256, 256, 512), dtype=float32)
CPU times: user 294 ms, sys: 59.3 ms, total: 353 ms
Wall time: 47.6 s
Try setting batch_size=1 in the spatial upsampler config?
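The numbers in the OOM message can be checked by hand: the failing softmax tensor has shape [5, 256, 256, 4, 256] in float32 (4 bytes per element), and its leading 5 matches the batch dimension of the layer input (5, 256, 256, 512). Reducing the number of images processed at once therefore shrinks this tensor proportionally. A small sketch of the arithmetic (tensor_bytes is a hypothetical helper):

```python
def tensor_bytes(shape, bytes_per_element=4):
  """Bytes needed for a dense tensor; float32 uses 4 bytes per element."""
  n = 1
  for dim in shape:
    n *= dim
  return n * bytes_per_element


# Softmax output from the traceback; dims assumed to be
# [batch, height, width, heads, attention length].
oom_shape = (5, 256, 256, 4, 256)
print(tensor_bytes(oom_shape) / 2**30)  # 1.25 -> the "1.25GiB" in the log
# The same tensor with a single image per batch.
one_image = (1, 256, 256, 4, 256)
print(tensor_bytes(one_image) / 2**20)  # 256.0 (MiB)
```

This counts only one tensor; peak usage is higher because the softmax input and other activations are held in memory at the same time.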
This is my coltran/configs/spatial_upsampler.py. It seems the batch_size is already 1:
# coding=utf-8
# Copyright 2021 The Google Research Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Test configurations for color upsampler."""
from ml_collections import ConfigDict
def get_config():
  """Experiment configuration."""
  config = ConfigDict()
  # Data.
  config.dataset = 'imagenet'
  config.downsample = True
  config.downsample_res = 64
  config.resolution = [256, 256]
  config.random_channel = True
  # Training.
  config.batch_size = 1
  config.max_train_steps = 300000
  config.save_checkpoint_secs = 900
  config.num_epochs = -1
  config.polyak_decay = 0.999
  config.eval_num_examples = 20000
  config.eval_batch_size = 16
  config.eval_checkpoint_wait_secs = -1
  config.optimizer = ConfigDict()
  config.optimizer.type = 'rmsprop'
  config.optimizer.learning_rate = 3e-4
  # Model.
  config.model = ConfigDict()
  config.model.hidden_size = 512
  config.model.ff_size = 512
  config.model.num_heads = 4
  config.model.num_encoder_layers = 3
  config.model.resolution = [64, 64]
  config.model.name = 'spatial_upsampler'
  config.sample = ConfigDict()
  config.sample.gen_data_dir = ''
  config.sample.log_dir = 'samples_sweep'
  config.sample.batch_size = 1
  config.sample.mode = 'argmax'
  config.sample.num_samples = 1
  config.sample.num_outputs = 1
  config.sample.skip_batches = 0
  config.sample.gen_file = 'gen0'
  return config
Does this comment fix your issue? (https://github.com/google-research/google-research/issues/838#issuecomment-930699980)
Yes, thanks. I feel stupid for not checking that before. 🤦‍♂️😂
What I did: ran the coltran.run script, then the coltran.custom_colorize script.
I got an FID score of around 59, while the FID score mentioned in the paper is around 19.
While calculating FID, both the ground-truth and the generated images (436 images) are at 256x256 resolution.
[Images: ground-truth images and generated images]
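For reference, FID is the Fréchet distance between Gaussians fitted to feature vectors of the two image sets: FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). The NumPy sketch below computes it from precomputed feature arrays; extracting the features (e.g. Inception activations, as in the TTUR and TF-GAN implementations) is assumed to happen elsewhere, and frechet_distance is a hypothetical name. When comparing scores, keep in mind that FID estimates are biased upward for small sample sizes, so a score computed on 436 images is not directly comparable to one computed on a larger evaluation set.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
  """FID between two sets of feature vectors (one row per image).

  Tr((S_a S_b)^(1/2)) is computed as the sum of square roots of the
  eigenvalues of S_a S_b, which are real and non-negative for covariance
  matrices up to round-off (hence the clipping).
  """
  mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
  cov_a = np.cov(feats_a, rowvar=False)
  cov_b = np.cov(feats_b, rowvar=False)
  eigvals = np.linalg.eigvals(cov_a @ cov_b)
  tr_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
  return float(((mu_a - mu_b) ** 2).sum()
               + np.trace(cov_a) + np.trace(cov_b) - 2 * tr_sqrt)
```

As a quick check, identical feature sets give 0, and shifting every feature of a d-dimensional set by 1 (same covariance) gives d.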
How long do I need to train on a custom dataset?
How do I know if the training is complete?
How do I get the results/outputs for a custom dataset?
How do I calculate the FID?
I am using this notebook for training on the Cityscapes dataset: Link to Colab Notebook
I trained the model on Colab (until the GPU resources were exhausted) and got around 23 checkpoints.
Next, after training on a custom dataset, how do I evaluate the model or obtain the colorized/recolorized images?
I used this command, but I guess it only works for the ImageNet dataset.
Now, I am trying to use this (see the notebook for the next 2 steps)
Can someone please tell me if I am following the correct commands for getting the output? A step-by-step guide would be appreciated. I am getting confused about which flow to follow.
Also, the paper shows 3 different coloured outputs for a single black-and-white input image.
How can I get such results?