NVIDIA / DeepLearningExamples


Extracting Randomly Generated Parameters from "fn.decoders.image_random_crop"? #1309

Open AhmedHussKhalifa opened 1 year ago

AhmedHussKhalifa commented 1 year ago

Hello,

I am currently training on ImageNet within a knowledge distillation (KD) framework, using the "create_dali_pipeline" function below.

My objective is to extract the augmentation parameters applied to each original image during training, so that I can store them and reuse them for offline training; specifically, I need the crop coordinates generated by "fn.decoders.image_random_crop".

Fixing the seed would make the augmentation parameters reproducible, but it does not actually expose them, and it raises other potential issues, so I am looking for an alternative solution.

If you have any insights or suggestions on how I can accomplish this, I would greatly appreciate your assistance. Thank you.

import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.pipeline import pipeline_def


@pipeline_def
def create_dali_pipeline(data_dir, crop, size, shard_id, num_shards, NUM_QFs=0, dali_cpu=False, is_training=True):
    images, labels = fn.readers.file(file_root=data_dir,
                                     shard_id=shard_id,
                                     num_shards=num_shards,
                                     random_shuffle=is_training,
                                     pad_last_batch=True,
                                     name="Reader")

    dali_device = 'cpu' if dali_cpu else 'gpu'
    decoder_device = 'cpu' if dali_cpu else 'mixed'

    # Ask nvJPEG to preallocate memory for the biggest sample in ImageNet
    # (for both CPU and GPU) to avoid reallocations at runtime.
    device_memory_padding = 211025920 if decoder_device == 'mixed' else 0
    host_memory_padding = 140544512 if decoder_device == 'mixed' else 0

    # Ask the HW NVJPEG decoder to allocate memory ahead of time for the
    # biggest image in the dataset to avoid reallocations at runtime.
    preallocate_width_hint = 5980 if decoder_device == 'mixed' else 0
    preallocate_height_hint = 6430 if decoder_device == 'mixed' else 0
    images_org = fn.decoders.image_random_crop(images,
                                               device=decoder_device,
                                               output_type=types.RGB,
                                               device_memory_padding=device_memory_padding,
                                               host_memory_padding=host_memory_padding,
                                               preallocate_width_hint=preallocate_width_hint,
                                               preallocate_height_hint=preallocate_height_hint,
                                               random_aspect_ratio=[0.75, 4.0 / 3.0],
                                               random_area=[0.08, 1.0],
                                               num_attempts=100)
    images = fn.resize(images_org,
                       device=dali_device,
                       resize_x=crop,
                       resize_y=crop,
                       interp_type=types.INTERP_TRIANGULAR)
    mirror = fn.random.coin_flip(probability=0.5)
    original = fn.crop_mirror_normalize(images.gpu(),
                                        dtype=types.FLOAT,
                                        output_layout="CHW",
                                        crop=(crop, crop),
                                        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
                                        mirror=mirror)
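
One workaround I have been considering is to drop "fn.decoders.image_random_crop" and instead sample the crop window explicitly, so that it becomes a regular pipeline output I can store. Below is a rough, untested sketch. I am assuming that "fn.decoders.image_slice" accepts normalized anchor/shape inputs in its default (W, H) ordering, and I am aware that this simplified sampler does not reproduce the area/aspect-ratio rejection sampling (random_area, random_aspect_ratio, num_attempts) of "fn.decoders.image_random_crop":

import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.pipeline import pipeline_def


@pipeline_def
def create_dali_pipeline_with_crop_outputs(data_dir, crop, shard_id, num_shards, is_training=True):
    jpegs, labels = fn.readers.file(file_root=data_dir,
                                    shard_id=shard_id,
                                    num_shards=num_shards,
                                    random_shuffle=is_training,
                                    pad_last_batch=True,
                                    name="Reader")

    # Sample the crop window explicitly so it can be returned (and stored).
    # NOTE: simplified sampler; it does NOT replicate the area/aspect-ratio
    # rejection sampling performed by fn.decoders.image_random_crop.
    rel_shape = fn.random.uniform(range=[0.3, 1.0], shape=[2])   # relative crop size
    # Scale the anchor so the window always stays inside the image.
    rel_anchor = fn.random.uniform(range=[0.0, 1.0], shape=[2]) * (1.0 - rel_shape)

    # Decode and crop in one step; anchor and shape are normalized by default.
    images = fn.decoders.image_slice(jpegs,
                                     rel_anchor,
                                     rel_shape,
                                     device="mixed",
                                     output_type=types.RGB)

    images = fn.resize(images,
                       resize_x=crop,
                       resize_y=crop,
                       interp_type=types.INTERP_TRIANGULAR)

    mirror = fn.random.coin_flip(probability=0.5)
    images = fn.crop_mirror_normalize(images,
                                      dtype=types.FLOAT,
                                      output_layout="CHW",
                                      crop=(crop, crop),
                                      mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                      std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
                                      mirror=mirror)

    # The crop window and the mirror flag become regular pipeline outputs.
    return images, labels, rel_anchor, rel_shape, mirror

With something like this, the iterator (e.g. DALIGenericIterator) would return the crop window and mirror flag with every batch, and I could dump them to disk alongside the sample indices. However, I would prefer to read the window that "fn.decoders.image_random_crop" itself picked, so the augmentation distribution stays exactly the same. Is there any way to get that operator (or "fn.random_resized_crop") to output its chosen crop parameters?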