Hello, I would like to express my gratitude for the excellent work on the TartanVO project.
I am currently implementing the training script and have a question about the data preprocessing, specifically the type of random cropping employed.
The paper does not explicitly detail the random crop method. My current implementation uses a standard RandomResizedCrop. I have also implemented a ConsistentRandomResizedCrop, which keeps the cropping parameters fixed across all frames within the same scene. This comes from my intuition that sequences used for pose estimation should be cropped consistently, so that every frame captures the same region.
Could you kindly clarify the method you use?
Should the cropping be consistent across frames within the same scene for TartanVO?
If consistency is required, what parameters or strategies would you recommend for implementing this?
Thank you very much for your assistance!
The following is what I have implemented:
import random

import torch
import torchvision
from torchvision.transforms import RandomResizedCrop
from torchvision.transforms.functional import resized_crop


class RandomCropAndResized(object):
    """
    Crop the input data at a random location and resize it to the target size.
    """
    # TODO: Implement handling for RGB images in phase two of development.
    # DONE: RGB images are currently not included in the RandomResizedCrop (RCR) process.
    # TODO: Consider implementing a "Consistent RandomResizedCrop" mechanism.
    # I am not sure whether this function meets the paper's requirements.
    # The current implementation produces a different crop for every frame, even within the same scene, which may not be ideal.
    # Consideration: ensure the cropped region remains consistent across a single scene.

    def __init__(self):
        self.transform = RandomResizedCrop(size=(112, 160), scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.))

    def __call__(self, sample):
        # Stack flow (2 channels) and intrinsics (2 channels) into a [4, H, W] tensor
        # so both receive exactly the same crop.
        combined = torch.cat([sample['flow'], sample['intrinsic']], dim=0)
        # Apply the transform to the combined tensor.
        transformed = self.transform(combined)
        # Split the transformed tensor back into 'flow' and 'intrinsic'.
        sample['flow'] = transformed[:2]       # first two channels
        sample['intrinsic'] = transformed[2:]  # next two channels
        return sample
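For context, here is a minimal usage sketch of the per-frame variant above. The sample dict layout (a [2, H, W] 'flow' tensor plus a [2, H, W] intrinsics layer) and the 448x640 input resolution are my own assumptions for illustration, not taken from the paper or the released code:

# Hypothetical usage sketch: build a sample dict and apply the per-frame transform.
sample = {
    'flow': torch.randn(2, 448, 640),       # assumed input resolution, illustration only
    'intrinsic': torch.randn(2, 448, 640),
}
transform = RandomCropAndResized()
out = transform(sample)
print(out['flow'].shape, out['intrinsic'].shape)  # both become [2, 112, 160]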
class ConsistentRandomResizedCrop:
    """
    Random resized crop that reuses the same crop parameters until
    initialize_cropping_params() is called again, so every frame of a
    scene is cropped at the same location.
    """
    def __init__(self):
        self.input_size = (112, 160)
        self.min_scale = 0.2
        self.max_scale = 0.3
        # Initialize cropping parameters (re-run this when a new scene starts).
        self.initialize_cropping_params()
        print("Random crop parameters: ", self.top, self.left, self.crop_height, self.crop_width)

    def initialize_cropping_params(self):
        image_size = [112, 160]  # spatial size the incoming tensors are assumed to have
        # Randomly determine the scale of the crop.
        scale = random.uniform(self.min_scale, self.max_scale)
        self.crop_height = int(image_size[0] * scale)
        self.crop_width = int(image_size[1] * scale)
        # Randomly choose the top-left corner of the crop area.
        self.top = random.randint(0, image_size[0] - self.crop_height)
        self.left = random.randint(0, image_size[1] - self.crop_width)

    def __call__(self, sample):
        # Stack flow and intrinsics into a [4, H, W] tensor so both get the same crop.
        combined = torch.cat([sample['flow'], sample['intrinsic']], dim=0)
        # Crop and resize with the stored (scene-consistent) parameters.
        transformed = resized_crop(combined, self.top, self.left, self.crop_height, self.crop_width,
                                   self.input_size,
                                   interpolation=torchvision.transforms.InterpolationMode.BILINEAR)
        # Split the transformed tensor back into 'flow' and 'intrinsic'.
        sample['flow'] = transformed[:2]       # first two channels
        sample['intrinsic'] = transformed[2:]  # next two channels
        return sample
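To show how I intend to use the consistent variant, here is a rough sketch: the crop parameters are drawn once per trajectory and reused for every frame in it, then re-drawn when the next trajectory starts. The trajectory/frame loops and the 112x160 tensor shapes are placeholders of my own, not from the paper or the released code:

# Hypothetical per-scene usage: re-sample the crop once per trajectory,
# then apply the same crop region to every frame of that trajectory.
consistent_crop = ConsistentRandomResizedCrop()
for trajectory in range(3):                       # placeholder loop over scenes
    consistent_crop.initialize_cropping_params()  # new random crop for this scene
    for frame in range(5):                        # placeholder loop over frames in the scene
        sample = {
            'flow': torch.randn(2, 112, 160),     # matches the hard-coded image_size above
            'intrinsic': torch.randn(2, 112, 160),
        }
        sample = consistent_crop(sample)          # same crop region within this trajectory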