Closed by YankoFelipe 4 weeks ago
Hey @YankoFelipe, thanks for the feedback! One first general comment: in DeepLabCut 3.0, we switched to the albumentations package for image augmentation. This means that while most of the transforms available in DeepLabCut < 3 are still available, there are some minor differences. Information about the DeepLabCut 3.0 augmentations is available here: deeplabcut.github.io/DeepLabCut/docs/pytorch/pytorch_config
Horizontal flips: As you said, there's a naming change here from fliplr to hflip. The docs here are a bit out of date (I'll make sure I update them). You can add horizontal flipping as an augmentation in different ways (with your bodyparts, I'm guessing that hflip: true is the correct choice and all you need, as it doesn't look like you have any symmetric bodyparts):
# The first three options should only be used if you don't have symmetric keypoints
# (e.g. `leftEye`, `rightEye`) or are used for an object detector

# random flip with probability 50%
hflip: true

# random flip with probability 25%
hflip: 0.25

# random flip with probability 25%
hflip:
  p: 0.25

# If you do have symmetric keypoints, you need to indicate them in the hflip configuration
# E.g. if your bodyparts are ["nose", "rightEye", "rightEar", "leftEye", "leftEar"]
hflip:
  p: 0.25
  symmetries:
  - - 1
    - 3
  - - 2
    - 4
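To make it concrete why the symmetries need to be declared, here is a small standalone sketch (plain albumentations, not DeepLabCut's internal code; the bodyparts and coordinates are made up): a horizontal flip mirrors the coordinates, but it doesn't swap the left/right labels unless something swaps them afterwards.

# Minimal, standalone illustration (NOT DeepLabCut code): flip an image with
# albumentations and then swap the symmetric keypoint pairs by hand.
import albumentations as A
import numpy as np

bodyparts = ["nose", "rightEye", "rightEar", "leftEye", "leftEar"]
symmetries = [(1, 3), (2, 4)]  # (rightEye <-> leftEye), (rightEar <-> leftEar)

flip = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    keypoint_params=A.KeypointParams(format="xy", remove_invisible=False),
)

image = np.zeros((100, 100, 3), dtype=np.uint8)  # dummy image
keypoints = [(50, 20), (40, 30), (35, 35), (60, 30), (65, 35)]  # made-up coordinates

flipped = list(flip(image=image, keypoints=keypoints)["keypoints"])

# The x-coordinates are mirrored, but index 1 still holds what used to be the
# right eye; without the swap, "rightEye" would now label a point on the left.
for i, j in symmetries:
    flipped[i], flipped[j] = flipped[j], flipped[i]
print(flipped)

That swap is what the symmetries entry takes care of for you, so with symmetric bodyparts the flipped labels still match the flipped image.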
One element I would edit would be removing the hflip: true during inference; currently that just randomly flips images when evaluating your model, which means you won't obtain the "true" performance of your model on the test set.
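In practice that just means keeping hflip under the train section of your pytorch_config.yaml and dropping it from the inference section, so only deterministic preprocessing runs at evaluation time. A rough sketch of the idea (normalize_images is only an example of a deterministic preprocessing key; check your own pytorch_config.yaml for the exact entries it contains):

# Keep random augmentations under `train` only.
data:
  train:
    hflip: true             # random flips are fine here
    normalize_images: true
  inference:
    normalize_images: true  # no hflip here, so evaluation sees unmodified frames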
Translation is not applied symmetrically: That's indeed a bug in our code, and the values should be sampled symmetrically. I'll fix this in an upcoming PR.
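Just to illustrate the difference (this is a sketch with albumentations directly, not the actual DeepLabCut patch): a one-sided translation range only ever shifts the image in one direction, whereas a symmetric range is sampled around zero, analogous to how the rotation range (-90, 90) is handled.

# Illustration only (plain albumentations, not the DeepLabCut fix):
import albumentations as A

# translate_px=(0, 50) can only shift the image right/down
one_sided = A.Affine(translate_px={"x": (0, 50), "y": (0, 50)}, p=1.0)

# (-50, 50) samples the shift symmetrically around zero
symmetric = A.Affine(translate_px={"x": (-50, 50), "y": (-50, 50)}, p=1.0)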
Checking that augmentations are actually applied is also something I was curious about when developing, which is why the albumentations transforms for training and inference are printed in your logs before training starts.
Data Transforms:
  Training: Compose(
    [
      HorizontalFlip(always_apply=False, p=0.5),
      Affine(always_apply=False, p=0.5, interpolation=1, mask_interpolation=0, cval=0, mode=0, scale={'x': (1.0, 1.0), 'y': (1.0, 1.0)}, translate_percent=None, translate_px={'x': (0, 50), 'y': (0, 50)}, rotate=(-90, 90), fit_output=False, shear={'x': (0.0, 0.0), 'y': (0.0, 0.0)}, cval_mask=0, keep_ratio=True, rotate_method='largest_box'),
      Equalize(always_apply=False, p=0.5, mode='cv', by_channels=True, mask=None, mask_params=()),
      GaussNoise(always_apply=False, p=0.5, var_limit=(0, 2500), per_channel=True, mean=0),
      Normalize(always_apply=False, p=1.0, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], max_pixel_value=255.0),
    ],
    p=1.0,
    bbox_params={'format': 'coco', 'label_fields': ['bbox_labels'], 'min_area': 0.0, 'min_visibility': 0.0, 'min_width': 0.0, 'min_height': 0.0, 'check_each_transform': True},
    keypoint_params={'format': 'xy', 'label_fields': ['class_labels'], 'remove_invisible': False, 'angle_in_degrees': True, 'check_each_transform': True},
    additional_targets={},
    is_check_shapes=True
  )
  Validation: Compose(
    [
      HorizontalFlip(always_apply=False, p=0.5),
      Normalize(always_apply=False, p=1.0, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], max_pixel_value=255.0),
    ],
    ...
  )
If you're interested in seeing what the augmented data actually looks like, the following code snippet might help (it's basically what the train_network method does to load data):
import matplotlib.pyplot as plt
import torch
import torchvision.transforms as transforms

from deeplabcut.pose_estimation_pytorch.data import build_transforms, DLCLoader
from deeplabcut.pose_estimation_pytorch.task import Task

loader = DLCLoader(
    config="/Users/niels/Documents/upamathis/datasets2/test/trimice-dlc-2021-06-22/config.yaml",
    shuffle=1,
    trainset_index=0,
)
transform = build_transforms(loader.model_cfg["data"]["train"])
transform_inf = build_transforms(loader.model_cfg["data"]["inference"])
pose_task = Task(loader.model_cfg["method"])

train_dataset = loader.create_dataset(transform=transform, mode="train", task=pose_task)
valid_dataset = loader.create_dataset(transform=transform_inf, mode="test", task=pose_task)
print(f"Number of training images: {len(train_dataset)}")
print(f"Number of validation images: {len(valid_dataset)}")

# Needed so that when we plot the image, the color channels aren't normalized
denormalize = transforms.Compose(
    [
        transforms.Normalize(mean=[0, 0, 0], std=[1 / 0.229, 1 / 0.224, 1 / 0.225]),
        transforms.Normalize(mean=[-0.485, -0.456, -0.406], std=[1, 1, 1]),
    ]
)


def plot_augmented_image(dataset, index):
    sample_train_data = dataset[index]
    train_image = sample_train_data["image"]
    # The image was normalized to ImageNet means, so it needs to be un-normalized
    # to have the correct visual appearance
    img = denormalize(torch.tensor(train_image))
    # The image is (C, H, W) and we need it to be (H, W, C) to plot it
    img = img.numpy().transpose((1, 2, 0))
    fig, ax = plt.subplots(1)
    ax.imshow(img)
    plt.show()


plot_augmented_image(train_dataset, 0)
plot_augmented_image(train_dataset, 0)
As random augmentations are applied to your images, calling plot_augmented_image multiple times with the same image index should show a different transformation each time.
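If you also want to sanity-check the annotations or confirm which transforms were built, you can inspect things directly, reusing the objects from the snippet above (the exact keys in the sample dictionary may differ between DeepLabCut versions, so print them rather than assuming names):

# Quick sanity checks on top of the snippet above
sample = train_dataset[0]
print(sample.keys())    # see which fields (image, keypoints, bboxes, ...) the dataset returns
print(transform)        # the training pipeline built by build_transforms
print(transform_inf)    # the inference pipeline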
Bug description
Hello DeepLabCut!
I've been observing little variation in my training results since I started adding augmentations to my project (2D / single animal / PyTorch) using the documentation in https://deeplabcut.github.io/DeepLabCut/docs/recipes/pose_cfg_file_breakdown.html, and I found some differences with the actual code in deeplabcut.pose_estimation_pytorch.data.transforms.build_transforms, such as:
- The docs mention fliplr, but build_transforms uses the parameter hflip instead.
- rotation was moved to the new affine dictionary, which is a nice addition with all its customizability, but the underlying use isn't symmetrical: around line 78 of deeplabcut.pose_estimation_pytorch.data.transforms, rotation is applied symmetrically but translation isn't, which is a bit misleading.
I understand this is still beta functionality, but I think it would be good to consider that all these differences may cause issues for people migrating from their 2.x projects.
Besides, even after making the modifications to match the values expected in the code (see attached log) and trying different augmentation values, the evolution of my losses (train/val) is more or less the same (increasing the number of training epochs has only shown overfitting for my project).
Operating System
SUSE Linux 15.5
DeepLabCut version
dlc version 3.0.0rc4
DeepLabCut mode
single animal
Device type
Nvidia A100
Steps To Reproduce
config.yaml (except video_sets because it's too large)
pose_cfg.yaml
pytorch_config.yaml
Relevant log output
Anything else?
I'd like to know if there's a way to debug and check that the augmentations are applied at training time in DLC PyTorch.
I've trained using the same version (3.0.0rc4) on Windows 10 (22H2) with an RTX 4090, with similar results.
I'm looking forward to your comments :)