angelolab / Nimbus


Promix Naive #26

Closed JLrumberger closed 2 years ago

JLrumberger commented 2 years ago

Relevant background

We assume that 5-15% of the marker positive/negative labels in our dataset are wrong. To cope with this, we will adapt the clean-sample selection procedure from ProMix from the image level to the cell level.

Design overview

We adapt the methods presented as ProMix Naive for the binary pixel-wise classification case as follows:

  1. Class-wise small-loss selection:

    • after each epoch, they split the training data into subsets based on the training labels, s.t. there is one subset per class label c_1, ..., c_C. They calculate the CE loss for the samples in each subset and select, per subset, the top-k samples with the lowest loss as clean training data, thus arriving at a roughly class-balanced sample.
    • We'll calculate the running 5%-percentile of the loss on foreground GT areas, separately for GT positive and negative areas. Then we'll mask out the loss where it is higher than the running 5%-percentile of its class (a sketch of both selection steps follows below).
  2. Matched High-Confidence Selection

    • they calculate confidence scores for each training sample and select the ones where the confidence is higher than a given threshold tau (tau = 0.99 for CIFAR10N and tau = 0.95 for CIFAR100N)
    • We mask out areas in the predictions where the predicted confidence is lower than the threshold tau = 0.95.

The loss and confidence scores used in the above calculations are based on the un-augmented training data.
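
A minimal sketch of both selection steps as per-pixel masks is below. The function names, the assumed [B, H, W, 1] probability-map shapes, and the per-batch percentile (in place of a true running percentile, and without the foreground-area restriction) are simplifying assumptions, not the final API:

import tensorflow as tf

def _percentile(x, q):
    """q-th percentile of a flat tensor (nearest-rank; avoids extra dependencies)."""
    x = tf.sort(tf.reshape(x, [-1]))
    k = tf.cast(tf.round(q / 100.0 * tf.cast(tf.size(x) - 1, tf.float32)), tf.int32)
    return x[k]

def class_wise_loss_selection(y_pred, y_gt, percentile=5.0, eps=1e-7):
    """Keep pixels whose CE loss is below the loss percentile of their GT class."""
    ce = -(y_gt * tf.math.log(y_pred + eps)
           + (1.0 - y_gt) * tf.math.log(1.0 - y_pred + eps))
    pos = tf.cast(y_gt > 0.5, ce.dtype)
    neg = 1.0 - pos
    thresh_pos = _percentile(tf.boolean_mask(ce, pos > 0), percentile)
    thresh_neg = _percentile(tf.boolean_mask(ce, neg > 0), percentile)
    # keep a pixel only if its loss is below the threshold of its own GT class
    return pos * tf.cast(ce <= thresh_pos, ce.dtype) + neg * tf.cast(ce <= thresh_neg, ce.dtype)

def matched_high_confidence_selection(y_pred, y_gt, tau=0.95):
    """Keep pixels where the prediction agrees with the GT label with confidence >= tau."""
    confidence = tf.where(y_gt > 0.5, y_pred, 1.0 - y_pred)
    return tf.cast(confidence >= tau, y_pred.dtype)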

  3. Consistency Regularization

    • they augment the images with weak augmentations (zoom/crop, color jitter, Gaussian blur) and strong augmentations (AutoContrast, Equalize, Rotate, Solarize, Color, Posterize, Contrast, Brightness, Sharpness, Shear, Translate) and calculate the loss based on the predictions and the GT. So this is not augmentation-consistency training but just normal data augmentation.
    • I'll just keep my augmentation pipeline as is, since I don't think it's much different.
  4. Mixup (a minimal sketch follows below)
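
For 4., a minimal sketch of standard mixup on a batch; the beta parameter alpha and mixing the GT maps alongside the images are assumptions, and the pseudo-label handling used in the ProMix paper is omitted:

import numpy as np
import tensorflow as tf

def mixup(x, y_gt, alpha=0.2):
    """Convex-combine a batch with a shuffled copy of itself (standard mixup)."""
    lam = float(np.random.beta(alpha, alpha))
    idx = tf.random.shuffle(tf.range(tf.shape(x)[0]))
    x_mix = lam * x + (1.0 - lam) * tf.gather(x, idx)
    y_mix = lam * y_gt + (1.0 - lam) * tf.gather(y_gt, idx)
    return x_mix, y_mix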

Code mockup

For 1. and 2. we'll subclass ModelBuilder and change its class functions.

Add the class functions class_wise_loss_selection and matched_high_confidence_selection (as sketched above) and use them in the training loop:

for x, y_gt in data:
    # selection masks are computed on the un-augmented data
    y_pred = model(x, training=False)
    loss_mask = model.class_wise_loss_selection(y_pred, y_gt)
    # union of the two selections: keep a pixel if either step selects it
    loss_mask = tf.maximum(loss_mask, model.matched_high_confidence_selection(y_pred, y_gt))
    x_aug, loss_mask_aug, y_gt_aug = aug_fn(x, loss_mask, y_gt)
    with tf.GradientTape() as tape:
        y_pred_aug = model(x_aug, training=True)
        loss_value = model.loss_fn(y_gt_aug, y_pred_aug)
        loss_value = model.masked_loss_fn(loss_value, loss_mask_aug)

    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

Code inside the training loop should be put into a static class function decorated with @tf.function in order to use static-graph mode during training. We probably need to change the augmentation library we're using; this could be a good time to switch to one of the tf-based augmentation libraries.
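
A sketch of what that static train step could look like; the function name, argument list, per-pixel loss_fn, and the mean-over-selected-pixels reduction are assumptions, and in practice it would live on the ModelBuilder subclass:

import tensorflow as tf

@tf.function
def train_step(model, optimizer, loss_fn, x_aug, y_gt_aug, loss_mask_aug):
    """One masked-loss update, traced into a static graph."""
    with tf.GradientTape() as tape:
        y_pred_aug = model(x_aug, training=True)
        per_pixel_loss = loss_fn(y_gt_aug, y_pred_aug)  # assumed to return a per-pixel loss map
        # average the loss over the selected (unmasked) pixels only
        loss_value = tf.reduce_sum(per_pixel_loss * loss_mask_aug) / (
            tf.reduce_sum(loss_mask_aug) + 1e-8)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss_value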

Required inputs

Provide a description of the required inputs for this project, including column names for dfs, dimensions for image data, prompts for user input, directory structure for loading data, etc.

Output files

Provide a description of the outputs for this project. If any plots will be generated, provide (simple) sketches demonstrating the plot type and axes labels.

Timeline

Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.

Estimated date when a fully implemented version will be ready for review:

Estimated date when the finalized project will be merged in:

ngreenwald commented 2 years ago

Looks good! A couple of questions:

  1. For loss masking, are you saying we'd take the bottom 5% CE as clean examples, or the bottom 95% CE as clean examples?
  2. What is the difference between confidence and CE?
  3. Is there a schedule for how the thresholds for confidence and CE loss change over time?
  4. I think for the first version, having only two classes (positive and negative) makes sense. However, we should check what the distribution of confidence/loss looks like across different markers. Given that some markers are harder than others, and some markers have more errors than others, there will definitely be differences in the balance of the cleaned training dataset. Based on how skewed this is, we can decide whether we need to make any adjustments to how clean examples are sampled, for example having minimums per channel (not just per class), balancing, etc.

JLrumberger commented 2 years ago
  1. I'd say we take the 5% of pixels with the lowest CE as clean examples. Noah: Let's take the 5% of cells with the lowest CE instead of individual pixels.
  2. Well, I think it's interchangeable, since high confidence and agreement with the GT lead to a low CE for that pixel.
  3. There is a scheduler for the class-wise small-loss selection threshold going from 0.5 to 0.9 (via linear ramp-up over 50 epochs); tau stays constant. A minimal sketch of the ramp-up follows below.
  4. Yes, I'll try to keep it in mind and write the code s.t. we can easily calculate running statistics per marker.
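
A minimal sketch of the linear ramp-up mentioned in 3.; the function name and clamping after 50 epochs are assumptions:

def selection_threshold(epoch, start=0.5, end=0.9, ramp_epochs=50):
    """Linearly ramp the class-wise small-loss selection fraction from start to end."""
    return start + (end - start) * min(epoch / ramp_epochs, 1.0)
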
JLrumberger commented 2 years ago

It looks like doing the loss selection at the cell level instead of the pixel level will slow down training considerably. I need to run more tests here and will probably implement both versions, pixel-level and cell-level selection.