Erotemic / shitspotter

An open source algorithm and dataset for finding poop in pictures. A work in progress.

Any Use for Adding Images from the Internet? #19

Open njho opened 4 months ago

njho commented 4 months ago

Curious if there would be any benefit from using images from the internet. That being said, I'm not sure if there would be any issues w/ licensing.

The Poo detector using YoloX-tiny works, but I think adding more data might be useful. Would you be interested in having annotations from the web added? Or is that against the ethos of the project?

Erotemic commented 4 months ago

If images are appropriately licensed, then I would like to include them. I would also want to include attribution regardless, so we will have to track that.
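As a sketch of what I mean by tracking it (the filename, URL, and field names here are made up; kwcoco just stores whatever extra fields we attach to an image):

import kwcoco

dset = kwcoco.CocoDataset()
dset.fpath = 'web_images.kwcoco.json'  # hypothetical bundle for web-sourced images

# COCO-style image entries are plain JSON objects, so provenance and licensing
# info can ride along as extra fields on each web-sourced image.
dset.add_image(
    file_name='web/poop_0001.jpg',           # made-up filename
    source_url='https://example.com/photo',  # made-up source URL
    license='CC-BY-4.0',                     # made-up license tag
    attribution='photo by Jane Doe',         # made-up attribution string
)
dset.dump(dset.fpath)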

You got the yolo model working? Excellent! I've been getting some decent results as well. Any chance you'd like to share the model/weights/inference code? I would like to start a leaderboard with quantified results. It would be interesting to see how our models compare. I'll have to postprocess my results to get boxes, but if you have a script that can produce boxes given a kwcoco file, then I (or you) can score it with kwcoco eval, which currently includes bounding box detection metrics.
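To be concrete about the kind of script I mean: it just needs to write your predicted boxes into a new kwcoco file that parallels the truth, roughly like this sketch (run_detector and the 'poop' category name are placeholders; the kwcoco bookkeeping is the point):

import kwcoco

def run_detector(coco_img):
    # Placeholder for the real model; should return a list of
    # (x, y, w, h, score) boxes in image coordinates.
    return []

true_dset = kwcoco.CocoDataset.coerce('vali.kwcoco.zip')
pred_dset = kwcoco.CocoDataset()
pred_dset.fpath = 'pred_boxes.kwcoco.json'
poop_cid = pred_dset.ensure_category('poop')

for gid, img in true_dset.index.imgs.items():
    new_gid = pred_dset.add_image(**img)
    for x, y, w, h, score in run_detector(true_dset.coco_image(gid)):
        pred_dset.add_annotation(
            image_id=new_gid, category_id=poop_cid,
            bbox=[x, y, w, h], score=score)

pred_dset.dump(pred_dset.fpath)

Then kwcoco eval can compare the truth and prediction files to produce the detection metrics.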

njho commented 4 months ago

Yeah! For sure, I'll let you know when it's done. I said it was working, but really it was detecting both poos and dogs and the labels were incorrect; somehow it was still getting the bboxes right! I'm using mmdetect, thinking it'd give me a head start, but the abstraction layers aren't making it clear to me where the configuration is set incorrectly...

Lots of nested configurations - who knows, it might just be me 😢

Either way, tomorrow I'm just going to go straight to the OG YOLOX repo, which I probably should've done in the first place. I suspect I'll have better luck there and will post an update shortly thereafter!

njho commented 4 months ago

Btw, what are your thoughts on these abstraction layers vs going to the original GH repo in general?

You might have more experience. I've deployed a fair number of models for inference, but this is only the second model I've trained. The first was one for tabular data, but it was much simpler and I used fastai as an abstraction layer.

I'm more or less scared, or at least hesitant, to "do it myself" without the help of abstraction layers, because I'm worried about hyperparameter optimization and how much knowledge might be necessary to get it to train.

Erotemic commented 4 months ago

I do like mmdetect's construction of torch models: they have really good modular definitions of torch model components, even though I'm not a fan of the training mechanisms. Their registration system isn't amenable to static analysis, but otherwise it's very good. I should probably learn how to use it, given that it does work and this rgb coco dataset would probably be accepted as valid input to its training / prediction algorithms (perhaps needing some minor variation in kwcoco conform).

Overall I think abstraction layers are important, but existing ones are flawed, which is something I'm trying to address in geowatch (I may change the name in the future), which uses pytorch-lightning as its training workhorse. I think Lightning is a great library, and I've written geowatch as a system that connects kwcoco annotations to a training pipeline. I'd love to (and might be able to) get mmdetect models integrated as an option for the model parameter in the lightning training config. What this will require is something that connects my non-standard nested data structure (which I need to codify) for heterogeneous multimodal data to an mmdet model and loss function.
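Very roughly, the bridge I have in mind might look like the following sketch (this is not geowatch code; it assumes an mmdet-style detector whose training forward returns a dict of losses, and it glosses over the batch conversion):

import torch
import pytorch_lightning as pl


class MMDetBridge(pl.LightningModule):
    """Hypothetical adapter: kwcoco-backed batches in, mmdet-style losses out."""

    def __init__(self, detector, lr=3e-4, weight_decay=3e-6):
        super().__init__()
        self.detector = detector
        self.lr = lr
        self.weight_decay = weight_decay

    def training_step(self, batch, batch_idx):
        # Assumes the datamodule already converted my nested multimodal batch
        # items into whatever tensors / metadata the detector expects.
        loss_parts = self.detector(**batch)  # e.g. {'loss_cls': ..., 'loss_bbox': ...}
        total_loss = sum(loss_parts.values())
        self.log_dict({k: v.detach() for k, v in loss_parts.items()})
        return total_loss

    def configure_optimizers(self):
        return torch.optim.AdamW(
            self.parameters(), lr=self.lr, weight_decay=self.weight_decay)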

To illustrate the style of configuration I'm going for, this is a snippet from train.sh where I keep my "training invocations".

The following specifies everything about the training problem and gives you full control over (and logging of) all of the hyperparameters. The main components are "model", "data", "optimizer", "lr_scheduler", "trainer", and "initializer" (the last of which is a custom geowatch thing).

I also want to point out that the "initializer" section and some of this ease-of-config stuff comes from custom extensions I made to pytorch-lightning and jsonargparse. It is not vanilla lightning, but it is close to it. The initializer lets you pass a starting point, which can be another similar model. It uses my partial-weight-loading stuff in torch-liberator to cram weights from one model into another (there's a short sketch of that after the training invocation below).

# -------------------------
# Tune network at full resolution with updated data from halfres

export CUDA_VISIBLE_DEVICES=0,1
DVC_DATA_DPATH=$HOME/data/dvc-repos/shitspotter_dvc
DVC_EXPT_DPATH=$HOME/data/dvc-repos/shitspotter_expt_dvc
WATCH_DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware='auto')
WORKDIR=$DVC_EXPT_DPATH/training/$HOSTNAME/$USER

DATASET_CODE=ShitSpotter
KWCOCO_BUNDLE_DPATH=$DVC_DATA_DPATH

TRAIN_FPATH=$KWCOCO_BUNDLE_DPATH/train.kwcoco.zip
VALI_FPATH=$KWCOCO_BUNDLE_DPATH/vali.kwcoco.zip

inspect_kwcoco_files(){
    kwcoco stats "$TRAIN_FPATH" "$VALI_FPATH"
    kwcoco info "$VALI_FPATH" -g 1
    kwcoco info "$VALI_FPATH" -v 1
    #kwcoco info "$VALI_FPATH" -a 1
    #geowatch stats "$TRAIN_FPATH" "$VALI_FPATH"
}
#inspect_kwcoco_files
EXPERIMENT_NAME="shitspotter_fromv29_fullres_v30"

CHANNELS="phone:(red|green|blue)"
DEFAULT_ROOT_DIR=$WORKDIR/$DATASET_CODE/runs/$EXPERIMENT_NAME
TARGET_LR=3e-4
WEIGHT_DECAY=$(python -c "print($TARGET_LR * 0.01)")
PERTERB_SCALE=$(python -c "print($TARGET_LR * 0.003)")
ETA_MIN=$(python -c "print($TARGET_LR * 0.0001)")
DEVICES=$(python -c "if 1:
    import os
    n = len(os.environ.get('CUDA_VISIBLE_DEVICES', '').split(','))
    print(','.join(list(map(str, range(n)))) + ',')
")
ACCELERATOR=gpu
STRATEGY=$(python -c "if 1:
    import os
    n = len(os.environ.get('CUDA_VISIBLE_DEVICES', '').split(','))
    print('ddp' if n > 1 else 'auto')
")
DDP_WORKAROUND=$(python -c "if 1:
    import os
    n = len(os.environ.get('CUDA_VISIBLE_DEVICES', '').split(','))
    print(int(n > 1))
")
echo "DEVICES = $DEVICES"
echo "DDP_WORKAROUND = $DDP_WORKAROUND"
echo "WEIGHT_DECAY = $WEIGHT_DECAY"

MAX_STEPS=10240
MAX_EPOCHS=1280
TRAIN_BATCHES_PER_EPOCH=1024
ACCUMULATE_GRAD_BATCHES=128
BATCH_SIZE=2
TRAIN_ITEMS_PER_EPOCH=$(python -c "print($TRAIN_BATCHES_PER_EPOCH * $BATCH_SIZE)")
echo "TRAIN_ITEMS_PER_EPOCH = $TRAIN_ITEMS_PER_EPOCH"

python -m geowatch.cli.experimental.recommend_size_adjustments \
    --MAX_STEPS=$MAX_STEPS \
    --MAX_EPOCHS=$MAX_EPOCHS \
    --BATCH_SIZE=$BATCH_SIZE \
    --ACCUMULATE_GRAD_BATCHES=$ACCUMULATE_GRAD_BATCHES \
    --TRAIN_BATCHES_PER_EPOCH="$TRAIN_BATCHES_PER_EPOCH" \
    --TRAIN_ITEMS_PER_EPOCH="$TRAIN_ITEMS_PER_EPOCH"

# Find the most recent checkpoint (TODO add utility for this)
PREV_CHECKPOINT=$(python -m geowatch.cli.experimental.find_recent_checkpoint --default_root_dir="$DEFAULT_ROOT_DIR")
echo "PREV_CHECKPOINT = $PREV_CHECKPOINT"

DDP_WORKAROUND=$DDP_WORKAROUND python -m geowatch.tasks.fusion fit --config "
data:
    select_videos          : $SELECT_VIDEOS
    num_workers            : 0
    train_dataset          : $TRAIN_FPATH
    vali_dataset           : $VALI_FPATH
    window_dims            : '416,416'
    time_steps             : 1
    time_sampling          : uniform
    #time_kernel            : '[0.0s,]'
    window_resolution     : 1.0
    input_resolution      : 1.0
    output_resolution     : 1.0
    neg_to_pos_ratio       : 1.0
    batch_size             : $BATCH_SIZE
    normalize_perframe     : false
    normalize_peritem      : false
    max_items_per_epoch    : $TRAIN_ITEMS_PER_EPOCH
    channels               : '$CHANNELS'
    min_spacetime_weight   : 0.6
    temporal_dropout_rate  : 0.5
    channel_dropout_rate   : 0.5
    modality_dropout_rate  : 0.5
    temporal_dropout       : 0.0
    channel_dropout        : 0.05
    modality_dropout       : 0.05
    mask_low_quality       : False
    mask_samecolor_method  : None
    observable_threshold   : 0.0
    quality_threshold      : 0.0
    weight_dilate          : 5
    dist_weights           : False
    use_centered_positives : True
    use_grid_positives     : True
    use_grid_negatives     : True
    normalize_inputs       : 80960
    balance_areas          : false
model:
    class_path: MultimodalTransformer
    init_args:
        saliency_weights       : null
        class_weights          : 'auto'
        tokenizer              : linconv
        arch_name              : smt_it_stm_s12
        decoder                : mlp
        positive_change_weight : 1
        negative_change_weight : 0.01
        stream_channels        : 16
        class_loss             : 'dicefocal'
        saliency_loss          : 'focal'
        saliency_head_hidden   : 4
        change_head_hidden     : 6
        class_head_hidden      : 6
        global_change_weight   : 0.00
        global_class_weight    : 0.00
        global_saliency_weight : 1.00
        multimodal_reduce      : max
        continual_learning     : false
        perterb_scale          : $PERTERB_SCALE
optimizer:
    class_path: torch.optim.AdamW
    init_args:
        lr           : $TARGET_LR
        weight_decay : $WEIGHT_DECAY
lr_scheduler:
  class_path: torch.optim.lr_scheduler.OneCycleLR
  init_args:
    max_lr: $TARGET_LR
    total_steps: $MAX_STEPS
    anneal_strategy: cos
    pct_start: 0.3
trainer:
    accumulate_grad_batches: $ACCUMULATE_GRAD_BATCHES
    default_root_dir     : $DEFAULT_ROOT_DIR
    accelerator          : $ACCELERATOR
    devices              : $DEVICES
    strategy             : $STRATEGY
    limit_train_batches  : $TRAIN_BATCHES_PER_EPOCH
    limit_val_batches    : 2056
    log_every_n_steps    : 1
    check_val_every_n_epoch: 1
    enable_checkpointing: true
    enable_model_summary: true
    num_sanity_val_steps : 0
    max_epochs: $MAX_EPOCHS
    callbacks:
        - class_path: pytorch_lightning.callbacks.ModelCheckpoint
          init_args:
              monitor: val_loss
              mode: min
              save_top_k: 5
              filename: '{epoch:04d}-{step:06d}-{val_loss:.3f}.ckpt'
              save_last: true

torch_globals:
    float32_matmul_precision: auto

initializer:
    init: /data/joncrall/dvc-repos/shitspotter_expt_dvc/training/toothbrush/joncrall/ShitSpotter/runs/shitspotter_scratch_halfres_v029/lightning_logs/version_1/checkpoints/epoch=1156-step=004628-val_loss=0.008.ckpt.ckpt
"
#--ckpt_path="$PREV_CHECKPOINT"
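(Aside: the torch-liberator partial weight loading mentioned above boils down to something like this sketch, with toy networks standing in for the real models.)

import torch
import torch_liberator

# Two networks that only partially agree in structure.
src = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.Conv2d(16, 8, 3))
dst = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.Conv2d(16, 32, 3))

# Copy whatever weights can be matched from src into dst; everything that
# cannot be matched is left at its initialized value.
torch_liberator.load_partial_state(dst, src.state_dict())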

The main way to deal with hyperparams for new problems is to search. My code isn't hooked up to a hyperparam optimizer yet, but I'm looking at things like ray.tune and deephyper. What I do have is a way to build, store, and measure quality metrics and resource consumption and tie those directly to the results and configuration of a pipeline. In other words, I can build a big table that maps hyperparameters to model scores, and I need to feed those to a hyperopt library and get a prediction for which one to try next.
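Just to illustrate the ask/tell loop I have in mind, here is a sketch using scikit-optimize purely as a stand-in for something like ray.tune or deephyper; the parameter ranges and scores are made up:

from skopt import Optimizer
from skopt.space import Real

# Search space over a couple of hyperparameters from the training config.
opt = Optimizer([
    Real(1e-5, 1e-2, prior='log-uniform', name='lr'),
    Real(1e-7, 1e-3, prior='log-uniform', name='weight_decay'),
])

# The "big table": past (hyperparameters -> validation score) rows (made up).
history = [
    ([3e-4, 3e-6], 0.62),
    ([1e-4, 1e-6], 0.58),
    ([1e-3, 1e-5], 0.41),
]
for params, score in history:
    opt.tell(params, -score)  # skopt minimizes, so negate the score

# Ask which hyperparameters to try next.
print(dict(zip(['lr', 'weight_decay'], opt.ask())))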

njho commented 4 months ago

You got the yolo model working? Excellent! I've been getting some decent results as well. Any chance you'd like to share the model/weights/inference code?

Can do. I've trained a tiny and a small model for 100 epochs, which I'm happy to publish. That being said, I probably could've kept training, as my COCO AP50 was still improving on validation and the loss was still decreasing. Also, my model is trained on both dogs and poo. Should I only train it on poo?

When benchmarking, do you normalize for the number of epochs between models, or should I have just kept training until I saw overfitting? Also, how do you normalize between different model sizes?

From their documentation:

Model      | Size | Params (M) | FLOPs (G)
---------- | ---- | ---------- | ---------
YOLOX-s    | 640  | 9.0        | 26.8
YOLOX-Nano | 416  | 0.91       | 1.08

I can do a PR after hearing your thoughts!

Erotemic commented 4 months ago

When benchmarking, do you normalize for the number of epochs between models, or should I have just kept training until I saw overfitting? Also, how do you normalize between different model sizes?

I think it's important to compare models with respect to some train/inference budget. We fundamentally don't know if a model is in one of the best minima, or if it can reasonably achieve one. I use pytorch lightning's ModelCheckpoint callback to keep the top 5 validation-loss checkpoints, as well as the final one:

        - class_path: pytorch_lightning.callbacks.ModelCheckpoint
          init_args:
              monitor: val_loss
              mode: min
              save_top_k: 5
              filename: '{epoch}-{step}-{val_loss:.3f}.ckpt'
              save_last: true

I'll then run evaluations on this shortlist of models to measure inference time and quality.
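The inference-time half of that is nothing fancy; roughly something like this sketch (not the real evaluation harness):

import time
import torch

def time_forward(model, batch, warmup=3, iters=20):
    # Rough wall-clock time per forward pass; synchronize so GPU kernels
    # are actually counted in the measurement.
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters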

Should I only train it on poo?

Conventional wisdom says that a poo-only model will score higher on poo-only test data. But I feel like multi-task objectives must have a higher score ceiling; it's just a matter of tuning the hyperparams. I think you can get a fine model either way.

If you haven't seen it, take a look at Google's deep learning tuning playbook.

I can do a PR after hearing your thoughts!

:partying_face: