Open njho opened 4 months ago
If images are appropriately licensed, then I would like to include them. I also would want to include attribution regardless, so we will have to track that.
You got the yolo model working? Excellent! I've been getting some decent results as well. Any chance you'd like to share the model/weights/inference code? I would like to start a leaderboard with quantified results. It would be interesting to see how our models compare. I'll have to postprocess my results to get boxes, but if you have a script that can produce boxes given a kwcoco file, then I (or you) can score it with kwcoco eval, which currently includes bounding box detection metrics.
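For the "postprocess my results to get boxes" step, here is a minimal sketch of one common approach. It assumes the model emits a 2D score map per image (an assumption; the thread does not say what the raw output looks like) and turns each connected blob above a threshold into a COCO-style `[x, y, width, height]` box. The function name `heatmap_to_boxes` and the default threshold are hypothetical.

```python
from collections import deque


def heatmap_to_boxes(heatmap, threshold=0.5):
    """Threshold a 2D score map and return one xywh box per connected blob.

    heatmap: list of rows of floats (a hypothetical saliency output).
    Each result has a COCO-style bbox [x, y, width, height] and a score
    equal to the maximum heatmap value inside the blob.
    """
    height = len(heatmap)
    width = len(heatmap[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r0 in range(height):
        for c0 in range(width):
            if seen[r0][c0] or heatmap[r0][c0] < threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            rmin = rmax = r0
            cmin = cmax = c0
            score = heatmap[r0][c0]
            while queue:
                r, c = queue.popleft()
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                score = max(score, heatmap[r][c])
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and not seen[nr][nc] and heatmap[nr][nc] >= threshold:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append({
                'bbox': [cmin, rmin, cmax - cmin + 1, rmax - rmin + 1],
                'score': score,
            })
    return boxes
```

Each returned dict could then be written out as an annotation (with an image id and category id attached) before scoring.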
Yeah! For sure, I'll let you know when it's done. I said it was working, and it was detecting both poos and dogs, but the labels were incorrect. Somehow it was still getting the bboxes! I'm using mmdetect, thinking it'd give me a head start, but the abstraction layers aren't making it clear to me where the configuration is set incorrectly...
Lots of nested configurations. Who knows, it might just be me 😢
Either way tomorrow just going to go straight to the OG YoloX which I probably should've done in the first place. I suspect I'll have better luck there and will post an update shortly thereafter!
Btw, what are your thoughts on these abstraction layers vs going to the original GH repo in general?
You might have more experience. I've deployed a fair number for inference, but this is only the second model I've trained. The first was for tabular data, but it was much simpler and I used fastai as an abstraction layer.
I'm more or less scared, or hesitant to "do it myself" without the help of abstraction layers cause I'm worried about hyperparameter optimization, and that a lot of knowledge might be necessary to get it to train.
I like how mmdetect constructs torch models: it has really good modular definitions of torch model components. I'm not a fan of its training mechanisms, and its registration system isn't amenable to static analysis, but otherwise it's very good. I should probably learn how to use it, given that it does work and this RGB COCO dataset would probably be accepted as valid input to its training / prediction algorithms (perhaps needing some minor variation in kwcoco conform).
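To illustrate the static-analysis point, here is a simplified stand-in for a string-keyed registry. It is not mmdet's actual implementation; `MODELS`, `register`, `build`, and `TinyDetector` are hypothetical names. The only thing borrowed from that ecosystem is the convention of selecting a component with a `type` string in a config dict.

```python
# A simplified stand-in for a string-keyed registry (NOT mmdet's real code).
MODELS = {}


def register(name):
    """Decorator that stores a class in MODELS under a string key."""
    def decorator(cls):
        MODELS[name] = cls
        return cls
    return decorator


@register('TinyDetector')
class TinyDetector:
    def __init__(self, num_classes=1):
        self.num_classes = num_classes


def build(cfg):
    """Instantiate a class from a config dict such as {'type': ..., **kwargs}."""
    kwargs = dict(cfg)
    cls = MODELS[kwargs.pop('type')]  # the class is resolved only at runtime
    return cls(**kwargs)


model = build({'type': 'TinyDetector', 'num_classes': 2})
```

Because `build` looks the class up by string at runtime, an IDE or type checker cannot tell which constructor a given config will call, so typos in the `type` value or in keyword names only surface when the code runs.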
Overall I think abstraction layers are important, but I think the existing ones are flawed, which is something I'm trying to address in geowatch (I may change the name in the future), which uses pytorch-lightning as its training workhorse. I think Lightning is a great library, and I've written geowatch as a system that connects kwcoco annotations to a training pipeline. I'd love to (and might be able to) get mmdetect models integrated as an option for the model parameter in the Lightning training config. That will require something that connects my non-standard nested data structure for heterogeneous multimodal data (which I need to codify) to an mmdet model and loss function.
To illustrate the style of configuration I'm going for, this is a snippet from train.sh where I keep my "training invocations".
The following specifies everything about the training problem and gives you full control --- and logging --- of all of the hyperparameters. The main components are "model", "data", "optimizer", "lr_scheduler", "trainer", and "initializer" (the last of which is a custom geowatch thing). You will notice it accomplishing these tasks:
I also want to point out that the "initializer" section and some of this ease-of-config stuff is due to custom extensions I made to pytorch-lightning and jsonargparse. It is not vanilla lightning, but it is close to it. The initializer lets you pass a starting point, which can be another similar model. It uses my partial-weight-loading stuff in torch-liberator to cram weights from one model into another.
# -------------------------
# Tune network at full resolution with updated data from halfres
export CUDA_VISIBLE_DEVICES=0,1
DVC_DATA_DPATH=$HOME/data/dvc-repos/shitspotter_dvc
DVC_EXPT_DPATH=$HOME/data/dvc-repos/shitspotter_expt_dvc
WATCH_DVC_EXPT_DPATH=$(geowatch_dvc --tags='phase2_expt' --hardware='auto')
WORKDIR=$DVC_EXPT_DPATH/training/$HOSTNAME/$USER
DATASET_CODE=ShitSpotter
KWCOCO_BUNDLE_DPATH=$DVC_DATA_DPATH
TRAIN_FPATH=$KWCOCO_BUNDLE_DPATH/train.kwcoco.zip
VALI_FPATH=$KWCOCO_BUNDLE_DPATH/vali.kwcoco.zip
inspect_kwcoco_files(){
    kwcoco stats "$TRAIN_FPATH" "$VALI_FPATH"
    kwcoco info "$VALI_FPATH" -g 1
    kwcoco info "$VALI_FPATH" -v 1
    #kwcoco info "$VALI_FPATH" -a 1
    #geowatch stats "$TRAIN_FPATH" "$VALI_FPATH"
}
#inspect_kwcoco_files
EXPERIMENT_NAME="shitspotter_fromv29_fullres_v30"
CHANNELS="phone:(red|green|blue)"
DEFAULT_ROOT_DIR=$WORKDIR/$DATASET_CODE/runs/$EXPERIMENT_NAME
TARGET_LR=3e-4
WEIGHT_DECAY=$(python -c "print($TARGET_LR * 0.01)")
PERTERB_SCALE=$(python -c "print($TARGET_LR * 0.003)")
ETA_MIN=$(python -c "print($TARGET_LR * 0.0001)")
DEVICES=$(python -c "if 1:
    import os
    n = len(os.environ.get('CUDA_VISIBLE_DEVICES', '').split(','))
    print(','.join(list(map(str, range(n)))) + ',')
")
ACCELERATOR=gpu
STRATEGY=$(python -c "if 1:
    import os
    n = len(os.environ.get('CUDA_VISIBLE_DEVICES', '').split(','))
    print('ddp' if n > 1 else 'auto')
")
DDP_WORKAROUND=$(python -c "if 1:
    import os
    n = len(os.environ.get('CUDA_VISIBLE_DEVICES', '').split(','))
    print(int(n > 1))
")
echo "DEVICES = $DEVICES"
echo "DDP_WORKAROUND = $DDP_WORKAROUND"
echo "WEIGHT_DECAY = $WEIGHT_DECAY"
MAX_STEPS=10240
MAX_EPOCHS=1280
TRAIN_BATCHES_PER_EPOCH=1024
ACCUMULATE_GRAD_BATCHES=128
BATCH_SIZE=2
TRAIN_ITEMS_PER_EPOCH=$(python -c "print($TRAIN_BATCHES_PER_EPOCH * $BATCH_SIZE)")
echo "TRAIN_ITEMS_PER_EPOCH = $TRAIN_ITEMS_PER_EPOCH"
python -m geowatch.cli.experimental.recommend_size_adjustments \
    --MAX_STEPS=$MAX_STEPS \
    --MAX_EPOCHS=$MAX_EPOCHS \
    --BATCH_SIZE=$BATCH_SIZE \
    --ACCUMULATE_GRAD_BATCHES=$ACCUMULATE_GRAD_BATCHES \
    --TRAIN_BATCHES_PER_EPOCH="$TRAIN_BATCHES_PER_EPOCH" \
    --TRAIN_ITEMS_PER_EPOCH="$TRAIN_ITEMS_PER_EPOCH"
# Find the most recent checkpoint (TODO add utility for this)
PREV_CHECKPOINT=$(python -m geowatch.cli.experimental.find_recent_checkpoint --default_root_dir="$DEFAULT_ROOT_DIR")
echo "PREV_CHECKPOINT = $PREV_CHECKPOINT"
DDP_WORKAROUND=$DDP_WORKAROUND python -m geowatch.tasks.fusion fit --config "
data:
    select_videos          : $SELECT_VIDEOS
    num_workers            : 0
    train_dataset          : $TRAIN_FPATH
    vali_dataset           : $VALI_FPATH
    window_dims            : '416,416'
    time_steps             : 1
    time_sampling          : uniform
    #time_kernel           : '[0.0s,]'
    window_resolution      : 1.0
    input_resolution       : 1.0
    output_resolution      : 1.0
    neg_to_pos_ratio       : 1.0
    batch_size             : $BATCH_SIZE
    normalize_perframe     : false
    normalize_peritem      : false
    max_items_per_epoch    : $TRAIN_ITEMS_PER_EPOCH
    channels               : '$CHANNELS'
    min_spacetime_weight   : 0.6
    temporal_dropout_rate  : 0.5
    channel_dropout_rate   : 0.5
    modality_dropout_rate  : 0.5
    temporal_dropout       : 0.0
    channel_dropout        : 0.05
    modality_dropout       : 0.05
    mask_low_quality       : False
    mask_samecolor_method  : None
    observable_threshold   : 0.0
    quality_threshold      : 0.0
    weight_dilate          : 5
    dist_weights           : False
    use_centered_positives : True
    use_grid_positives     : True
    use_grid_negatives     : True
    normalize_inputs       : 80960
    balance_areas          : false
model:
    class_path: MultimodalTransformer
    init_args:
        saliency_weights       : null
        class_weights          : 'auto'
        tokenizer              : linconv
        arch_name              : smt_it_stm_s12
        decoder                : mlp
        positive_change_weight : 1
        negative_change_weight : 0.01
        stream_channels        : 16
        class_loss             : 'dicefocal'
        saliency_loss          : 'focal'
        saliency_head_hidden   : 4
        change_head_hidden     : 6
        class_head_hidden      : 6
        global_change_weight   : 0.00
        global_class_weight    : 0.00
        global_saliency_weight : 1.00
        multimodal_reduce      : max
        continual_learning     : false
        perterb_scale          : $PERTERB_SCALE
optimizer:
    class_path: torch.optim.AdamW
    init_args:
        lr           : $TARGET_LR
        weight_decay : $WEIGHT_DECAY
lr_scheduler:
    class_path: torch.optim.lr_scheduler.OneCycleLR
    init_args:
        max_lr: $TARGET_LR
        total_steps: $MAX_STEPS
        anneal_strategy: cos
        pct_start: 0.3
trainer:
    accumulate_grad_batches: $ACCUMULATE_GRAD_BATCHES
    default_root_dir     : $DEFAULT_ROOT_DIR
    accelerator          : $ACCELERATOR
    devices              : $DEVICES
    strategy             : $STRATEGY
    limit_train_batches  : $TRAIN_BATCHES_PER_EPOCH
    limit_val_batches    : 2056
    log_every_n_steps    : 1
    check_val_every_n_epoch: 1
    enable_checkpointing: true
    enable_model_summary: true
    num_sanity_val_steps : 0
    max_epochs: $MAX_EPOCHS
    callbacks:
        - class_path: pytorch_lightning.callbacks.ModelCheckpoint
          init_args:
              monitor: val_loss
              mode: min
              save_top_k: 5
              filename: '{epoch:04d}-{step:06d}-{val_loss:.3f}.ckpt'
              save_last: true
torch_globals:
    float32_matmul_precision: auto
initializer:
    init: /data/joncrall/dvc-repos/shitspotter_expt_dvc/training/toothbrush/joncrall/ShitSpotter/runs/shitspotter_scratch_halfres_v029/lightning_logs/version_1/checkpoints/epoch=1156-step=004628-val_loss=0.008.ckpt.ckpt
"
#--ckpt_path="$PREV_CHECKPOINT"
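The thread does not describe how torch-liberator matches weights for the "initializer" section, so the following is only a sketch of the simplest possible rule: copy a parameter when both its name and its shape agree, and report everything else. `Param` and `partial_load` are hypothetical names, and `Param` is a stand-in for a tensor where only the shape matters.

```python
from collections import namedtuple

# Hypothetical stand-in for a tensor: only the shape matters for matching.
Param = namedtuple('Param', ['shape', 'source'])


def partial_load(target_state, source_state):
    """Copy entries from source_state into target_state when name and shape agree.

    Returns (new_state, report), where report lists which keys were loaded,
    skipped because of a shape mismatch, or missing from the source.
    """
    new_state = dict(target_state)
    report = {'loaded': [], 'shape_mismatch': [], 'missing': []}
    for name, param in target_state.items():
        if name not in source_state:
            report['missing'].append(name)
        elif source_state[name].shape != param.shape:
            report['shape_mismatch'].append(name)
        else:
            new_state[name] = source_state[name]
            report['loaded'].append(name)
    return new_state, report


# Example: the backbone matches, but the head has a different class count.
old = {'backbone.w': Param((16, 3), 'old'), 'head.w': Param((2, 16), 'old')}
new = {'backbone.w': Param((16, 3), 'init'), 'head.w': Param((5, 16), 'init'),
       'extra.b': Param((5,), 'init')}
state, report = partial_load(new, old)
```

With real PyTorch models the two dicts would come from `model.state_dict()`, and the merged result would go back in through `load_state_dict`. A more forgiving matcher would also need to handle renamed or re-nested keys, which this sketch does not attempt.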
The main way to deal with hyperparams for new problems is to search. My code isn't hooked up to a hyperparam optimizer yet, but I'm looking at things like ray.tune and deephyper. What I do have is a way to build, store, and measure quality metrics and resource consumption, and to tie those directly to the results and configuration of a pipeline. In other words, I can build a big table that maps hyperparameters to model scores; what I still need is to feed that table to a hyperopt library and get a prediction for which configuration to try next.
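As a rough picture of the "table in, next suggestion out" loop, here is a deliberately naive baseline: take the best row so far and jitter each hyperparameter in log space. This is not how ray.tune or deephyper work and it uses neither API; `suggest_next` is a hypothetical helper, and the scores in the table are placeholder values, not results from this project.

```python
import math
import random


def suggest_next(trials, rng, spread=0.5):
    """Propose the next config by jittering the best trial in log space.

    trials: list of (config, score) pairs where a higher score is better and
    config is a dict of positive floats (for example lr and weight_decay).
    """
    best_config, _ = max(trials, key=lambda t: t[1])
    return {
        name: value * math.exp(rng.uniform(-spread, spread))
        for name, value in best_config.items()
    }


# Hypothetical table of past runs: (hyperparameters, placeholder score).
trials = [
    ({'lr': 3e-4, 'weight_decay': 3e-6}, 0.41),
    ({'lr': 1e-3, 'weight_decay': 1e-5}, 0.35),
    ({'lr': 1e-4, 'weight_decay': 1e-6}, 0.38),
]
candidate = suggest_next(trials, random.Random(0))
```

A real optimizer would model the whole table rather than only the best row, but the interface is the same: past (config, score) pairs go in, one new config comes out.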
> You got the yolo model working? Excellent! I've been getting some decent results as well. Any chance you'd like to share the model/weights/inference code?
Can do. I've trained a tiny and a small model for 100 epochs, which I'm happy to publish. That being said, I probably could've kept training, as my COCO AP50 was still improving on validation, with loss decreasing. Also, my model is trained on both dogs and poo. Should I only train it on poo?
When benchmarking, do you normalize for the number of epochs between models? Or should I have just kept training until I saw overfitting? Also, how do you normalize between the different sizes of models?
From their documentation:
Model | Input size (px) | Params (M) | FLOPs (G)
---|---|---|---
YOLOX-s | 640 | 9.0 | 26.8
YOLOX-Nano | 416 | 0.91 | 1.08
I can do a PR after hearing your thoughts!
> When benchmarking, do you normalize for the number of epochs between models? Or should I have just kept training until I saw overfitting? Also, how do you normalize between the different sizes of models?
I think it's important to compare models with respect to some train/inference budget. We fundamentally don't know if a model is in one of the best minima, or if it can reasonably reach one. I use PyTorch Lightning's ModelCheckpoint callback to keep the top 5 checkpoints by validation loss, as well as the final one:
- class_path: pytorch_lightning.callbacks.ModelCheckpoint
  init_args:
      monitor: val_loss
      mode: min
      save_top_k: 5
      filename: '{epoch}-{step}-{val_loss:.3f}.ckpt'
      save_last: true
I then run evaluations on this shortlist of models to measure inference time and quality.
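One way to assemble that shortlist outside of Lightning is to read the metrics back out of the checkpoint file names. This sketch assumes names shaped like the one in the `initializer` path earlier in the thread (`epoch=...-step=...-val_loss=...`); `parse_checkpoint` and `shortlist` are hypothetical helpers, and the second file name in the example is made up for illustration.

```python
import re

# Matches names such as 'epoch=1156-step=004628-val_loss=0.008.ckpt.ckpt'.
CKPT_PATTERN = re.compile(r'epoch=(\d+)-step=(\d+)-val_loss=(\d+(?:\.\d+)?)')


def parse_checkpoint(name):
    """Extract epoch, step and val_loss from a checkpoint file name."""
    match = CKPT_PATTERN.search(name)
    if match is None:
        return None  # e.g. 'last.ckpt' carries no metrics in its name
    epoch, step, val_loss = match.groups()
    return {'name': name, 'epoch': int(epoch), 'step': int(step),
            'val_loss': float(val_loss)}


def shortlist(names, top_k=5):
    """Return the top_k parseable checkpoints, lowest val_loss first."""
    parsed = [p for p in map(parse_checkpoint, names) if p is not None]
    return sorted(parsed, key=lambda p: p['val_loss'])[:top_k]


names = [
    'epoch=1156-step=004628-val_loss=0.008.ckpt.ckpt',  # name from the thread
    'epoch=0003-step=000016-val_loss=0.120.ckpt.ckpt',  # hypothetical
    'last.ckpt',
]
best = shortlist(names, top_k=1)
```

Each shortlisted checkpoint can then be evaluated on the same held-out data while timing inference, so that quality and cost are compared under one budget.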
> Should I only train it on poo?
Conventional wisdom says that a poo-only model will score higher on poo-only test data. But I feel like multi-task objectives must have a higher score ceiling; it's just a matter of tuning the hyperparams. I think either way you can get a fine model.
If you haven't seen it, take a look at Google's Deep Learning Tuning Playbook.
> I can do a PR after hearing your thoughts!
🥳
Curious if there would be any benefit from using images from the internet. That being said, I'm not sure if there would be any issues w/ licensing.
The Poo detector using YoloX-tiny works, but I think adding more data might be useful. Would you be interested in having annotations from the web added? Or is that against the ethos of the project?