
Composable Models: Object detection stage #12

Closed: eddogola closed this issue 2 months ago

eddogola commented 7 months ago

Discussed in https://github.com/broadinstitute/superurop-log/discussions/8

Originally posted by **gnodar01** November 30, 2023

Over the last week or so [we've been discussing](https://github.com/broadinstitute/superurop-log/discussions/7#discussioncomment-7579909) going forward with composable models. The rough idea is to have individual components of an instance segmentation model that are pre-trained on some task such as ImageNet or COCO. One component might be an object detector (probably single-shot, such as YOLO or SSD) which outputs bounding-box coordinates (and classes, though we may not even need those). Another component might be a semantic segmenter, which takes in crops defined by the bounding boxes from the object detector and semantically segments them into bg/fg.

The hypothesis is: given pretrained components of the sort described above, can we add a relatively slim number of layers on each to get a fine-tunable model? If so, we can have a composable pipeline that goes `images -> object detector (pretrained, inference only) -> slim layers (trainable) -> output bbox coords -> image crops -> semantic segmenter (pretrained, inference only) -> slim layers (trainable) -> instances`.

In the end this pipeline would be deployed to the browser. The object detector and semantic segmenter would be pretrained in Python on a laptop or DGX, and then converted to a tfjs graph model. We have determined that this is possible for more or less arbitrary TensorFlow models using the [tfjs converter](https://github.com/tensorflow/tfjs/tree/master/tfjs-converter), and for PyTorch models using the [ultralytics Exporter class](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/engine/exporter.py#L127) (which does `pytorch -> onnx -> tensorflow saved model -> tfjs converter -> tfjs graph model`). The slim layers would be tfjs layers models, and therefore trainable in-browser.

For the purposes of validating the hypothesis, however, we'll first do it all in Python as a proof of concept. We can worry about converting to tfjs and verifying its performance in the browser after.
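To make the composition concrete, here is a minimal PyTorch sketch of the pipeline above: frozen pretrained components with slim trainable heads on top. The specific model choices (torchvision's FCOS detector and FCN segmenter), the head shapes, and the crop size are illustrative assumptions on my part, not decisions from the discussion, and input preprocessing is omitted.

```python
# Minimal sketch of: detector (frozen) -> slim head -> crops -> segmenter (frozen) -> slim head.
import torch
import torch.nn as nn
import torchvision
from torchvision.ops import roi_align


class ComposablePipeline(nn.Module):
    def __init__(self, hidden_dim=256):
        super().__init__()
        # Pretrained object detector, frozen (inference only).
        self.detector = torchvision.models.detection.fcos_resnet50_fpn(weights="DEFAULT")
        for p in self.detector.parameters():
            p.requires_grad = False
        # Slim trainable layers that refine the detector's box coordinates.
        self.box_head = nn.Sequential(
            nn.Linear(4, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 4),
        )
        # Pretrained semantic segmenter, frozen (inference only).
        self.segmenter = torchvision.models.segmentation.fcn_resnet50(weights="DEFAULT")
        for p in self.segmenter.parameters():
            p.requires_grad = False
        # Slim trainable layer mapping the segmenter's 21 class logits to a single fg/bg channel.
        self.mask_head = nn.Conv2d(21, 1, kernel_size=1)

    def forward(self, images):
        # Keep the frozen components in eval mode even when the wrapper is in train mode.
        self.detector.eval()
        self.segmenter.eval()
        # 1. Detect objects (frozen, no gradients).
        with torch.no_grad():
            detections = self.detector(images)
        instances = []
        for img, det in zip(images, detections):
            boxes = det["boxes"]
            # 2. Refine box coordinates with the trainable slim head (residual update).
            boxes = boxes + self.box_head(boxes)
            # 3. Crop each box out of the image (fixed crop size is an assumption).
            crops = roi_align(img.unsqueeze(0), [boxes], output_size=(128, 128))
            # 4. Semantically segment each crop (frozen), then apply the trainable
            #    slim head to get a fg/bg mask per instance.
            with torch.no_grad():
                logits = self.segmenter(crops)["out"]
            masks = torch.sigmoid(self.mask_head(logits))
            instances.append({"boxes": boxes, "masks": masks})
        return instances
```

For the browser side, the PyTorch-to-tfjs path mentioned above can be exercised through the ultralytics export API. The checkpoint name below is just a placeholder, and the tfjs export assumes TensorFlow and tensorflowjs are installed.

```python
# Sketch of the pytorch -> onnx -> tensorflow saved model -> tfjs graph model path
# via the ultralytics Exporter; "yolov8n.pt" is a placeholder checkpoint.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
# Produces a tfjs graph model directory that can be loaded in the
# browser with tf.loadGraphModel.
model.export(format="tfjs")
```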
eddogola commented 7 months ago

Going with FCOS first for the object detection stage.
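For reference, a COCO-pretrained FCOS is available in torchvision, so the inference-only detection stage can be tried in a few lines. This is just a usage sketch; the input tensor and the 0.5 score threshold are placeholders.

```python
# Minimal torchvision FCOS inference sketch (COCO-pretrained weights).
import torch
import torchvision

model = torchvision.models.detection.fcos_resnet50_fpn(weights="DEFAULT")
model.eval()

img = torch.rand(3, 512, 512)  # placeholder image, CHW in [0, 1]
with torch.no_grad():
    preds = model([img])[0]  # dict with "boxes", "scores", "labels"

keep = preds["scores"] > 0.5          # placeholder confidence threshold
boxes = preds["boxes"][keep]          # (N, 4) xyxy coordinates to crop on
```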