eto-ai / rikai

Parquet-based ML data format optimized for working with unstructured data
https://rikai.readthedocs.io/en/latest/
Apache License 2.0

Model Type with Pretrained Mixin #579

Closed da-tubi closed 2 years ago

da-tubi commented 2 years ago

Second attempt to close #560

Pretrained ModelType is designed to ease the development of ModelType implementations. Like the TFHub and TorchHub support, it aims to make Rikai much more user-friendly. The difference is that there are several ways to load a pretrained model:

  1. The canonical way, e.g. TFHub/TorchHub: hub.load('path/to/model')
  2. The Python way, e.g. torchvision.models.detection.fasterrcnn_resnet50_fpn()
  3. Others, e.g. torch.load("https://path/to/model.pt")
  4. ONNX, e.g. onnx.load("xxx.onnx")
  5. More to come

There is already a nice abstraction for the first case. For the other cases, a pretrained ModelType could and should be used.

Well, with a pretrained ModelType, we might need a minimal baseline/dummy flavor for #578.
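For reference, a minimal Python sketch of the idea, using hypothetical ModelType / Pretrained names and signatures rather than Rikai's actual API:

from abc import ABC, abstractmethod
from typing import Any


class ModelType(ABC):
    """Base class: resolves and runs the model described by a spec."""

    @abstractmethod
    def load_model(self, spec: Any, **kwargs) -> None:
        ...


class Pretrained(ABC):
    """Mixin for model types that can build their own pretrained model,
    so no model URI or registry lookup is needed."""

    @abstractmethod
    def pretrained_model(self) -> Any:
        ...


class FasterRCNNModelType(ModelType, Pretrained):
    """Case 2 above: the weights come from plain Python code."""

    def pretrained_model(self) -> Any:
        import torchvision

        return torchvision.models.detection.fasterrcnn_resnet50_fpn(
            pretrained=True
        )

    def load_model(self, spec: Any, **kwargs) -> None:
        # Ignore any model URI in the spec; use the bundled weights.
        self.model = self.pretrained_model()
        self.model.eval()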

da-liii commented 2 years ago

With this PR merged, we do not need to add MLflow support when working on #580 and #564 .

And I will also create a demo Rikai integration with EasyOCR: https://github.com/darcy-shen/rikai-easyocr/wiki

da-tubi commented 2 years ago

https://github.com/eto-ai/rikai/pull/579/files#diff-5fd91b22b7c04a9dd0cce51026be4b084dacc8dd202a0a3f3a1eaf36e2ca8900

The find_model auxiliary method is not a good solution. We may improve it later.

Renkai commented 2 years ago

I made an example of how we can move the template code into the mixin trait. I am not sure whether we can do a similar thing in Python; pasting it here for potential future reference.

object MixinExample {
  trait Model

  class DummyModel extends Model

  class OtherModel extends Model

  abstract class ModelType {
    def find_model(model: Model): Unit
  }

  // Stackable trait: `abstract override` lets it wrap whatever concrete
  // find_model it is mixed into, swapping a dummy model for the
  // pretrained one before delegating via super.
  trait Pretrained extends ModelType {

    def pretrained_model(): Model

    abstract override def find_model(model: Model): Unit = {
      if (model.isInstanceOf[DummyModel]) {
        super.find_model(pretrained_model())
      } else {
        super.find_model(model)
      }
    }
  }

  class AModelType extends ModelType {

    override def find_model(model: Model): Unit = {
      println("call AModelType find model")
    }
  }

  // Linearization puts Pretrained before AModelType, so its find_model
  // intercepts the call and its `super` resolves to AModelType.
  class PretrainedAModelType extends AModelType with Pretrained {
    override def pretrained_model(): Model = {
      println("call pre-trained model")
      new OtherModel
    }
  }

  def main(args: Array[String]): Unit = {
    val t = new PretrainedAModelType
    println("try call dummy model")
    t.find_model(new DummyModel)
    println("=" * 80)
    println("try call other model")
    t.find_model(new OtherModel)
  }
}
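
For comparison, one way this stacking might look in Python, using cooperative super() and the MRO (a hypothetical sketch mirroring the Scala names, not Rikai's actual classes):

class Model:
    pass

class DummyModel(Model):
    pass

class OtherModel(Model):
    pass

class ModelType:
    def find_model(self, model: Model) -> None:
        raise NotImplementedError

class Pretrained:
    # Stackable mixin: replaces a dummy model with the pretrained one,
    # then delegates to the next find_model in the MRO.
    def pretrained_model(self) -> Model:
        raise NotImplementedError

    def find_model(self, model: Model) -> None:
        if isinstance(model, DummyModel):
            super().find_model(self.pretrained_model())
        else:
            super().find_model(model)

class AModelType(ModelType):
    def find_model(self, model: Model) -> None:
        print("call AModelType find model")

class PretrainedAModelType(Pretrained, AModelType):
    # MRO: Pretrained -> AModelType, so the mixin wraps find_model
    # and its super() call resolves to AModelType.find_model.
    def pretrained_model(self) -> Model:
        print("call pre-trained model")
        return OtherModel()

t = PretrainedAModelType()
t.find_model(DummyModel())   # swapped for the pretrained model
t.find_model(OtherModel())   # passed through unchanged
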
da-tubi commented 2 years ago

Here is the WIP branch: https://github.com/da-tubi/rikai/tree/backup/onnx_test

I just tried the ONNX version of SSD:

+    def pretrained_model(self) -> Any:
+        import onnx
+        from onnx_tf.backend import prepare
+
+        onnx_m = onnx.load("/tmp/ssd-12.onnx")
+        return prepare(onnx_m)
+
+    def load_model(self, spec: ModelSpec, **kwargs):
+        if isinstance(spec, DummyModelSpec):
+            self.model = self.pretrained_model()
+            self.spec = spec
+        else:
+            return super().load_model(spec, **kwargs)
+        # `images` is the test input batch; this run call raises the
+        # dtype error shown below.
+        results = self.model.run(images)

Using the tensorflow flavor, the input dtype does not match:

E                   TypeError: Value passed to parameter 'input' has DataType uint8 not in list of allowed values: float16, bfloat16, float32, float64, int32
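
The error suggests the graph prepared by onnx_tf expects a float input while the test images are uint8; casting the batch first is one possible workaround (assuming images is a NumPy array):

import numpy as np

# Cast the uint8 image batch to float32 before feeding the ONNX graph.
results = self.model.run(images.astype(np.float32))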