eto-ai / rikai

Parquet-based ML data format optimized for working with unstructured data
https://rikai.readthedocs.io/en/latest/
Apache License 2.0

ModelType Design: explicit schema pruning using RETURNS #557

Status: Open. da-liii opened this issue 2 years ago.

da-liii commented 2 years ago
CREATE (OR REPLACE)? MODEL (IF NOT EXISTS)? model=qualifiedName
      (FLAVOR flavor=identifier)?
      (MODEL_TYPE modeltype=qualifiedName)?
      (OPTIONS optionList)?
      (RETURNS datatype=dataType)?
      (USING uri=STRING)
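To make the clause order in this grammar concrete, here is a minimal regex-based sketch that pulls the optional clauses out of a CREATE MODEL statement. It is not the project's real parser. The function name `parse_create_model` and the regex are hypothetical, and the sketch assumes that OPTIONS is a parenthesized list without nested parentheses, that the USING URI is single-quoted, and that clauses appear only in the order the grammar lists them.

```python
import re
from typing import Dict, Optional

# Hypothetical regex-based sketch of the CREATE MODEL grammar; not the real
# parser. Assumes OPTIONS has no nested parentheses and USING is single-quoted.
CREATE_MODEL = re.compile(
    r"""
    ^\s*CREATE\s+(?:OR\s+REPLACE\s+)?MODEL\s+(?:IF\s+NOT\s+EXISTS\s+)?
    (?P<model>[\w.]+)                              # model=qualifiedName
    (?:\s+FLAVOR\s+(?P<flavor>\w+))?               # optional FLAVOR
    (?:\s+MODEL_TYPE\s+(?P<model_type>[\w.]+))?    # optional MODEL_TYPE
    (?:\s+OPTIONS\s*\((?P<options>[^)]*)\))?       # optional OPTIONS (...)
    (?:\s+RETURNS\s+(?P<returns>.+?))?             # optional RETURNS dataType
    \s+USING\s+'(?P<uri>[^']*)'\s*;?\s*$           # required USING 'uri'
    """,
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def parse_create_model(sql: str) -> Dict[str, Optional[str]]:
    """Return each clause's text, or None for clauses that are absent."""
    match = CREATE_MODEL.match(sql)
    if match is None:
        raise ValueError(f"not a CREATE MODEL statement: {sql!r}")
    return match.groupdict()


stmt = """create model yolov5
RETURNS array<struct<score:float, label:string>>
using 'torchhub://x/y/z'"""

parsed = parse_create_model(stmt)
print(parsed["returns"])  # array<struct<score:float, label:string>>
```

The lazy `.+?` for RETURNS lets the data type contain spaces, commas, and angle brackets; it simply stops at the first `USING '` that follows.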

For example, suppose we define the OUTPUT_SCHEMA of a model type as:

array<struct<box:box2d, score:float, label:string>>

When a model is created with a narrower RETURNS type, for example

create model yolov5
RETURNS array<struct<score:float, label:string>>
using 'torchhub://x/y/z'

schema pruning can be done: the box field, which RETURNS omits, is dropped from each struct in the output.
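A minimal sketch of what such pruning could involve: first check that the RETURNS type only keeps fields that exist in OUTPUT_SCHEMA, then project each output value onto it. The `ArrayType`/`StructType` classes and the functions `is_subschema`, `prune`, and `apply_returns` are hypothetical stand-ins, not Rikai's or Spark's type API, and the detections are made-up data. The sketch prunes after inference; whether a model could skip computing the pruned fields is outside its scope.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


# Hypothetical, minimal type model for illustration only; these are not
# Rikai's or Spark's actual type classes.
@dataclass
class ArrayType:
    element: "DataType"


@dataclass
class StructType:
    fields: Dict[str, "DataType"]


# Primitive and user-defined types (float, string, box2d, ...) are plain names.
DataType = Union[str, ArrayType, StructType]


def is_subschema(requested: DataType, full: DataType) -> bool:
    """True if `requested` keeps only fields that exist, with the same types, in `full`."""
    if isinstance(requested, str) and isinstance(full, str):
        return requested == full
    if isinstance(requested, ArrayType) and isinstance(full, ArrayType):
        return is_subschema(requested.element, full.element)
    if isinstance(requested, StructType) and isinstance(full, StructType):
        return all(
            name in full.fields and is_subschema(t, full.fields[name])
            for name, t in requested.fields.items()
        )
    return False


def prune(value: Any, requested: DataType) -> Any:
    """Project a model output value onto the requested (RETURNS) type."""
    if isinstance(requested, ArrayType):
        return [prune(v, requested.element) for v in value]
    if isinstance(requested, StructType):
        return {name: prune(value[name], t) for name, t in requested.fields.items()}
    return value  # primitive: keep as-is


def apply_returns(output: Any, output_schema: DataType, returns: DataType) -> Any:
    """Validate RETURNS against OUTPUT_SCHEMA, then prune the output."""
    if not is_subschema(returns, output_schema):
        raise ValueError("RETURNS type is not a sub-schema of OUTPUT_SCHEMA")
    return prune(output, returns)


OUTPUT_SCHEMA = ArrayType(StructType({"box": "box2d", "score": "float", "label": "string"}))
RETURNS = ArrayType(StructType({"score": "float", "label": "string"}))

# Made-up detections standing in for one image's model output.
detections = [
    {"box": (0, 0, 10, 10), "score": 0.9, "label": "cat"},
    {"box": (5, 5, 20, 20), "score": 0.4, "label": "dog"},
]
print(apply_returns(detections, OUTPUT_SCHEMA, RETURNS))
# [{'score': 0.9, 'label': 'cat'}, {'score': 0.4, 'label': 'dog'}]
```

A RETURNS type that names a field absent from OUTPUT_SCHEMA fails the sub-schema check and raises `ValueError` instead of returning partial data.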

changhiskhan commented 2 years ago

Another related feature that could be super useful is the ability to specify which layers to return results from. Right now, if we want to do model introspection, we need to implement a new model explicitly. Instead, it would be great to be able to extract features from intermediate layers (e.g., embeddings, candidate label scores) without reimplementing the model. Cc @eddyxu wdyt?

eddyxu commented 2 years ago

> schema pruning can be done.

Could you elaborate on the use case for schema pruning? It is not clear to me why we would want it instead of just doing the data manipulation in SQL. It adds considerable complexity to ModelType and makes the ModelType interface wider.

> Another related here that could be super useful is to specify results to get from certain layers.

Yeah, this could be great. Let's do some research to see whether there is a common pattern we can support here.