KonduitAI / konduit-serving

Enterprise runtime for machine learning models
https://serving.konduit.ai/
Apache License 2.0

Unable to provide nested dictionary as input for pipeline steps. #42

Open ShamsUlAzeem opened 4 years ago

ShamsUlAzeem commented 4 years ago

Input names and output names have different meanings across different kinds of pipeline steps. For a Python pipeline step or a transform pipeline step we need to specify data in a columnar format, while for running network graphs the inputs are treated as record rows and batches. For columnar data, as things currently stand, we need to provide a nested dictionary for inputs. For example,

{
  'default': {
     'x': np.ones((2, 2)),
     'y': np.ones((2, 2))
  }
}
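
By contrast, inputs for network graph steps are a flat name-to-array mapping, roughly like this (illustrative only, reusing the same numpy values):

{
  'x': np.ones((2, 2)),
  'y': np.ones((2, 2))
}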

To support this, we have to extend our konduit Python client module and the BatchInputParser so that they can differentiate between batches and nested columnar data, as sketched below.
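
One direction would be to have the client inspect the input structure before building the request. A loose sketch follows; the helper names (is_columnar, prepare_parts) and the 'input/column' part-naming scheme are assumptions for illustration, not the actual konduit API:

def is_columnar(data):
    # A nested dict signals columnar input: top-level values are
    # themselves dicts mapping column name -> array,
    # e.g. {'default': {'x': ..., 'y': ...}}
    return all(isinstance(v, dict) for v in data.values())

def prepare_parts(data):
    if is_columnar(data):
        # Flatten to 'input/column' keys so each multipart part
        # still maps to exactly one name on the server side
        return {f'{name}/{col}': arr
                for name, cols in data.items()
                for col, arr in cols.items()}
    # Batch/record input: already a flat name -> array mapping
    return data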

agibsonccc commented 4 years ago

So I've thought about this a bit. The intended use case is columnar data. The original logic here mapped each multipart "part" to a "name" one to one. What we could do instead is ensure this kind of input gets directed to columns instead.

The backend currently has JSON support only for columnar data: https://github.com/KonduitAI/konduit-serving/blob/b881d38c8fe04812748185b5f195d4a60c47a9bb/konduit-serving-core/src/main/java/ai/konduit/serving/configprovider/PipelineRouteDefiner.java#L212

So we'd either need to convert it to JSON internally or make multipart work with columns. I'm not sure of a case where we would need to do that... internally we could use Arrow and upload an Arrow blob instead. Arrow supports tensors. In that case, the fix could be on the client side: detect the type, detect that it's columnar, and redirect to a file upload. I feel like a redirect to using Arrow internally would be best.
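
To make the redirect idea concrete, here is a rough client-side sketch that serializes each column with pyarrow's tensor IPC format and posts the blobs as a file upload. The endpoint path and the 'input/column' part naming are assumptions for illustration; only the pyarrow and requests calls themselves are standard:

import numpy as np
import pyarrow as pa
import requests

def serialize_tensor(arr):
    # Arrow supports tensors directly, so each ndarray column can be
    # shipped as an Arrow IPC tensor blob without going through JSON
    tensor = pa.Tensor.from_numpy(np.ascontiguousarray(arr))
    sink = pa.BufferOutputStream()
    pa.ipc.write_tensor(tensor, sink)
    return sink.getvalue().to_pybytes()

def upload_columnar(endpoint, data):
    # One multipart part per 'input/column' name, preserving the
    # part = name mapping while carrying nested columnar data
    files = {f'{name}/{col}': serialize_tensor(arr)
             for name, cols in data.items()
             for col, arr in cols.items()}
    return requests.post(endpoint, files=files)

# Hypothetical usage against a hypothetical route:
upload_columnar('http://localhost:9000/arrow/predict',
                {'default': {'x': np.ones((2, 2)), 'y': np.ones((2, 2))}})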