GoogleCloudPlatform / llm-pipeline-examples

Apache License 2.0

Add /infer:raw endpoint to Triton predict image #30

Closed. Chris113113 closed this 1 year ago

Chris113113 commented 1 year ago

re-pasting description:

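This adds an /infer:raw endpoint that bypasses the pre- and post-processing in the Flask app. Requests sent to it are forwarded directly to Triton, so the caller supplies already-tokenized input IDs and receives raw output token IDs.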

Example input: {"inputs": [{"name": "input_ids", "shape": [1, 298], "datatype": "UINT32", "data": [30425, 15, 26, 344, 3, 9, 511, 18, 2894, 26658, 11, 1368, 1814, 16, 9702, 4463, 31, 7, 12216, 120, 3542, 425, 63, 16432, 13, 31005, 19, 3, 9, 1001, 1976, 1183, 3, 31, 382, 13306, 9627, 23, 1636, 2263, 8810, 6, 6876, 3850, 3, 867, 3022, 5, 31, 37, 260, 322, 65, 582, 80, 13, 9702, 4463, 31, 7, 167, 6812, 81, 542, 8752, 7, 437, 2101, 16, 1797, 6503, 5, 275, 16, 165, 5103, 1228, 6, 8507, 19, 1577, 3, 867, 3022, 11104, 114, 150, 80, 1307, 5, 2263, 14973, 16, 1001, 18156, 30, 8, 18654, 1228, 3485, 33, 469, 31, 7, 931, 10, 11888, 63, 3, 18852, 295, 9, 11549, 2495, 41, 115, 13668, 15, 26, 201, 12, 7398, 15, 11, 3, 7, 4310, 524, 4079, 17, 3, 1436, 109, 3068, 18, 26, 9889, 23616, 1442, 7, 11, 17526, 6, 2148, 1054, 3293, 15, 17, 3, 6, 12146, 115, 28679, 6, 27577, 7, 11, 1667, 7, 15, 5, 3, 3626, 163, 11104, 19771, 12, 8, 3850, 10829, 6, 2846, 172, 1024, 31, 7, 3, 867, 3022, 65, 582, 8, 1464, 190, 84, 3, 88, 19, 6273, 2375, 53, 8, 8109, 300, 3850, 542, 5, 3, 31, 3713, 41, 159, 61, 3, 867, 3022, 21, 82, 4810, 6, 21, 119, 151, 31, 7, 14246, 6, 31, 8507, 817, 7, 19602, 5, 3, 31, 196, 317, 8, 41, 14063, 138, 61, 542, 733, 744, 31, 17, 43, 231, 628, 21, 2648, 3, 233, 3, 3227, 62, 31, 60, 479, 44, 8, 8165, 800, 13, 3850, 542, 6, 31, 3, 88, 617, 7, 5, 3, 31, 196, 31, 51, 59, 1119, 12, 3958, 12, 8, 1252, 8084, 1636, 27, 31, 51, 1119, 12, 199, 1589, 26203, 777, 70, 1543, 30, 3, 9, 72, 1646, 1873, 5, 31, 1]}, {"name": "sequence_length", "shape": [1, 1], "datatype": "UINT32", "data": [298]}, {"name": "max_output_len", "shape": [1, 1], "datatype": "UINT32", "data": [128]}, {"name": "runtime_top_k", "shape": [1, 1], "datatype": "UINT32", "data": [1]}], "parameters": {"binary_data_output": false}}
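A request body of the shape shown above could be assembled along these lines. This is a minimal sketch, not code from this PR: the helper names `build_raw_request` and `post_raw` are hypothetical, the tensor names and `UINT32` datatype are copied from the example input, and the defaults (128 output tokens, top-k of 1) simply mirror it.

```python
import json
import urllib.request


def build_raw_request(input_ids, max_output_len=128, top_k=1):
    """Build a payload shaped like the example input (hypothetical helper).

    input_ids must already be tokenized, since the raw endpoint skips
    the Flask app's pre-processing.
    """
    def tensor(name, shape, data):
        # Every input tensor in the example uses the UINT32 datatype.
        return {"name": name, "shape": shape, "datatype": "UINT32", "data": data}

    n = len(input_ids)
    return {
        "inputs": [
            tensor("input_ids", [1, n], list(input_ids)),
            tensor("sequence_length", [1, 1], [n]),
            tensor("max_output_len", [1, 1], [max_output_len]),
            tensor("runtime_top_k", [1, 1], [top_k]),
        ],
        "parameters": {"binary_data_output": False},
    }


def post_raw(url, payload, timeout=60):
    """POST the payload as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A call might look like `post_raw("http://localhost:8080/infer:raw", build_raw_request(ids))`, where the host, port, and full route are placeholders; the thread only names the endpoint as /infer:raw.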

Example output: { "model_name": "fastertransformer", "model_version": "1", "outputs": [ { "name": "output_ids", "datatype": "INT32", "shape": [ 1, 1, 128 ], "data": [ 37, 3, 867, 3022, 260, 322, 16, 9702, 4463, 31, 7, 31005, 65, 582, 80, 13, 9702, 4463, 31, 7, 167, 6812, 81, 542, 8752, 7, 437, 2101, 16, 1797, 6503, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ] }, { "name": "sequence_length", "datatype": "INT32", "shape": [ 1, 1 ], "data": [ 33 ] } ] }
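Because post-processing is skipped, the caller has to strip the zero padding from `output_ids` itself. In the example output, `sequence_length` is 33 and the first 33 entries of `output_ids` are the generated tokens, with the rest padded to 128. The sketch below trims each row accordingly; `extract_output_ids` is a hypothetical helper, and it assumes `output_ids` has shape `[batch, beams, max_len]` with `sequence_length` flattened in the same batch-then-beam order.

```python
def extract_output_ids(response):
    """Return one list of token IDs per (batch, beam), without padding."""
    # Index the output tensors by name for convenient lookup.
    outputs = {o["name"]: o for o in response["outputs"]}
    ids = outputs["output_ids"]
    lengths = outputs["sequence_length"]["data"]

    batch, beams, max_len = ids["shape"]
    flat = ids["data"]

    result = []
    for i in range(batch * beams):
        # Each row occupies max_len consecutive entries in the flat data.
        row = flat[i * max_len:(i + 1) * max_len]
        # Keep only the tokens the model actually generated.
        result.append(row[:lengths[i]])
    return result
```

The returned IDs still need to be detokenized by the caller, using the same tokenizer that produced the input IDs.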

Chris113113 commented 1 year ago

Closing per discussion