TIBCOSoftware / catalystml

CatalystML is an open source specification for real-time feature processing, purpose built to transform data for machine learning models.
BSD 3-Clause "New" or "Revised" License

Overwrite object labels, and overwriting components of label #13

Closed abramvandergeest closed 5 years ago

abramvandergeest commented 5 years ago

@fm-tibco @mellistibco @skothari-tibco

So I was creating a demo for healthcare (it is currently in the demos folder). To minimize the memory footprint of all the labeled objects, I reused one operation's label in the following operation (which has been discussed before and is in the spec). The new thing is that I realized it would be enormously useful to overwrite a component of a pre-existing label. For example, the label 'datatemp' is previously defined as a map whose values are arrays. I then run an operation that takes one of those arrays and normalizes it, and I want to replace the old column with the new normalized one. In the demo I refer to this as "datatemp['age']" for the id of the new operation. Does this seem reasonable?
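
For illustration only, here is a minimal Python sketch of the behavior being proposed; it is not the demo's code. The `labels` dictionary, the sample values, and the choice of min-max scaling are all assumptions, since the issue does not say which normalization the demo uses.

```python
def min_max_normalize(values):
    """Scale a list of numbers into the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical label store: label name -> value produced by an operation.
labels = {"datatemp": {"age": [20, 40, 60], "weight": [50.0, 70.0, 90.0]}}

# Equivalent of giving the normalize operation the id "datatemp['age']":
# only the 'age' column is replaced; no second copy of the map is kept.
labels["datatemp"]["age"] = min_max_normalize(labels["datatemp"]["age"])
```

The point of the in-place assignment is the memory footprint: the other columns of `datatemp` are untouched and no new label is allocated.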

fm-tibco commented 5 years ago

Can you post a sample JSON of what you are describing?

fm-tibco commented 5 years ago

Samip gave me a quick description, but the use of 'id' for this makes it confusing. Maybe we should consider changing the spec to the following:

"pipeline": [
            {
              "operation": "math",
              "params": {
                "sample":2,
                "listOfKeys":["0_0","1_0","2_0","amag_0","0_1","1_1","2_1","amag_1"]
              },
              "input": {
                "inputSample": "$input"
              },
              "output":"math1"
            },
            {
              "operation": "math",
              "params": {
                "sample":2
              },
              "input": {
                "inputSample": "98",
                "inputMap": "$math1"
              },
              "output": "math1"
            },
            {
              "operation": "math",
              "params": {
                "sample":3 
              },
              "input": {
                "inputSample": "65",
                "inputMap": "$math1"
              },
              "output" : "math1['sample']"
            },
            {
              "operation": "math",
              "params": {
                "sample":3 
              },
              "input": {
                "inputSample": "$math1['sample']",
                "inputMap": "$math1"
              },
              "output":"math2"
            }
          ],
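
One way an engine might interpret the "output" and "$..." strings in the pipeline above, as a rough Python sketch. The helper names (`parse_ref`, `write_output`, `read_input`) and the exact reference grammar are hypothetical; the spec may define these differently.

```python
import re

# Assumed syntax: label or label['key'], with a leading "$" when used as an input.
_REF = re.compile(r"^\$?(?P<label>\w+)(?:\['(?P<key>[^']+)'\])?$")


def parse_ref(ref):
    """Split "math1['sample']" into ("math1", "sample"); key is None if absent."""
    m = _REF.match(ref)
    if m is None:
        raise ValueError(f"not a label reference: {ref!r}")
    return m.group("label"), m.group("key")


def write_output(labels, target, value):
    """Store an operation result, overwriting a label or one component of it."""
    label, key = parse_ref(target)
    if key is None:
        labels[label] = value       # new label, or overwrite of the whole label
    else:
        labels[label][key] = value  # overwrite one component; label must exist


def read_input(labels, ref):
    """Resolve "$label" or "$label['key']"; anything else is treated as a literal."""
    if not ref.startswith("$"):
        return ref
    label, key = parse_ref(ref)
    value = labels[label]
    return value if key is None else value[key]
```

With this reading, "output": "math1" in the second step replaces the whole label, while "output": "math1['sample']" in the third step replaces only that component.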

abramvandergeest commented 5 years ago

@fm-tibco you are right. I also renamed the current definition of output to output type. This is included in https://github.com/TIBCOSoftware/featureprep/pull/12