Add support for protected types in JSONToTuple

rohitsw commented 10 years ago

For example, this is valid JSON: { "rstring" : "mystring", "type" : "mytype" }

However, no tuple type can be mapped from this JSON, because rstring and type are SPL reserved (protected) names and cannot be used as tuple attribute names. We need a way to make attributes like these accessible.

hildrum commented 10 years ago

FWIW, I did this for the text toolkit.

rohitsw commented 10 years ago

@hildrum how did you implement it?

hildrum commented 10 years ago

It looks like it's not a documented feature. Oops, gotta update that.

I picked a quotePrefix protectreserved_. (This is clunky, but it was an unlikely case.) If you want to take something from the SystemT to a streams tuple, and the attribute in SystemT has a name that's a reserved word like "rstring", then you use a streams attribute named "protectreservedrstring", and the operator knows to strip off the "protectreserved" when determining how to populate the streams attribute. It only strips off one, I believe, so protectreserved could be used to protect an attribute beginning with protectreserved, though I didn't test that.
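A minimal Java sketch of the prefix-stripping rule described above, assuming the prefix is the literal string `protectreserved` (as in the `protectreservedrstring` example). The class and method names are hypothetical and this is not the text toolkit's actual code. Only one leading occurrence is removed, which is what would let the prefix protect a name that itself begins with the prefix.

```java
// Hypothetical illustration of the "protect reserved" naming convention:
// a tuple attribute named <prefix><sourceName> is populated from <sourceName>.
class ReservedPrefix {
    // Assumed prefix, taken from the protectreservedrstring example.
    static final String PREFIX = "protectreserved";

    // Returns the source field name for a tuple attribute name,
    // stripping at most one leading PREFIX.
    static String sourceNameFor(String attrName) {
        if (attrName.startsWith(PREFIX)) {
            return attrName.substring(PREFIX.length());
        }
        return attrName;
    }

    public static void main(String[] args) {
        System.out.println(sourceNameFor("protectreservedrstring"));            // rstring
        System.out.println(sourceNameFor("protectreservedprotectreservedfoo")); // protectreservedfoo
        System.out.println(sourceNameFor("plainName"));                         // plainName
    }
}
```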

rohitsw commented 10 years ago

OK, I had something similar in the earlier version of the JSON operators. However, I don't find that to be an elegant solution. I was thinking that the operator could accept a parameter in key=value format, where the key is the JSON attribute name and the value is the name of the corresponding tuple attribute. This should give more flexibility in attribute naming.

e.g. attributeMap : "rstring=myrstring";

Thoughts?
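Since the value is a plain string, the operator would have to parse it itself. A minimal sketch, assuming each mapping arrives as a separate string (for example, a multi-valued parameter such as attributeMap: "rstring=myrstring", "type=mytype";) and that entries follow the JSON-name=tuple-name order proposed here. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for attributeMap entries of the form "jsonName=tupleName".
class AttributeMapParser {
    static Map<String, String> parse(List<String> entries) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String entry : entries) {
            int eq = entry.indexOf('=');
            String key = eq < 0 ? "" : entry.substring(0, eq).trim();
            String value = eq < 0 ? "" : entry.substring(eq + 1).trim();
            // Reject entries with no '=' or with an empty side.
            if (key.isEmpty() || value.isEmpty()) {
                throw new IllegalArgumentException("Malformed attributeMap entry: " + entry);
            }
            // Reject the same JSON name being mapped twice.
            if (map.put(key, value) != null) {
                throw new IllegalArgumentException("Duplicate attributeMap key: " + key);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("rstring=myrstring", "type=mytype")));
        // {rstring=myrstring, type=mytype}
    }
}
```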

hildrum commented 10 years ago

I think that's a good, more general, solution. We have something similar proposed for HBASE: https://github.com/IBMStreams/streamsx.hbase/issues/23#issuecomment-41174439

ulemanstreaming commented 9 years ago

The attributeMap parameter approach sounds good. A related but slightly different mechanism is used in the Mining toolkit: Not a single parameter whose value is a map, but one operator parameter for each model parameter. I can't fully articulate the pros and cons of the two approaches, but I think the single parameter with a map value (the one proposed here) is cleaner. There's something weird about parameters that are not predefined in the operator model.

Last I checked, this feature is still missing. Is this likely to move forward? Who is maintaining this toolkit now?

hildrum commented 9 years ago

I'm the one maintaining the toolkit. But since this issue hadn't been commented on for a year, I didn't think there was anyone who could use this feature. Is this something you need?

ulemanstreaming commented 9 years ago

Recently I was building a demo using FAA airport data, which included an element called "type". I found myself having to massage the JSON string before converting it:

      stream<Records> FixedUpJSONRecords as O = Functor(JSONRecords as I)
      {
         output O:
            record = regexReplace(I.record, '"type":', '"delayType":', false);
      }

That works, but has workaround written all over it. I had used an earlier version of the toolkit back in 2013 and was wondering what had happened to the protected-prefix parameter I remembered seeing (but did not need at the time). To be honest, I failed to find this discussion thread.

So "need" is relative, but I've only used JSON parsing twice and needed something like it 50% of the time. (And no, I'm not taking that statistic seriously.) I'd say, it's a very nice-to-have.

hildrum commented 9 years ago

I'll try to get the feature in sometime soon, then. I think it would work like this: you'd specify attributeMap: "streamstype=jsontype"; with the Streams attribute name on the left and the JSON name on the right (the reverse of what rohitsw proposed above), i.e., in your example: attributeMap: "delayType=type";
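With this ordering, the operator would check, for each output attribute, whether the map names a different JSON key, and otherwise fall back to the attribute's own name. A minimal sketch, using a plain Map<String, Object> as a stand-in for the parsed JSON object; the class and method names and the sample values are made up for illustration and are not the toolkit's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup for attributeMap entries of the form "streamsName=jsonName".
class AttributeMapLookup {
    // Returns the JSON key to read for a tuple attribute; unmapped names are used as-is.
    static String jsonKeyFor(String attrName, Map<String, String> attributeMap) {
        return attributeMap.getOrDefault(attrName, attrName);
    }

    // Collects values for the given tuple attributes, keyed by tuple attribute name.
    // Attributes whose JSON key is absent are skipped.
    static Map<String, Object> populate(List<String> attrNames,
                                        Map<String, Object> json,
                                        Map<String, String> attributeMap) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (String attr : attrNames) {
            String key = jsonKeyFor(attr, attributeMap);
            if (json.containsKey(key)) {
                values.put(attr, json.get(key));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> attributeMap = new HashMap<>();
        attributeMap.put("delayType", "type"); // attributeMap: "delayType=type";

        Map<String, Object> json = new HashMap<>();
        json.put("type", "exampleType");
        json.put("airport", "exampleAirport");

        System.out.println(populate(Arrays.asList("delayType", "airport"), json, attributeMap));
        // {delayType=exampleType, airport=exampleAirport}
    }
}
```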

ulemanstreaming commented 9 years ago

+1 for adding this feature.

Just wondering whether there is a best-practice way of implementing a map-like parameter. Here it seems that you're thinking of a string containing a list of expressions that you'd have to parse yourself. Would an SPL map literal make sense? {delayType : "type", ...}, if that's even possible, or {"delayType" : "type", ...}?

I did not do an exhaustive search of other operators but maybe the compiler team has suggestions.

hildrum commented 9 years ago

What's best practice depends on whether you're talking about a Java primitive operator or a C++ primitive operator. Java operators don't support parameters that are maps or tuples, so we end up having to work around that with strings.

If this were a C++ primitive operator, we could use maps, but I'd probably take a different approach and use custom output functions, so it'd be something like

output O:
   delayType = getField("type");

ulemanstreaming commented 9 years ago

Makes sense. Thanks.

jchailloux commented 8 years ago

I had this issue with a JSON file containing {"type": "xxx", "timestamp": "1234567"}. My workaround was to name the tuple attributes _type and _timestamp

and then tweak the Java code so that _type and _timestamp are read from the JSON keys type and timestamp:

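// Method of the JSONToTuple operator class: `l` is the class's Logger and
// jsonToAttribute(...) is the operator's existing per-attribute converter.
private Map<String, Object> jsonToAtributeMap(JSONObject jbase, StreamSchema schema) throws Exception {
    Map<String, Object> attrmap = new HashMap<String, Object>();
    for (Attribute attr : schema) {
        // Tuple attribute name, e.g. "_type"
        String attrName = attr.getName();
        // JSON key to look up: the workaround attributes drop their leading
        // underscore, so "_type" is read from the JSON key "type".
        // (equals, not startsWith, so e.g. "_typeCode" is left alone)
        String name = attrName;
        if (attrName.equals("_type") || attrName.equals("_timestamp")) {
            name = attrName.substring(1);
        }
        try {
            if (l.isLoggable(TraceLevel.DEBUG)) {
                l.log(TraceLevel.DEBUG, "Checking for: " + name);
            }
            Object childobj = jbase.get(name);
            if (childobj == null) {
                if (l.isLoggable(TraceLevel.DEBUG)) {
                    l.log(TraceLevel.DEBUG, "Not Found: " + name);
                }
                continue;
            }
            Object obj = jsonToAttribute(name, attr.getType(), childobj, null);
            if (obj != null) {
                // Store under the tuple attribute name, not the JSON key
                attrmap.put(attrName, obj);
            }
        } catch (Exception e) {
            l.log(TraceLevel.ERROR, "Error converting object: " + name, e);
            throw e;
        }
    }
    return attrmap;
}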

markheger commented 7 years ago

Operators should handle this similarly to the way the XML parse operator does with its ignorePrefix parameter.

schubon commented 7 years ago

Closing after merge of Mark's changes.

IBMStreams / streamsx.json

Add support for protected types in JSONToTuple #12