Closed rohitsw closed 7 years ago
FWIW, I did this for the text toolkit.
@hildrum how did you implement it?
It looks like it's not a documented feature. Oops, gotta update that.
I picked a quotePrefix, protectreserved_. (This is clunky, but it was an unlikely case.) If you want to take something from SystemT to a streams tuple, and the attribute in SystemT has a name that's a reserved word like "rstring", then you use a streams attribute named "protectreservedrstring", and the operator knows to strip off the "protectreserved" when determining how to populate the streams attribute. It only strips off one, I believe, so protectreserved could be used to protect an attribute beginning with protectreserved, though I didn't test that.
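To make the "strips off only one" behavior concrete, here is a minimal Java sketch. It is not the text toolkit's actual code: the class and method names are hypothetical, and the prefix is spelled the way it appears in the example attribute name above.

```java
// Hypothetical helper, not taken from the text toolkit source.
final class ReservedPrefix {
    // Assumed prefix, spelled as in the example attribute "protectreservedrstring".
    static final String PREFIX = "protectreserved";

    private ReservedPrefix() {}

    /**
     * Maps a streams attribute name back to the source attribute name by
     * removing at most one leading occurrence of the prefix.
     */
    static String toSourceName(String streamsName) {
        if (streamsName.startsWith(PREFIX)) {
            return streamsName.substring(PREFIX.length());
        }
        return streamsName;
    }

    public static void main(String[] args) {
        // A reserved word protected by the prefix.
        System.out.println(toSourceName("protectreservedrstring"));
        // Only one prefix is removed, so a source name that itself starts
        // with the prefix can still be reached by doubling it.
        System.out.println(toSourceName("protectreservedprotectreservedfoo"));
        // Names without the prefix pass through unchanged.
        System.out.println(toSourceName("name"));
    }
}
```

Because only the first occurrence is removed, a doubled prefix still resolves to a source name that begins with the prefix, which is the untested case mentioned above.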
OK, I had something similar in the earlier version of the JSON operators. However, I don't find that to be an elegant solution. I was thinking that the operator could accept parameters in key=value format. Here the key would be the JSON attribute name and the value would be the name of the tuple attribute it maps to. This should give more flexibility in attribute naming.
e.g.
attributeMap : "rstring=myrstring";
Thoughts?
I think that's a good, more general, solution. We have something similar proposed for HBASE: https://github.com/IBMStreams/streamsx.hbase/issues/23#issuecomment-41174439
The attributeMap parameter approach sounds good. A related but slightly different mechanism is used in the Mining toolkit: Not a single parameter whose value is a map, but one operator parameter for each model parameter. I can't fully articulate the pros and cons of the two approaches, but I think the single parameter with a map value (the one proposed here) is cleaner. There's something weird about parameters that are not predefined in the operator model.
Last I checked, this feature is still missing. Is this likely to move forward? Who is maintaining this toolkit now?
I'm the one maintaining the toolkit. But since this issue hadn't been commented on for a year, I didn't think there was anyone who could use this feature. Is this something you need?
Recently I was building a demo using FAA airport data, which included an element called "type". I found myself having to massage the JSON string before converting it:
stream<Records> FixedUpJSONRecords as O = Functor(JSONRecords as I)
{
    output O:
        record = regexReplace(I.record, '"type":', '"delayType":', false);
}
That works, but has workaround written all over it. I had used an earlier version of the toolkit back in 2013 and was wondering what had happened to the protected-prefix parameter I remembered seeing (but did not need at the time). To be honest, I failed to find this discussion thread.
So "need" is relative, but I've only used JSON parsing twice and needed something like it 50% of the time. (And no, I'm not taking that statistic seriously.) I'd say, it's a very nice-to-have.
I'll try to get the feature in sometime soon, then. I think the way it'd work is that you'd specify
attributeMap: "streamstype=jsontype";
(this is the reverse of what rohitsw proposed above), i.e., in your example:
attributeMap: "delayType=type";
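A rough sketch of how such a parameter value could be parsed on the Java side, in the streamstype=jsontype direction. Nothing here is the toolkit's actual implementation: the class and method names are hypothetical, and using commas to separate several entries in one string is my assumption. It would also not cope with JSON keys that themselves contain "," or "=".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the proposed attributeMap parameter.
final class AttributeMapParser {
    private AttributeMapParser() {}

    /**
     * Parses "streamsAttr=jsonKey" entries into a map from streams attribute
     * name to JSON key. Entries are assumed to be comma-separated.
     */
    static Map<String, String> parse(String spec) {
        Map<String, String> result = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return result;
        }
        for (String entry : spec.split(",")) {
            int eq = entry.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in entry: " + entry);
            }
            String streamsAttr = entry.substring(0, eq).trim();
            String jsonKey = entry.substring(eq + 1).trim();
            if (streamsAttr.isEmpty() || jsonKey.isEmpty()) {
                throw new IllegalArgumentException("Empty name in entry: " + entry);
            }
            result.put(streamsAttr, jsonKey);
        }
        return result;
    }

    /** Returns the JSON key to read for a streams attribute; unmapped names map to themselves. */
    static String jsonKeyFor(Map<String, String> mapping, String streamsAttr) {
        return mapping.getOrDefault(streamsAttr, streamsAttr);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = parse("delayType=type");
        System.out.println(jsonKeyFor(mapping, "delayType")); // mapped to the JSON key
        System.out.println(jsonKeyFor(mapping, "airport"));   // unmapped, used as-is
    }
}
```

With a lookup like jsonKeyFor, the operator would read the JSON key "type" when populating delayType and fall back to the attribute's own name for everything else, so existing applications without an attributeMap would behave as before.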
+1 for adding this feature.
Just wondering whether there is a best-practice way of implementing a map-like parameter. Here it seems that you're thinking of a string containing a list of expressions that you have to custom-parse. Would an SPL map literal make sense? {delayType : "type", ...}, if that's even possible, or {"delayType" : "type", ...}?
I did not do an exhaustive search of other operators but maybe the compiler team has suggestions.
What's best practice depends on whether you're talking about a Java primitive operator or a C++ primitive operator. Java operators don't support parameters that are maps or tuples, so we end up having to work around that with strings.
If this were a C++ primitive operator, we could use maps, but I'd probably take a different approach and use custom output functions, so it'd be something like
    output O:
        delayType = getField("type");
Makes sense. Thanks.
I had this issue with a JSON file that contains {type:"xxx" and timestamp:"1234567"}. The workaround I found was to declare the tuple attributes as _type and _timestamp.
I then had to tweak the Java code to check for _type and _timestamp:
private Map<String, Object> jsonToAtributeMap(JSONObject jbase, StreamSchema schema) throws Exception {
    Map<String, Object> attrmap = new HashMap<String, Object>();
    for (Attribute attr : schema) {
        String name = attr.getName();
        // Workaround: tuple attributes whose names begin with _type or _timestamp
        // are looked up in the JSON without the leading underscore.
        boolean underscore = false;
        if (name.startsWith("_type") || name.startsWith("_timestamp")) {
            underscore = true;
            name = name.substring(1);
        }
        try {
            if (l.isLoggable(TraceLevel.DEBUG)) {
                l.log(TraceLevel.DEBUG, "Checking for: " + name);
            }
            Object childobj = jbase.get(name);
            if (childobj == null) {
                if (l.isLoggable(TraceLevel.DEBUG)) {
                    l.log(TraceLevel.DEBUG, "Not Found: " + name);
                }
                continue;
            }
            Object obj = jsonToAttribute(name, attr.getType(), childobj, null);
            if (obj != null) {
                // Store the value under the original (underscored) attribute name.
                attrmap.put((underscore ? "_" + name : name), obj);
            }
        } catch (Exception e) {
            l.log(TraceLevel.ERROR, "Error converting object: " + name, e);
            throw e;
        }
    }
    return attrmap;
}
Operators should handle this similarly to the XML parse operator with its ignorePrefix parameter.
Closing after merge of Mark's changes.
For example, this is valid JSON: { "rstring" : "mystring", "type" : "mytype" }
However, there is no tuple structure that allows this JSON to be mapped, since rstring and type are SPL reserved names that cannot be used as attribute names in tuples. We need a way to make these types of attributes accessible.