bazaarvoice / jolt

JSON to JSON transformation library written in Java.
Apache License 2.0

"Getting Started" java code breaks #82

Closed Tihamer54 closed 10 years ago

Tihamer54 commented 10 years ago

Jolt is an awesomely great idea, and the java/json community should be grateful that Milo and Sam implemented it.

Unfortunately, the start-up example is difficult to understand (it would have made sense to use one of the prototypical music discography or book catalog examples).

More importantly, the java code in "Getting Started" has a bug. Specifically, in the java code at:

Chainr chainr = new Chainr( chainrSpecJSON );

I get:

Exception in thread "main" java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to com.bazaarvoice.jolt.JoltTransform

milosimpson commented 10 years ago

Will look into it.
Bazaarvoice's primary product is Ratings and Reviews, so that is why transforming product Ratings and Reviews is the sample. ;)

That said, please do link to the more prototypical music discography or book catalog examples.

Tihamer54 commented 10 years ago

On Wed, Feb 12, 2014 at 2:01 PM, Milo Simpson notifications@github.com wrote:

Will look into it.

Wonderful!!! Thanks a million!

I did get Jolt to work with the following Java code:

package mycompany.org;

import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.ChainrFactory;
import com.bazaarvoice.jolt.JsonUtils;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class MyJoltTutorialExample {

    public static void main(String[] args) throws IOException {
        String specFilename = "src\\main\\resources\\chainrSpec1.json";
        String inFilename = "src\\main\\resources\\bookInput.json";
        System.out.println("Transforming input " + inFilename + " using " + specFilename);
        // Build the Chainr from the spec file rather than calling the constructor directly
        Chainr chainr = ChainrFactory.fromFile(new File(specFilename));
        Object inJSON = JsonUtils.jsonToObject(new FileInputStream(inFilename));
        //System.out.println("inJSON=" + inJSON); //dumps input in one string
        Object transformedOutput = chainr.transform(inJSON);
        System.out.println("Pretty output=\n" + JsonUtils.toPrettyJsonString(transformedOutput));
    }
}

I've also got about 8 pages of my tutorial done so far (with screenshots, etc) in MS Word format. What format would you like it in when I finish? PDF?

Bazaarvoice's primary product is Ratings and Reviews, so that is why transforming product Ratings and Reviews is the sample. ;)

Ah, I see. Now at least I understand where those names came from. Unfortunately, they're difficult to contextualize, and that is why the example is difficult to understand.

That said, please do link to the more prototypical music discography or book catalog examples.

Sure. There is a music example at https://www.apple.com/itunes/affiliates/resources/documentation/itunes-store-web-service-search-api.html with a single song:

{"wrapperType":"track", "kind":"song", "artistId":909253, "collectionId":120954021, "trackId":120954025, "artistName":"Jack Johnson", "collectionName":"Sing-a-Longs and Lullabies for the Film Curious George", "trackName":"Upside Down", "collectionCensoredName":"Sing-a-Longs and Lullabies for the Film Curious George", "trackCensoredName":"Upside Down", "artistViewUrl":"https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewArtist?id=909253", "collectionViewUrl":"https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewAlbum?i=120954025&id=120954021&s=143441", "trackViewUrl":"https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewAlbum?i=120954025&id=120954021&s=143441", "previewUrl":"http://a1099.itunes.apple.com/r10/Music/f9/54/43/mzi.gqvqlvcq.aac.p.m4p", "artworkUrl60":"http://a1.itunes.apple.com/r10/Music/3b/6a/33/mzi.qzdqwsel.60x60-50.jpg", "artworkUrl100":"http://a1.itunes.apple.com/r10/Music/3b/6a/33/mzi.qzdqwsel.100x100-75.jpg", "collectionPrice":10.99, "trackPrice":0.99, "collectionExplicitness":"notExplicit", "trackExplicitness":"notExplicit", "discCount":1, "discNumber":1, "trackCount":14, "trackNumber":1, "trackTimeMillis":210743, "country":"USA", "currency":"USD", "primaryGenreName":"Rock"}

There is a simple book example at http://www.tutorialspoint.com/json/json_quick_guide.htm, which I adapted for the tutorial that I'm writing:

{ "comment": "Inspired by http://www.tutorialspoint.com/json/json_quick_guide.htm", "book": [ { "title": "Java The Complete Reference", "author": "Herbert Schildt" }, { "title": "Semantic Web for the Working Ontologist", "author": "Dean Allemang and James Hendler" }, { "title": "Lord of the Rings", "author": "J. R. R. Tolkien" } ] }

It's very easy to change book -> Books, and comment -> Comment:

[ { "operation": "shift", "spec": { "comment": "Comment", "book": "Books" } } ]

It is also easy to insert an extra branch in the tree (while deleting the comment):

[ { "operation": "shift", "spec": { "book": "Media.Books" } } ]

I can also collect the elements of the array:

[ { "operation": "shift", "spec": { "book": { "*": { "title": "Books.Title", "author": "Books.Author(s)" } } } } ]

To get:

{ "Books" : { "Title" : [ "Java The Complete Reference", "Semantic Web for the Working Ontologist", "Lord of the Rings" ], "Author(s)" : [ "Herbert Schildt", "Dean Allemang and James Hendler", "J. R. R. Tolkien" ] } }
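A plain-Java sketch of why this spec yields parallel arrays may help here. It is not Jolt's implementation, and the class and method names are hypothetical. Every book writes to the same two output paths, Books.Title and Books.Author(s), so the values pile up in lists:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; Jolt's own shift operation is far more general.
final class ParallelArraysSketch {

    // Each book writes to the same two output keys, so the values accumulate into lists.
    static Map<String, Object> collect(List<Map<String, Object>> books) {
        List<Object> titles = new ArrayList<>();
        List<Object> authors = new ArrayList<>();
        for (Map<String, Object> book : books) {
            if (book.containsKey("title")) {
                titles.add(book.get("title"));
            }
            if (book.containsKey("author")) {
                authors.add(book.get("author"));
            }
        }
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("Title", titles);
        inner.put("Author(s)", authors);
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("Books", inner);
        return out;
    }
}
```

Because the output path never mentions which book a value came from, the link between a title and its author survives only through list position.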

But I want:

{ "Books": [ { "Title": "Java The Complete Reference", "Author(s)": "Herbert Schildt" }, { "Title": "Semantic Web for the Working Ontologist", "Author(s)": "Dean Allemang and James Hendler" }, { "Title": "Lord of the Rings", "Author(s)": "J. R. R. Tolkien" } ] }

I'm having a heck of a time converting (the original) title to Title, and author to Author. There is something I'm just not understanding about how the LHS and RHS operate.

Eventually, I'd like to show an example of JSON to JSON-LD.

Thanks for any help you can give.

milosimpson commented 10 years ago

You can play with the transforms here, w/out having to download the code. http://jolt-demo.appspot.com/ I need to update it and link it in, but it works. It is backed by a free Google App Engine, so it may take like a minute for the backend to wake up and process.

WRT your book example, and how you want an array of Books. This spec does it.

[
    {
        "operation": "shift",
        "spec": {
            "book": {
                "*": {
                    "title": "Books[&1].Title",
                    "author": "Books[&1].Author"
                }
            }
        }
    }
]
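What the spec above does can be sketched in plain Java. This is an illustration under assumptions, not Jolt's code, and the class and method names are hypothetical. The "&1" refers to the key matched one level up from "title" or "author", which here is the array index matched by "*", so each book's fields land in the output element with the same index:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; it hard-codes what the spec expresses declaratively.
final class IndexedShiftSketch {

    // Mirrors "Books[&1].Title" and "Books[&1].Author": one output object per input index.
    static Map<String, Object> shift(Map<String, Object> input) {
        List<Map<String, Object>> outBooks = new ArrayList<>();
        Object book = input.get("book");
        if (book instanceof List<?>) {
            for (Object element : (List<?>) book) {
                Map<String, Object> outBook = new LinkedHashMap<>();
                if (element instanceof Map<?, ?>) {
                    Map<?, ?> in = (Map<?, ?>) element;
                    if (in.containsKey("title")) {
                        outBook.put("Title", in.get("title"));
                    }
                    if (in.containsKey("author")) {
                        outBook.put("Author", in.get("author"));
                    }
                }
                outBooks.add(outBook);
            }
        }
        // Keys the spec does not mention, such as "comment", are not copied.
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("Books", outBooks);
        return out;
    }
}
```

Keeping the index in the output path is what keeps each title next to its own author.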

What is "JSON to JSON-LD" ?

Lastly, cool tutorial. Where do you plan to post it?

Tihamer54 commented 10 years ago

On Wed, Feb 12, 2014 at 9:31 PM, Milo Simpson notifications@github.com wrote:

You can play with the transforms here, w/out having to download the code. http://jolt-demo.appspot.com/ I need to update it and link it in, but it works. It is backed by a free Google App Engine, so it may take like a minute for the backend to wake up and process.

That is great to know, but being the engineer that I am, I like to have everything close by so that I can pick it apart (e.g. see the source code as it runs).

WRT your book example, and how you want an array of Books. This spec does it.

[ { "operation": "shift", "spec": { "book": { "*": { "title": "Books[&1].Title", "author": "Books[&1].Author" } } } } ]

It worked great! Thanks!

So the LHS (Left Hand Side) in blue above matches and copies the green in the input below, though I'm not sure what exactly specifies what happens to the value. It appears that it just gets copied, but what if you wanted to change it to something else?

{
"comment": "Inspired by http://www.tutorialspoint.com/json/json_quick_guide.htm",
"Books":
}

The RHS (Right Hand Side) in red above is what gets created in the output below:

{ "Books" : [ { "Title" : "Java The Complete Reference", "Author" : "Herbert Schildt" }, { "Title" : "Semantic Web for the Working Ontologist", "Author" : "Dean Allemang and James Hendler" }, { "Title" : "Lord of the Rings", "Author" : "J. R. R. Tolkien" } ] }

Now I'm trying to use the default mode to insert things; in this case the editions and their values. How do I do it?

{ "book" : [ { "title" : "Java The Complete Reference", "edition" : "Third", "author" : "Herbert Schildt" }, { "title" : "Semantic Web for the Working Ontologist", "edition" : "Second", "author" : "Dean Allemang and James Hendler" }, { "title" : "Lord of the Rings", "edition" : "Three hundred and twentieth", "author" : "J. R. R. Tolkien" } ] }

What is "JSON to JSON-LD" ?

JSON-LD (for Linked Data) is a JSON way to represent Semantic Web information (specifically RDF (Resource Description Framework)).

JSON-LD has a "context" to provide mappings from JSON to an RDF-like model. This "context" connects properties in a JSON document to concepts in an ontology. In order to provide some basic interoperability with RDF, JSON-LD also allows values to be coerced to a specified type or to be tagged with a (natural) language. A context can be embedded directly in a JSON-LD document or put into a separate file and referenced from different documents (from traditional JSON documents via an HTTP Link header). For example, the JSON-LD text below is recognizably legal JSON, but it has some higher-level constructs that capture more meaning than pure syntax can provide:

{ "@context": { "name": "http://xmlns.com/foaf/0.1/name", "homepage": { "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage", "@type": "@id" }, "Person": "http://xmlns.com/foaf/0.1/Person" }, "@id": "http://me.markus-lanthaler.com", "@type": "Person", "name": "Markus Lanthaler", "homepage": "http://www.tugraz.at/" }

In triples that can almost all be dereferenced, the above JSON-LD represents:

| Subject | Predicate | Object |
| --- | --- | --- |
| http://me.markus-lanthaler.com | http://www.w3.org/TR/rdfa-core/#A-typeof | http://xmlns.com/foaf/0.1/Person |
| http://me.markus-lanthaler.com | http://xmlns.com/foaf/0.1/name | "Markus Lanthaler" |
| http://me.markus-lanthaler.com | http://xmlns.com/foaf/0.1/workplaceHomepage | http://www.tugraz.at/ |

Because the values can be IRIs (Internationalized Resource Identifiers, each of which is globally unique), JSON-LD files that represent ontologies are grounded (the intention is that they can be dereferenced). This means that not only can humans and machines communicate unambiguously, but also that different ontologies can be merged (at least partially; e.g. there are important caveats with the owl:sameAs property; see http://www.w3.org/2009/12/rdf-ws/papers/ws21). Also, because the triples in RDF are (in some ways) equivalent to First Order Predicate Logic, we can do provably correct reasoning across ontologies (though I should add that DL (Description Logic) gets really complicated really fast, and we probably won't figure out how to do modal logic correctly for another 20 years).

Lastly, cool tutorial. Where do you plan to post it?

Wherever you want me to--when it's done (somewhere on https://github.com/bazaarvoice/jolt/, I would imagine). It's the least I can do since you contributed some amazingly cool and powerful software. I just need to get permission from my employers, since they paid for it.


milosimpson commented 10 years ago

Now I'm trying to use the default mode to insert things; in this case the editions and their values. How do I do it?

The problem is that the "edition" you are trying to add is not actually a default across all books. You wanted "Java The Complete Reference" to be "Third" edition, while "Lord of the Rings" was to be "Three hundred and twentieth".

If you need to "mixin" other info, you need to do that outside of Jolt, or implement it as a Custom Java "ContextualTransform" where you pass the ContextualTransform the "extra" context it needs.

It appears that it just gets copied, but what if you wanted to change it to something else?

Correct, Jolt is about "transforming the structure of your data" not so much about modifying the contents of the Data.

The idea is to use Jolt to get your data structured correctly, then modify your data either outside of Jolt or in a Custom Java Transform.
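As a rough sketch of the "outside of Jolt" route, the per-book editions could be mixed in after the transform has produced the Books array. Everything here is an assumption for illustration: the class name, the method name, the "Edition" output key, and the idea of looking editions up by title are not from Jolt or from this thread.

```java
import java.util.List;
import java.util.Map;

// Hypothetical post-processing step that runs on the already transformed output.
final class EditionMixinSketch {

    // Adds an "Edition" to each book whose title has an entry in editionsByTitle.
    static void addEditions(Map<String, Object> transformed, Map<String, String> editionsByTitle) {
        Object books = transformed.get("Books");
        if (!(books instanceof List<?>)) {
            return; // nothing to do if the structure is not what we expect
        }
        for (Object element : (List<?>) books) {
            if (!(element instanceof Map<?, ?>)) {
                continue;
            }
            @SuppressWarnings("unchecked")
            Map<String, Object> book = (Map<String, Object>) element;
            String edition = editionsByTitle.get(book.get("Title"));
            if (edition != null) {
                book.put("Edition", edition);
            }
        }
    }
}
```

The transform stays responsible for the structure, and this step only fills in content, which matches the split described above.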

RE : JSon-LD

Yeah I had to play with RDF back in the day. Hurt my head.

Tutorial location

Two thoughts.
1) Totally cool with adding a "sample" maven module to contain sample code. People could clone the project and mess with things there.
2) The tutorial can be an MD file, a link to a blog, or a slide deck. Did you see the slide deck linked from the top-level readme?

Tihamer54 commented 10 years ago

On Thu, Feb 13, 2014 at 3:22 PM, Milo Simpson notifications@github.com wrote:

Now I'm trying to use the default mode to insert things; in this case the editions and their values. How do I do it?

The problem is that the "edition" you are trying to add is not actually a default across all books. You wanted "Java The Complete Reference" to be "Third" edition, while "Lord of the Rings" was to be "Three hundred and twentieth".

If you need to "mixin" other info, you need to do that outside of Jolt, or implement it as a Custom Java "ContextualTransform" where you pass the ContextualTransform the "extra" context it needs.

I suspected as much. Well, let me split up my question.

  1. How do I insert the same key/value pair (let's say that they are all first editions)? I'm sure that Default mode is good for that, but I'm not sure how to do it, especially when dealing with arrays.
  2. How does the Java mode work? Unlike the other modes, there is no JavaDoc on it. What I have so far is:

public class JavaTutorialTransform implements ContextualTransform {

    public Object transform(Object input, Map<String, Object> context) {
        // TODO Auto-generated method stub
        return null;
    }
}

I assume that the spec.json file is simply:

[ { "operation": "java", "spec": { "book": "JavaTutorialTransform" } } ]

RE : JSon-LD

Yeah I had to play with RDF back in the day. Hurt my head.

:-) Funny, I love RDF, especially after seeing what SPARQL can do with DbPedia. Not that looking at DLs won't cause serious brain damage.

For me, trying to figure out the details of the search/replace paradigm for Jolt has hurt my head. I'm sure it's simple once you grasp it, but I think I'm missing some magic step. Which is why I'm writing this Dummy's tutorial.

1) Totally cool with adding a "sample" maven module to contain sample code. People could clone the project and mess with things there.
2) The tutorial can be an MD file, a link to a blog, or a slide deck. Did you see the slide deck linked from the top-level readme?

Yes, I saw the Slide Deck, and was very optimistic about reading it, until I saw that I have no idea what happens during the clicks in the slide notes. Sometimes nothing beats the original PowerPoint.

Speaking of MD files, your README.md says "The JSON spec for Chainr looks like : unit test https://github.com/bazaarvoice/jolt/blob/master/jolt-core/src/test/resources/json/chainr/firstSample.json." and the unit test link points to a 404.

milosimpson commented 10 years ago
  1. How do I insert the same key/value pair (let's say that they are all first editions)? I'm sure that Default mode is good for that, but I'm not sure how to do it, especially when dealing with arrays.

See Defaultr unit test https://github.com/bazaarvoice/jolt/blob/master/jolt-core/src/test/resources/json/defaultr/photosArray.json Also see https://github.com/bazaarvoice/jolt/blob/master/jolt-core/src/main/java/com/bazaarvoice/jolt/Defaultr.java Javadoc that explains Arrays and the "Algorithm".

Shift and Default have different DSLs. In Defaultr you have to tell it you are dealing with an array, like "photos[]".
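The behavior being asked for (give every book the same edition unless it already has one) can be sketched in plain Java. This only illustrates the intended result. It is not Defaultr's algorithm, the names are hypothetical, and the matching Defaultr spec (presumably keyed on "book[]", following the note above) is not shown in this thread and should be checked against the linked Javadoc.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of defaulting a key in every element of an array.
final class DefaultEditionSketch {

    // Fills in key=defaultValue for each object in the array at arrayKey,
    // leaving elements that already carry a value for that key unchanged.
    static void applyDefault(Map<String, Object> input, String arrayKey, String key, Object defaultValue) {
        Object array = input.get(arrayKey);
        if (!(array instanceof List<?>)) {
            return;
        }
        for (Object element : (List<?>) array) {
            if (element instanceof Map<?, ?>) {
                @SuppressWarnings("unchecked")
                Map<String, Object> item = (Map<String, Object>) element;
                item.putIfAbsent(key, defaultValue);
            }
        }
    }
}
```

Calling applyDefault(input, "book", "edition", "First") would mark every book without an edition as a first edition.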

2) How does the Java mode work? Unlike the other modes, there is no JavaDoc on it.

Java mode just hands you the in-memory Java Object version of the JSON.

Start with just "implements Transform". There is one method "Object transform( Object input ); ", and in that method you can do whatever you want, but you have to "navigate" your input object from the top, doing your own instanceof, Map.get calls etc.

Use it like:

 [ 
      { 
            "operation": "com.your.java.class.that.implements.Transform"
      }
 ]

The "operation": "java" thing is old; I need to clean that up.
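A minimal sketch of such a custom transform follows. To stay self-contained it declares a local stand-in interface with the one method described above; in a real project the class would implement Jolt's own Transform interface instead. The class name and the choice to upper-case titles are purely illustrative assumptions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Stand-in for Jolt's Transform interface as described above (a single method).
interface Transform {
    Object transform(Object input);
}

// Hypothetical example: upper-cases every "Title" under the top-level "Books" array.
final class UppercaseTitlesTransform implements Transform {

    @Override
    public Object transform(Object input) {
        // Navigate from the top, checking types at each step.
        if (!(input instanceof Map<?, ?>)) {
            return input;
        }
        Object books = ((Map<?, ?>) input).get("Books");
        if (books instanceof List<?>) {
            for (Object element : (List<?>) books) {
                if (element instanceof Map<?, ?>) {
                    @SuppressWarnings("unchecked")
                    Map<String, Object> book = (Map<String, Object>) element;
                    Object title = book.get("Title");
                    if (title instanceof String) {
                        book.put("Title", ((String) title).toUpperCase(Locale.ROOT));
                    }
                }
            }
        }
        return input;
    }
}
```

The instanceof checks are the "navigation" mentioned above: the input is an untyped tree of maps, lists, and scalars, so nothing about its shape can be assumed.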

Slide Deck

Just covers the Shift DSL ("domain specific language"), and there is a speaker notes section that has additional notes, often about the clicks.

Tihamer54 commented 10 years ago

OK, I got everything working (including the Java transform), and I finished the tutorial. Still waiting for permission from upper management to upload; I have high hopes through the rumor mill that they might approve it in a month or two...

That being said, I'm a fumble-fingered fugitive from typing class who couldn't data-enter a JSON file correctly if my life depended on it. So I really appreciate the fix you made in 0.13 that identifies row and column for JSON format mistakes. Unfortunately, even if I get the JSON format correct, I can still make Jolt DSL mistakes and then have no clue on where I did something wrong. For example, if I have a JSON-legal leaf that reads:

 {  "value": "&$3.DollarsEarned" }

Then Jolt gives me the error:

 DotNotation (write key) can not contain '@', '*', or '$'.

This is nice, but it does not give me row and column. I know, the location information is probably lost when the json file was sucked into a Chainr object, but there are ways to preserve it (as attributes called joltRow and joltColumn if nothing else; I have modified XML parsers to do the same thing). At any rate, without too much work, I changed line 46 of ShiftrWriter.java to read:

 throw new SpecException("DotNotation (write key) can not contain '@', '*', or '$' at " + dotNotation + ".");

Now, the error output gives me some context:

 DotNotation (write key) can not contain '@', '*', or '$' at root.&$3.DollarsEarned.
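The pattern behind that one-line change can be sketched on its own: carry the offending value into the exception message so the reader can find it in the spec. The exception class and the validation method below are stand-ins written for this illustration; they are not Jolt's SpecException or the code in ShiftrWriter.java.

```java
// Stand-in exception type; Jolt has its own SpecException.
final class SketchSpecException extends RuntimeException {
    SketchSpecException(String message) {
        super(message);
    }
}

// Hypothetical validator showing an error message that includes its context.
final class WriteKeyCheckSketch {

    // Rejects write keys containing '@', '*', or '$' and names the offending key.
    static void checkWriteKey(String dotNotation) {
        if (dotNotation.contains("@") || dotNotation.contains("*") || dotNotation.contains("$")) {
            throw new SketchSpecException(
                    "DotNotation (write key) can not contain '@', '*', or '$' at " + dotNotation + ".");
        }
    }
}
```

Row and column information would need support from the JSON parsing step, but echoing the key costs almost nothing and already narrows the search.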

Giving context and location in every SpecException would be nice. The problem is that there are lots of them (which is a great thing, BTW). Should I generate a separate issue for this request? I'd change them all myself, except:

  1. I might break something,
  2. I'm not so familiar with the code that I know exactly how to get location.
  3. Some people might not like to see more error information (IMHO, they are silly), and
  4. Upper management might not approve of my giving away source code.

So all I can do is ask. Pretty please.

milosimpson commented 10 years ago

Yeah, create an issue about log messages.