Open FullPint opened 5 years ago
The code should automatically learn the schema of nested documents. There was a bug in the sample code that I just fixed, which might have caused the issue. Use vectorizer.extend(docs) to learn the schema, where docs is a list of JSON documents, or use vectorizer.extend([doc]) to learn the schema incrementally.
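To illustrate the difference between batch and incremental schema learning described above, here is a minimal sketch. ToySchemaLearner, its schema attribute, and the path format are hypothetical stand-ins written for this example; they are not the library's classes or internals. Only the calling pattern, extend(docs) versus repeated extend([doc]), mirrors the advice above.

```python
from collections import defaultdict


class ToySchemaLearner:
    """Hypothetical stand-in for illustration; NOT the library's vectorizer."""

    def __init__(self):
        # Maps a path such as ("root", "a", "b") to the set of Python
        # type names observed at that path across all documents.
        self.schema = defaultdict(set)

    def extend(self, docs):
        """Merge the structure of every document in docs into the schema."""
        for doc in docs:
            self._walk(doc, ("root",))

    def _walk(self, value, path):
        # Record the type seen at this path, then recurse into containers.
        self.schema[path].add(type(value).__name__)
        if isinstance(value, dict):
            for key, child in value.items():
                self._walk(child, path + (key,))
        elif isinstance(value, list):
            for item in value:
                self._walk(item, path + ("[]",))


docs = [
    {"a": {"b": 1}, "c": "x"},
    {"a": {"d": True}},
]

# Batch: learn from the whole list at once.
batch = ToySchemaLearner()
batch.extend(docs)

# Incremental: learn one document at a time.
incremental = ToySchemaLearner()
for doc in docs:
    incremental.extend([doc])

# Both routes see the same nested paths.
assert batch.schema == incremental.schema
```

In this sketch the two routes end with the same schema because each call only adds paths and types to what has already been seen. Whether the actual library behaves identically in both modes is something to confirm against its own documentation.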
Hello arsarabi,
Thank you for making your code available.
I've also had no luck learning nested attributes. Do I need to define a vectorizer of type "object" to be able to learn nested JSON objects?
Suppose I have a set of documents that match the following schema:
{
  "nestedobject": {
    "stringattr1": "some string",
    "numberattr1": 42,
    "stringattr2": "another string"
  },
  "stringattr3": "a third string",
  "booleanattr1": true
}
...do I need to define additional vectorizers beyond those you provide in the sample code?
If I (only) use the vectorizers provided in the sample code, the only learned features are:
0: root has "booleanattr1"
1: root has "stringattr3"
2: root has "nestedobject"
Thank you in advance for answering this (very basic) usage question :).
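For comparison, the sketch below lists the "has" features one might expect if nested objects were traversed for the example document above. The has_features helper and the exact feature strings are assumptions modeled on the output quoted in this comment; this is plain Python, independent of the library, and it says nothing about how the library names or orders its features.

```python
import json


def has_features(value, name="root"):
    """Yield 'parent has "key"' strings for every object in a nested document.

    Hypothetical helper for illustration only, not part of the library.
    """
    if isinstance(value, dict):
        for key in sorted(value):
            yield f'{name} has "{key}"'
            # Recurse so that keys of nested objects are reported too.
            yield from has_features(value[key], key)
    elif isinstance(value, list):
        for item in value:
            yield from has_features(item, name)


doc = json.loads("""
{
  "nestedobject": {
    "stringattr1": "some string",
    "numberattr1": 42,
    "stringattr2": "another string"
  },
  "stringattr3": "a third string",
  "booleanattr1": true
}
""")

features = list(has_features(doc))
for index, feature in enumerate(features):
    print(f"{index}: {feature}")
```

Running a check like this on a few real documents can help confirm that the nested keys are present in the data as expected before looking for the cause in the vectorizer configuration.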
Hello,
It has been a while since I worked on this, but I believe it should work with nested JSON out of the box if you follow the usage steps. Could you provide sample code that recreates the issue? Thanks!
Currently, when I run the code, the only schema returned is the one for "root", even though I have over 100,000 documents with many nested attributes.
Currently in vectorizers there are the following:
Is there something I'm not quite understanding about "learning" JSON deeper than 'root'?