frictionlessdata / tableschema-java

A Java library for working with Table Schema.
MIT License

Official Table Schema schema does not validate #20

Closed: gobertm closed this issue 4 years ago

gobertm commented 6 years ago

Hello, it seems that the official Table Schema provided here (https://raw.githubusercontent.com/frictionlessdata/tableschema-java/master/src/main/resources/schemas/table-schema.json) does not validate with the code example from the README. I would expect it to, since its description says: "A Table Schema for this resource, compliant with the Table Schema specification."

Schema expectedschema = new Schema(expectedschemaFilepath, true);

Error: org.everit.json.schema.ValidationException: #: required key [fields] not found
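The error message can be illustrated with a minimal sketch. It assumes that table-schema.json is the JSON Schema used to check table schemas, not a table schema itself, so it has no top-level `fields` key, which is exactly the key the validator reports as missing. The class and method names are hypothetical, and a parsed JSON object is modelled as a plain `Map`; this is not how tableschema-java or the everit validator work internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: report which required top-level keys a parsed
// Table Schema descriptor (modelled as a Map) is missing.
public class RequiredKeyCheck {

    // A Table Schema descriptor must contain a top-level "fields" array.
    static final List<String> REQUIRED_KEYS = List.of("fields");

    static List<String> missingRequiredKeys(Map<String, Object> descriptor) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            if (!descriptor.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // An actual table schema: it lists its fields.
        Map<String, Object> tableSchema = Map.of(
                "fields", List.of(Map.of("name", "id", "type", "integer")));

        // Illustrative subset of a JSON Schema *about* table schemas:
        // it has keys such as "title", "description" and "properties", but no "fields".
        Map<String, Object> metaSchema = Map.of(
                "title", "Table Schema",
                "description", "A Table Schema for this resource, ...",
                "properties", Map.of());

        System.out.println(missingRequiredKeys(tableSchema)); // []
        System.out.println(missingRequiredKeys(metaSchema));  // [fields]
    }
}
```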

Moreover, it is not possible to infer its schema:

URL url = new URL("https://raw.githubusercontent.com/frictionlessdata/tableschema-java/master/src/main/resources/schemas/table-schema.json");
Table table = new Table(url);
Schema schema = table.inferSchema();
System.out.println(schema.getJson());

Error: io.frictionlessdata.tableschema.exceptions.TypeInferringException at io.frictionlessdata.tableschema.Table.inferSchema(Table.java:107)

This is the first time I have used the specification and the library. Am I misunderstanding something?

Kind regards, Maxime

gobertm commented 6 years ago

More information after today's trials:
It seems that the result of the inferSchema() method is incompatible with what validate() accepts.

After trying the example code from the README:

URL url = new URL("https://raw.githubusercontent.com/frictionlessdata/tableschema-java/master/src/test/resources/fixtures/simple_data.csv");
Table table = new Table(url);

Schema schema = table.inferSchema();
System.out.println(schema.getJson());

// {"fields":[{"name":"id","format":"","description":"","title":"","type":"integer","constraints":{}},{"name":"title","format":"","description":"","title":"","type":"string","constraints":{}}]}

The resulting schema does not validate. The problem seems to be the constraints object, which cannot be empty.
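The inferred output above emits `"format":""`, `"description":""`, `"title":""` and `"constraints":{}` for every field. One possible way to avoid emitting such values is to prune empty properties from each field descriptor before serialising it. This is a hypothetical helper, not tableschema-java's code, and it assumes these four properties are optional in a Table Schema field so that omitting them is safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: drop null, empty-string and empty-object properties
// from a field descriptor so they are never written out.
public class FieldPruning {

    static Map<String, Object> prune(Map<String, Object> field) {
        Map<String, Object> out = new LinkedHashMap<>(); // keeps original key order
        for (Map.Entry<String, Object> e : field.entrySet()) {
            Object v = e.getValue();
            boolean emptyString = v instanceof String && ((String) v).isEmpty();
            boolean emptyMap = v instanceof Map && ((Map<?, ?>) v).isEmpty();
            if (v != null && !emptyString && !emptyMap) {
                out.put(e.getKey(), v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the "id" field from the inferred JSON above.
        Map<String, Object> inferred = new LinkedHashMap<>();
        inferred.put("name", "id");
        inferred.put("format", "");
        inferred.put("description", "");
        inferred.put("title", "");
        inferred.put("type", "integer");
        inferred.put("constraints", new LinkedHashMap<String, Object>());

        System.out.println(prune(inferred)); // {name=id, type=integer}
    }
}
```

Non-empty values, such as a constraints object that actually contains a rule, are kept unchanged.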

iSnow commented 4 years ago

A lot has changed since then, so

Schema expectedschema = new Schema(expectedschemaFilepath, true);

became

Schema expectedschema = Schema.fromJson(new File(getTestDataDirectory(), "schema/employee_schema.json"), true);

And that works now (see SchemaTest#testIssue20()).

The next one stems from a misconception about what inferring a Schema means:

URL url = new URL("https://raw.githubusercontent.com/frictionlessdata/tableschema-java/master/src/main/resources/schemas/table-schema.json");
Table table = new Table(url);
Schema schema = table.inferSchema();
System.out.println(schema.getJson());

The URL points to an existing schema JSON file, whereas inference works on a sample CSV and returns a guessed Schema.
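To make the distinction concrete, here is a deliberately simplified sketch of what inferring a schema from sample CSV data involves: each column is guessed as "integer" if every sample value parses as a long, "number" if every value parses as a double, and "string" otherwise. The class, method names and sample data are hypothetical; the library's real inference supports many more types and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical type inference over sample CSV rows.
public class NaiveInference {

    static String inferColumnType(List<String> values) {
        if (values.isEmpty()) return "string"; // nothing to go on
        boolean allIntegers = true;
        boolean allNumbers = true;
        for (String v : values) {
            if (allIntegers && !parses(v, true)) allIntegers = false;
            if (allNumbers && !parses(v, false)) allNumbers = false;
        }
        if (allIntegers) return "integer";
        if (allNumbers) return "number";
        return "string";
    }

    // Returns true if the value parses as a long (asInteger) or as a double.
    private static boolean parses(String v, boolean asInteger) {
        try {
            if (asInteger) {
                Long.parseLong(v.trim());
            } else {
                Double.parseDouble(v.trim());
            }
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // rows.get(0) is the header; every row is assumed to have the same number of cells.
    static List<String> inferTypes(List<String[]> rows) {
        String[] header = rows.get(0);
        List<String> types = new ArrayList<>();
        for (int col = 0; col < header.length; col++) {
            List<String> column = new ArrayList<>();
            for (int r = 1; r < rows.size(); r++) {
                column.add(rows.get(r)[col]);
            }
            types.add(inferColumnType(column));
        }
        return types;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; naive comma splitting, no quoting support.
        String csv = "id,title\n1,foo\n2,bar\n";
        List<String[]> rows = new ArrayList<>();
        for (String line : csv.split("\n")) {
            rows.add(line.split(","));
        }
        System.out.println(inferTypes(rows)); // [integer, string]
    }
}
```

Fed the lines of a JSON file instead, such a procedure has no meaningful header or columns to work with, which is why inference is pointed at data rather than at an existing schema.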

The next one:

URL url = new URL("https://raw.githubusercontent.com/frictionlessdata/tableschema-java/master/src/test/resources/fixtures/simple_data.csv");
Table table = new Table(url);

Schema schema = table.inferSchema();
System.out.println(schema.getJson());

This is now fixed. I created a test in SchemaTest that passes (note the true in Schema.fromJson(), meaning we are using strict validation):

@Test
public void test2Issue20() throws Exception {
    URL url = new URL("https://raw.githubusercontent.com/frictionlessdata/tableschema-java/" +
            "master/src/test/resources/fixtures/data/simple_data.csv");
    Table table = new Table(url);

    Schema schema = table.inferSchema();
    String json = schema.getJson();
    Schema newSchema = Schema.fromJson(json, true);
    Assert.assertTrue(newSchema.isValid());
}
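The `true` argument to Schema.fromJson() selects strict validation. The general pattern behind such a flag can be sketched as follows: in strict mode the first problem throws, while in lenient mode problems are collected and the caller checks isValid(). This is a hypothetical illustration; the class names and the lenient behaviour are assumptions, not a description of tableschema-java's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a strict/lenient validation flag.
public class StrictFlagSketch {

    // Holds any validation errors collected in lenient mode.
    static class ValidatedSchema {
        final List<String> errors = new ArrayList<>();

        boolean isValid() {
            return errors.isEmpty();
        }
    }

    static ValidatedSchema load(Map<String, Object> descriptor, boolean strict) {
        ValidatedSchema result = new ValidatedSchema();
        if (!descriptor.containsKey("fields")) {
            String msg = "required key [fields] not found";
            if (strict) {
                throw new IllegalArgumentException(msg); // strict: fail immediately
            }
            result.errors.add(msg); // lenient: record and continue
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> broken = Map.of("title", "no fields here");

        System.out.println(load(broken, false).isValid()); // false

        try {
            load(broken, true);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage()); // strict: required key [fields] not found
        }
    }
}
```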