common-workflow-language / schema_salad

Semantic Annotations for Linked Avro Data
https://www.commonwl.org/v1.2/SchemaSalad.html
Apache License 2.0

Unable to compile schema using avro-tools #7

Open denis-yuen opened 8 years ago

denis-yuen commented 8 years ago

Hi,

This may be a follow-up to https://github.com/common-workflow-language/common-workflow-language/issues/69, unless I'm mixing something up.

I'm using the current version of common-workflow-language (although the two tags don't seem to fare much better).

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

nothing to commit, working directory clean
$ git log | head
commit 791f4ef1dea83aed7827b679e179c634db7c0095
Author: Peter Amstutz <peter.amstutz@curoverse.com>
Date:   Mon Nov 30 14:23:50 2015 -0500

    No longer assume "#main" as entry point if there is more than one valid entry
    point in a file.

When I use the schema_salad project to generate an avro schema, that succeeds.

$ python -mschema_salad --print-avro ~/common-workflow-language/draft-3/cwl-avro.yml > cwl.avsc
/home/dyuen/schema_salad/schema_salad/__main__.py 1.0.6

However, when I attempt to compile the schema, the following happens:

$ java -jar avro-tools-1.7.7.jar compile schema cwl.avsc cwl
Input files to compile:
  cwl.avsc
Exception in thread "main" org.apache.avro.SchemaParseException: Illegal character in: draft-3.dev1
        at org.apache.avro.Schema.validateName(Schema.java:1083)
        at org.apache.avro.Schema.access$200(Schema.java:79)
        at org.apache.avro.Schema$EnumSchema.<init>(Schema.java:684)
        at org.apache.avro.Schema.parse(Schema.java:1234)
        at org.apache.avro.Schema.parse(Schema.java:1272)
        at org.apache.avro.Schema$Parser.parse(Schema.java:965)
        at org.apache.avro.Schema$Parser.parse(Schema.java:932)
        at org.apache.avro.tool.SpecificCompilerTool.run(SpecificCompilerTool.java:73)
        at org.apache.avro.tool.Main.run(Main.java:84)
        at org.apache.avro.tool.Main.main(Main.java:73)
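
For context, the Avro specification requires names and enum symbols to start with a letter or underscore and continue with letters, digits, or underscores, so `draft-3.dev1` is rejected because of the `-` and `.` characters. The following is a minimal sketch, not part of schema_salad or avro-tools, that lists offending enum symbols in an already-parsed avsc; the function name `invalid_enum_symbols` is hypothetical.

```python
import re

# Name pattern taken from the Avro specification: a letter or underscore,
# followed by letters, digits, or underscores.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def invalid_enum_symbols(schema):
    """Recursively collect (enum name, symbol) pairs that break the pattern."""
    found = []
    if isinstance(schema, list):
        for item in schema:
            found.extend(invalid_enum_symbols(item))
    elif isinstance(schema, dict):
        if schema.get("type") == "enum":
            for symbol in schema.get("symbols", []):
                if not AVRO_NAME.match(symbol):
                    found.append((schema.get("name"), symbol))
        # Descend into nested definitions (fields, inline types, and so on).
        for value in schema.values():
            found.extend(invalid_enum_symbols(value))
    return found


# Hypothetical fragment modelled on the CWLVersions enum from this issue.
example = [
    {"type": "enum", "name": "CWLVersions",
     "symbols": ["draft-3.dev1", "draft3dev1"]},
]
print(invalid_enum_symbols(example))  # [('CWLVersions', 'draft-3.dev1')]
```

In practice the input would presumably come from `json.load` on the generated `cwl.avsc`.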

denis-yuen commented 8 years ago

FYI, I was able to compile with the following trivial change to cwl-avro.yml.

$ git diff
diff --git a/draft-3/cwl-avro.yml b/draft-3/cwl-avro.yml
index b3ae433..c6310e1 100644
--- a/draft-3/cwl-avro.yml
+++ b/draft-3/cwl-avro.yml
@@ -1117,8 +1117,8 @@
   name: CWLVersions
   doc: "Version symbols for published CWL document versions."
   symbols:
-    - draft-3.dev1
-    - draft-3.dev2
+    - draft3dev1
+    - draft3dev2

 - type: record
   name: CommandLineBinding

mr-c commented 8 years ago

Hello @denis-yuen,

I think this is fixed with https://github.com/common-workflow-language/cwlavro, yes? Thanks again for all your assistance; I appreciate it!

denis-yuen commented 8 years ago

Hi, not exactly. We use the workaround noted above, which works for generating cwlavro. However, I think there is still value in schema_salad generating a valid Avro schema. Not only would that remove the need for the workaround; more importantly, it should make things simpler for an implementer working in a different language who attempts the same process of generating an SDK.

That is, when converting the CWL spec to an avsc, schema_salad should strip out characters that are valid in CWL but not valid in avsc.
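
One way such sanitizing could look is sketched below. This is an assumption about a possible approach, not schema_salad's actual behaviour; `sanitize_symbol` and `sanitize_enum` are hypothetical names. It replaces disallowed characters with underscores rather than dropping them, which keeps `draft-3.dev1` readable, and it refuses to continue if two symbols collapse into the same name.

```python
import re


def sanitize_symbol(symbol):
    """Map a string onto the Avro name pattern [A-Za-z_][A-Za-z0-9_]*."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", symbol)
    # A name may not be empty or start with a digit, so prefix an underscore.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def sanitize_enum(enum_schema):
    """Return a copy of an enum schema with sanitized symbols."""
    symbols = [sanitize_symbol(s) for s in enum_schema["symbols"]]
    if len(set(symbols)) != len(symbols):
        raise ValueError("sanitizing produced duplicate symbols")
    return dict(enum_schema, symbols=symbols)


# Hypothetical fragment modelled on the CWLVersions enum from this issue.
versions = {"type": "enum", "name": "CWLVersions",
            "symbols": ["draft-3.dev1", "draft-3.dev2"]}
print(sanitize_enum(versions)["symbols"])  # ['draft_3_dev1', 'draft_3_dev2']
```

The sanitized symbols no longer equal the strings that appear in CWL documents, so any consumer would need to apply the same mapping when reading or writing values.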

tetron commented 8 years ago

@denis-yuen I assume this is still a problem. The generated avsc could easily be sanitized; however, without being a bit more clever, I expect the result would just be that it no longer recognizes the symbols as defined in the original schema. Perhaps the workaround is to convert fields that use enum types with invalid characters into string fields?
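
A rough sketch of that enum-to-string idea follows, under the assumption that the enum is defined inline in a record field. It is not schema_salad code, `relax_record` is a hypothetical name, and enums referenced by name from other types are deliberately out of scope.

```python
import re

# Name pattern taken from the Avro specification.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def has_invalid_symbols(type_def):
    """True for an inline enum definition with at least one invalid symbol."""
    return (
        isinstance(type_def, dict)
        and type_def.get("type") == "enum"
        and any(not AVRO_NAME.match(s) for s in type_def.get("symbols", []))
    )


def relax_record(record):
    """Return a copy of a record where such enum fields become plain strings."""
    fields = []
    for field in record["fields"]:
        if has_invalid_symbols(field["type"]):
            field = dict(field, type="string")
        fields.append(field)
    return dict(record, fields=fields)


# Hypothetical record; only the enum symbols come from this issue.
record = {
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "cwlVersion",
         "type": {"type": "enum", "name": "CWLVersions",
                  "symbols": ["draft-3.dev1", "draft-3.dev2"]}},
        {"name": "id", "type": "string"},
    ],
}
print([f["type"] for f in relax_record(record)["fields"]])  # ['string', 'string']
```

The trade-off is that the Avro schema then accepts any string for the field, so checking the value against the original symbol list would have to happen elsewhere.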

denis-yuen commented 8 years ago

@tetron that sounds reasonable as well