EcoStruxure / OLGA

an Ontology SDK
MIT License
39 stars 18 forks source link

Unable to generate code for Brick Ontology v1.2.0 #29

Closed amanvirmundra closed 3 years ago

amanvirmundra commented 3 years ago

Describe the bug I am trying to generate POCO/POJO classes for Brick ontology v1.2.0 but the OLGA CLI generate throws an error about Invalid class name for both C3 and Java code gen. The Error happens because BRICK has some classes which contain '.' e.g. PM2.5_Concentration. This interferes with Class naming convention of .Net and Java. Visitor.IsValidToken() returns false and the codegen fails for such classes.

To Reproduce Steps to reproduce the behavior:

  1. Download the latest brick.ttl file(v1.2.0) from https://brickschema.org/schema/1.2.0/Brick.ttl
  2. Installed OLGA (v0.0.6) from https://github.com/EcoStruxure/OLGA
  3. To Generate the code from the file this is the command that I use. java -jar OLGA-Cli/target/OLGA-Cli-0.0.6-with-dependencies.jar --code java --library rdf4j --name BrickSchemaClasses --path Brick.ttl --out .

I see the following error

2021-06-28 20:20:08 ERROR OLGA:90 - semanticstore.ontology.library.generator.exceptions.InvalidClassNameException: PM2.5_Concentration Invalid class name, it doesn't follow the convention. PM2.5_Concentration Invalid class name, it doesn't follow the convention. **

Expected behavior The POCO/POJO classes should be generated for the Brick ontology (since the documentation mentions that it has been tested against Brick (maybe a previous version).

Desktop (please complete the following information):

radissoa commented 3 years ago

Hello Amanvir,

You are right the PM2.5_Concentration class name is conflicting with naming conventions of Java. I have met this issue, and found a quick fix for it if @charbull agrees with it I can try to create a PR with a fix soon. So the idea is to look for the forbidden characters in the class & property names and replace them with an alphanumerical translation.

Example : brick:PM2.5_Concentration will have this class name : PMdot5_Concentration

We can plan for the reserved keywords to do the same and replace int with _int.

The downside are two folds but manageable in my opinion :

navneet79 commented 3 years ago

@amanvirmundra @radissoa , I've also faced this issue and I have the changes ready with me , but my implementation just takes care of the "." character in the class names, I can make it compatible to other characters too, if there is already a list present of such characters I can easily integrate that and raise a PR.

charbull commented 3 years ago

We already outlined this for the special characters here:

https://github.com/EcoStruxure/OLGA/wiki/User-Guide#2-special-characters

@amanvirmundra you can change the name of the class to avoid those special characters.

@radissoa what you are proposing makes sense. Keep in mind to change the dependencies as well

"A connectsTo C.2.5" needs also to be swapped during the code generation to C_2_5.

we need to track all the references and swap them which might be tricky to make during the code generation.

a) Should we swap all the special characters to _ or we introduce the dot?

b) should this be a flag for the user --swap-special-characters and not have it by default?

amanvirmundra commented 3 years ago

Thanks for the quick response @charbull @radissoa. We have manually replaced the '.' with and . Also @navneet79 has updated the code for UTILS.isTokenValid() to include a '.' in KeywordsSet and ZClass.zClassName to replace a '.' with a ''. Our knowledge of Ontologies and OLGA is limited and you are absolutely right that dependencies, relationships and properties reflect the same and that could get complicated. Having said that, I think if OWL spec allows for Class names to have such characters then OLGA codegen should have a mechanism to deal with it. Couple of ways that Navneet and I see to address this:

  1. Have a helper method to clean a class name/interface e.g. UTILS.cleanClassName, and optionally condition it with a flag (--swap-special-characters as suggested by @charbull)
  2. Update FreeMaker templates to handle the same. Upside: User can decide the substitution e.g. '_' vs 'dot'. Downside we have update templates for different supported languages. Not too intuitive to a novice user.
  3. Maybe have a Validate mode that would list out all the incompatibilities.

Happy to discuss further and help in some way.

charbull commented 3 years ago

@amanvirmundra even if owl spec supports it there are still limitations to the code generation as java doesn't. So there are cases where even the substitution will not work. Check those: https://github.com/EcoStruxure/OLGA/wiki/User-Guide#1-reserved-keywords

And keep in mind this is only Java generation. There are other constraints in C# and python as well. This is why we didn't go with the substitution plan and gave the author the option to work around those by changing the names of the class. Which is also the way how the code generation with protégé plugin works.

@amanvirmundra @navneet79 did the substitution of the class worked for you?

@radissoa we would probably need to support this for java, c#, and python templates. Do you have some bandwidth to take this work?

Thank you

amanvirmundra commented 3 years ago

Yes @charbull replacing dots in class names with underscores works and the code is generated for https://brickschema.org/schema/1.2.0/Brick.ttl.

radissoa commented 3 years ago

@charbull I can find some time next week to work on this I will use the strategy from point "1. Have a helper method to clean a class name/interface e.g. UTILS.cleanClassName, and optionally condition it with a flag (--swap-special-characters as suggested by @charbull)" as I have used the same technique to solve this issue. I'll update everyone here as soon as I have something to share

charbull commented 3 years ago

Thank you @radissoa

charbull commented 3 years ago

I will close this issue. @radissoa feel free to create a new bug to add the new feature regarding the swap. Thank you