Adapt Editor to use KaiZen OpenApi Parser for parsing, validation, and content-assist support

andylowry commented 6 years ago

Expected benefits include:

Improved validation messages
Validation of requirements that cannot be captured in JSON Schema
Improved and more comprehensible content assist

Currently, KZOP does not understand Swagger-OpenAPI 2.0, so schema-based approach will need to be retained until that can be added to KZOP. It may make sense, even then, to leave the existing implementation as a configurable option.

andylowry commented 6 years ago

Slack conversation with some relevant discussion:

Andy Lowry [6:21 PM] Let me try to summarize... the refactored source isn't yet published The parser creates an interlinked sturcture of JsonOverlay classes. Besides JO classes for scalar types (like StringOverlay) there are three classes for handling the varous multi-valued JSON types:

Ted Epstein [6:20 PM] OK.

Andy Lowry [6:21 PM]

ListOverlay for JSON arrays
MapOverlay for JSON objects that from which an open-ended set of property keys and values should be extracted
PropertiesOverlay for JSON objects form which a fixed set of property keys and values should be extracted

Ted Epstein [6:22 PM] OK.

Andy Lowry [6:22 PM] The generated classes are all extensions of PropertiesOverlay. (not counting enumerations)

Ted Epstein [6:22 PM] Right.

Andy Lowry [6:23 PM] You can also arrange for a single JSON object to give rise to a PropertiesOverlay as well as one or more MapOverlay objects - it's how extensions are pulled from objects that also correspond to a fixed-property type. Maps can be (and are, in cases like this) configured with a regex for property names that should be included.

Ted Epstein [6:24 PM] OK.

Andy Lowry [6:25 PM] Each PropertiesOverlay value has a collection of fields defined for it. Each has a "name" which is used in method names for generated methods. It also has a "parent path" which is the JSON Pointer path that can be used to navigate to the child from its parent JO object (not necessarily its parent in the JSON object).

Ted Epstein [6:26 PM] OK. and I see there's a _findInternal method that looks close to what I was describing. But it doesn't (yet, in the current implementation) deal with the case of an elided container.

Andy Lowry [6:28 PM] The Overlay class (not JsonOverlay) is an adapter that provides generic utility methods that can be applied to any JO object. E.g. Overlay.of(model) creates an Overlay object for the top-level model object (assuming that's what model holds). Overlay.of(model.getSchemas()) gets an overlay object for the MapOverlay representing the schemas map. Overlay.of(model.getSchemas(), "Pet") gets an overlay for the Pet schema. There are a variety of Overlay.of() method signatures to get overlays for the various JOs that can appear in the overall parse result. No, not really The Overlay methods are not part of the generated classes (they used to be), for two reasons: (a) this way they don't pollute the generated APIs; and (b) the adapter makes it possible to invoke these methods on any JO object in the parsed structure. Previously, that was only possible for PropertiesOvelay objects.

Ted Epstein [6:32 PM] OK, I think I get it, though I could imagine making these methods part of the AbstractJsonObjectOverlay superclass. I guess you didn't do it that way for the reason you described: you didn't want these reflective methods to pollute the generated APIs. So, what I wrote above referred to "overlay objects" and didn't really distinguish between the generated classes (subtypes of ListOverlay, MapOverlay and PropertiesOverlay) vs. the Overlay object.

Andy Lowry [6:36 PM] Among the methods available in Overlay, we have:

find which locates the JO object corresponding to a JSON Pointer from a given parent. This is ambiguous because of the possibility of colocated map and properties objects.
getPropertyNames - only meaningful for PropertiesOverlay objects - returns the "names" not the "parent paths" for the properties configured for the type
isPresent() - says whether the given JO actually represents a present value or is just a placeholder. Placeholders are created rather than leaving behind a bunch of nulls.
getPathInParent says what the JSON path is from this JO's parent to the child.
and a few other things. There's no longer an AbstractJsonOverlay class - just JsonOverlay. It was a useless complication. I didn't want those methods polluting the API visible in an IDE, not just the API visible in the generated class sources.

Ted Epstein [6:38 PM] OK, I see. I could rewrite the questions if that would help. But really, I'm just trying to get to a specific agreement to specify how KZOP/JSON Overlay will address KZOE's requirements for content assist.

Andy Lowry [6:40 PM] The getPropertyNames method will clearly be useful for content assist, but it doesn't currently give you a way to know what JSON structure needs to be added to the object to create a given property. My comment to @ghillairet on this is that the internal data supporting getPropertyNames actually does, now include the parent paths. This is part of the refactoring, and it's due to a revamped representation of PropertiesOverlay objects that made it possible to preserve fixed property order when serializating to JSON. (That was already done for maps, but I still don't preserve order or intermingled field and map entries in a single JSON object). So that's just a fortuitous change already in the refactoring. Adding a means of getting parent paths, rather that just property names, should be quite easy

Ted Epstein [6:41 PM] Right, and that almost covers it.

Andy Lowry [6:41 PM] That doesn't cover everything, including maps and lists, as well as elided JSON properties, as you mentioned. But those, I think, should be reasonably easy to work out And of course there's #$&#$*W additionalProperties - no way to get around bespoke code for that In the class generated for Schema there are two separate properties, named AdditonalProperties and AdditionalPropertiesSchema - for boolean and Schema values, respectively. They both have same parent path, and there's custom logic added to the Schema class to make the two properties mutually exclusive (set one, the other is cleared automatically)

Ted Epstein [6:44 PM] OK. So, to close the loop on this, I'm just looking for a concrete proposal for the new methods that would need to be added to the JsonOverlay class hierarchy and/or the Overlay adapter, to cover the content assist use case. brb...

Andy Lowry [6:49 PM] I'm imagining a getPropertyInfo method in Overlay object to replace getPropertyNames. Will provide a POJO for each field, with multiple pieces of information - field name, parent path, type info, presence, whatever else. What Guiallaume will need for maps and lists I'm not clear on, but it may well be that what's currently available is good enough. The issue with elided properties may best be handled by just making KZOE aware of them. There are very few of them (maybe just components - I'd have to look). The content assist would understand when it's looking at an elided property and would use the parent (or some other ancestor) node for recommendations when that's the case. Maybe that's enough, will need Guillaume's input

tedepstein commented 6 years ago

@andylowry , your last comment made me think there may be some misunderstanding about the problem with elided properties. Let's say the editor is opened with the insertion point here:

The user invokes content assist, and KZOE has to populate the suggestion list with property names that are not already present. In this (obviously contrived) case, components is present, so it should be omitted from the suggestion list.

But IIUC, the presence of components won't be evident by iterating over the PropertyInfo objects that you proposed. Those objects are only available for properties of the top-level OpenAPI object. That object doesn't have a Components property in its list. It elides components, and instead has properties for Schemas, Parameters, Callbacks, etc., corresponding to components/schemas, components/parameters, components/callbacks, etc.

Maybe this problem isn't big enough to worry about. Worst case is that components is offered in the suggestion list even though it's already present, the user inserts it, and it becomes a validation error in the editor. (...assuming duplicate properties are still going to be detected and marked as validation errors? Or would we fail to even detect this problem because KZOP can't register the presence of components without one of its known properties?)

I just wanted to make sure we had a shared understanding of the basic problem.

andylowry commented 6 years ago

@tedepstein My point is that, in iterating over the top-level properties, you're going to see all their parent paths. You'll find some that are simple, like Info with a parent path of info, leading to an Info object. You'll also see some that are less simple, like the Schemas field with a parent path of components/schemas leading to a map overlay holding Path objects. You'll also find another field, PathsExtensions, with exactly the same path (components/schemas) leading to a different map overlay holding the extension properties appearing alongside the paths map (different from the extensions for each individual path). The two map overlays will have mutually exclusive parent paths that will allow you to sort out which values go where.

You'll also, at top level, find a field named Extensions with an empty parent path (""), corresponding to a map overlay that contains the top-level extension properties..

It's not the easiest thing in the world, but let's assume you can figure out where the nearest point in your editor is that's an ancestor of your current editing point and corresponds to a JO object. You can then look at that JO object's PropertyInfo data and figure out which fields apply to the current editing point by comparing the path to that editing point to the fields' parent paths. If your editing point itself corresponds to a JO object, all fields will apply. If you've created a path that's not allowed, none will apply.. Whatever you've got, you'd omit prefixes (e.g. components/) corresponding to your current editing point, from the surviving recommendations.

It's not the most straightforward problem, but it seems reasonably tractable.

tedepstein commented 6 years ago

@andylowry , OK, but in the example I gave, would the components property, whose value is an empty object, have any representation in any of the available PropertyInfo objects? Is there any way for KZOE to see that the components property is present?

andylowry commented 6 years ago

KZOE is the editor. It can see what's in the JSON. If it sees a components JSON property and knows (my assumption) that the nearest JO ancestor is the top-level JO object, of type OpenApi3, which extends PropertiesObject. It will ask for its field data and will find a slew of fields that have components/ as a prefix of their parent paths, like Schemas and SchemasExtensions. Both of them have schemas as the remainder of their parent path, so schemas will be offered as a recommendation.

andylowry commented 6 years ago

@tedepstein Actually, in re-reading my prior comment, I see I wasn't thinking of an algorithmic solution to this, but rather something where KZOE would know, by special-case code, how to deal with components. That may still make sense if the general solution is tricky and there are a small number of these (one?). But it seems that the general approach, if it works, is probably reasonably OK. And maybe with all this we could get closer to your goal of having a general JSON Overlay based editor, rather that one that's specifically designed for OAS3 and Swagger.

andylowry commented 6 years ago

@tedepstein References will be another potentially tricky area. There's lots of support for getting information about references from KZOP, but I imagine KZOE will mostly be looking at the YAML to detect references. The other side is offering references as recommendations. There's no way, currently, to identify where references are allowed within a model. That would need to be added to the types file. Currently, the parser just accepts them anywhere.

tedepstein commented 6 years ago

It seems like the JO object graph replaces the current "model" that Guillaume has built to support context-sensitive code assist and validation. But some algorithms need information that isn't available directly through the current or proposed JO API. KZOE needs to go directly to the YAML (SnakeYAML) or JSON (Jackson) layer in these cases.

I don't know to what extent this is currently problematic, or suboptimal in terms of code simplicity, memory and processing efficiency. And I don't know whether this gets better or worse with the introduction of KZOP/JO. @ghillairet , maybe you can offer some analysis here.

Are there any cases where it would be helpful for Overlay to provide direct access to the lower-level YAML or JSON DOMs?

andylowry commented 6 years ago

@tedepstein The problem with providing direct access to the original JsonNode is that the API provides mutating methods. It may be worth adding a capability to freeze the parse result - maybe even make that the default and require that it be unfrozen before any changes can be made.

Internally, I keep a registry that links every JsonNode to its JO object. The registry is used to ensure that multiple references that resolve to the same JsonNode end up being represented by the same JO object. Exposing that registry in some fashion might be helpful, but only if it's guaranteed to continue to be accurate, and that's not the case with mutation API. (The registry is currently used only during parse, so no problem there).

ghillairet commented 6 years ago

@tedepstein @andylowry I started experimenting with KZOP in branch https://github.com/RepreZen/KaiZen-OpenAPI-Editor/tree/task/kzop-parser-wip

You can see a simplistic content assist being done in https://github.com/RepreZen/KaiZen-OpenAPI-Editor/blob/task/kzop-parser-wip/com.reprezen.swagedit.core/src/com/reprezen/swagedit/core/assist/OpenApiProposalProvider.java The call to the parser is done in JsonDocument and the parser is here https://github.com/RepreZen/KaiZen-OpenAPI-Editor/blob/task/kzop-parser-wip/com.reprezen.swagedit.core/src/com/reprezen/swagedit/core/json/OpenApi3LocationParser.java It extends the default parser to keep track of nodes locations.

It shows that content assist could be doable that way, but I will need to wait for new version of KZOP for that. I'm more concerned about validation. It looks like using the JSON schema validation library will still be required. For example it seems that if some properties are present in the spec and do not belong to the schema, then KZOP will ignore them. If I write path instead of paths, then no errors are being detected by KZOP.

andylowry commented 6 years ago

@ghillairet Re the validation issues, that's a known deficiency that will be addressed. In general, there's a need to go carefully through the validation logic to make sure it reflects every requirement appearing in the specification. I think the "extra properties" check is the primary (and perhaps only) general hole in the current validation; mostly the rest will be mistakes in the current validation logic, or specific requirements that are not checked.

ghillairet commented 6 years ago

Also related, I found this bug in KZOP https://github.com/RepreZen/KaiZen-OpenApi-Parser/issues/168 that prevents finding some elements during content assist.

andylowry commented 6 years ago

@ghillairet Right, thanks for reporting that. I'll check if it's still a problem after the refactoring is complete, and fix it if so.

RepreZen / KaiZen-OpenAPI-Editor

Adapt Editor to use KaiZen OpenApi Parser for parsing, validation, and content-assist support #451