FasterXML / jackson-dataformats-text

Uber-project for (some) standard Jackson textual format backends: csv, properties, yaml (xml to be added in future)
Apache License 2.0
402 stars 145 forks source link

YAML Anchors / References #98

Open marcoslopes opened 6 years ago

marcoslopes commented 6 years ago

Hi there, first of all thanks heaps for your awesome work with Jackson.

I am trying to get a concise configuration out of a yaml file, but I am pretty stuck at the moment.

I have followed many issues and still did not managed to get it working using only @JsonIdentityInfo By the this tests it should be as simple as annotating Environment class with @JsonIdentityInfo(generator = ObjectIdGenerators.StringIdGenerator.class) but that also did not work.

After reading this issue i thought it could be the scope, so I have annotated the environment and bundle1 properties adding the main Configuration class as scope but that also yield not success. Is this possible today, if it is how can it be achieved, and if not what do you recommend?

As the issue 266 suggests, I should write my own DeserializationContext, and figure out a way to find the references. But "you" --I believe it is you-- recommended this before.

Which of these two should I follow if there is no simple way to get it working with the current 2.9.6 version?

Trying to debug I did not see any calls to bindItem which means that SimpleObjectIdResolver._items are always null, even though I can see the references being set what would trigger that line to be called? Please keep in mind that I don't claim to know anything about jackson, that was what I could gather from trying to debug it.

name: "dropwizard-app"

environment: &environment 
    profile: "dev"
    provider: "aws"
    segment: "segment"
    security-context: "test"
    region: "us-east-1"

bundle-1: &bundle1
    environment: *environment
    path: "x/y/z"

bundle-2:
    # property-config-1: (optional)
    endpoint: "https://my-service-1"
    bundle-1-configuration: *bundle1

PS. Like the tests I have also tried the anchoring as a prefix, but it seems that the library have no issues finding both ways.

Thanks again.

mm-matthias commented 5 years ago

Seems like anchor/reference bugs affect quite a few people:

https://github.com/OpenAPITools/openapi-generator/issues/1593 https://stackoverflow.com/questions/46689801/yaml-jackson-anchor-keys-of-array https://github.com/RepreZen/KaiZen-OpenAPI-Editor/issues/183

nkiesel commented 5 years ago

Same here. I just switched my code to snakeyaml because that handles anchors/refs correctly. Which is strange because jackson-dataformats-yaml is using snakeyaml internally

cowtowncoder commented 5 years ago

@nkiesel Limitation is a combination of multiple things, but basically SnakeYAMLs incremental part only exposes information about anchors, references, but does no resolution as far as I know. This means that Jackson needs to do that part. But since databinding is format-agnostic, it can only operate in a way general data model works, and for that Object Id handling is opt-in on per-type (or per-property sometimes) basis. This because JSON does not have concept natively and support for identity has been added by requiring annotations (partly since there was/is no real standard for specifying concept; and partly since there is non-trivial overhead for having to keep track of every value in whole input for possible back reference).

One thing that might make sense is to see if higher level SnakeYAML API might allow pre-processing references. I'll create issue for that.

cowtowncoder commented 5 years ago

Created 2 new issues to possibly address this:

odupuy commented 4 years ago

I have a similar issue. I parse my swagger file with this library. In the tree of jsonNode, I have endpoint referring to other endpoint for codes, body... as they are deprecated replaced by a better path. Following the DRY principle, the things are defined once. I can see a reference to an anchor as a Json node <<: *some_anchor but if I open the corresponding endpoint, I don't see the opposite &some_anchor in the tree. Even it was listed, even if it was a manual process I could do something but here I am stuck.

I will have to use snakeyaml as an alternative.

+1 to fix this, view the exposed anchor at the very least, navigate to the referred anchored or inline it if possible.

cowtowncoder commented 4 years ago

@odupuy One specific problem is that while streaming api (JsonParser) does expose necessary ids (anchor and reference) -- and even buffering (TokenBuffer) at databind level, I think -- there is no place in JsonNode hierarchy to do the same. So part of the thing is that I am not sure if it is practical to extend things at that level. At same time I realize that Tree Model is very convenient model in many other ways, so it would be useful to do that: and since one can "read" and "write" to/from Tree model (to expose it as streaming JsonParser / JsonGenerator), that would allow full end-to-end handling.

But at the same time, adding overhead of every node potentially having anchor is problematic, so perhaps there should be separate TreeNode hierarchy for YAML backend to use -- esp. since XML backend would benefit from some extension/differences (f.ex to allow "dups" for ObjectNode, to represent sequence of same element names). Problem, however, is that representing parallel type hierarchies in Java gets tricky pretty fast. Alternative of sub-classing JsonNode implementations is even more difficult.

odupuy commented 4 years ago

Thanks for your answer @cowtowncoder. I cannot comment much on what it is effectively doable in this lib but I will keep an eye on the issue. In the meantime SnakeYml allows me seeing both anchor and reference (they are inlined) so I will switch to this one.

cowtowncoder commented 4 years ago

@odupuy Makes sense; you need to use something that does what you need.

But one thing I realized last night, which might allow a reasonable solution is this: instead of trying to solve this via sub-classing (inheritance), with its known problems, how about composition? So: addition of a new type of node -- "reference node"? "metadata node"? "info wrapper"? -- that would effectively contain additional metadata, and reference to "real" value node. Metadata being tracked could be combination of general info (Location, if required; Object and/or Type Ids) and possibly additional Opaque one (backend-dependant), with key/value. The main challenge would just be that of how to make things work as cleanly as possible with existing use, so open questions would include whether new node type should delegate most/all accessor calls to wrapped node (to effectively "masquerade" as whatever value node it contain) or not.

This would be sizable addition so I can't promise much about implementing it soon, but I am excited that it feels like it could be the way forward. This is a long-standing issue affecting multiple backends (for XML, distinction between Elements and Attributes matters, f.ex) so some new approach would be very welcome.

odupuy commented 4 years ago

I was thinking about something similar while replying. A new type of node should be OK and without risks. Simply a text reference to the referred anchor or for an anchor to the referrer(s) would help. Additional direct navigation could be handy as long as there is no side effect making you recursively exploring the tree (not sure if toString() would do that)

cowtowncoder commented 4 years ago

One possibility on resolving references could be a separate entity or helper that will traverse tree, modify it to resolve anchor/refs -- it is something that can not be done easily during reading due to possible forward references (... assuming YAML allows those?). Jackson does handle it incrementally with @JsonIdentityInfo, but that code is quite complicated and brittle: much easier to do with second pass.

And doing it as separate pass would allow format-specific handling, too, in case semantics vary across formats (which they do) -- in theory, could even support XML specific identity that way.

nkiesel commented 4 years ago

YAML only allows backward references: spec says "It is an error for an alias node to use an anchor that does not previously occur in the document." ("alias" node here is the usage of a reference, and "anchor" is it's definition). However, it is allowed to redefine anchors. Thus as I see it, a single pass could resolve all the references.