FasterXML / jackson-dataformats-text

Uber-project for (some) standard Jackson textual format backends: csv, properties, yaml (xml to be added in future)
Apache License 2.0
405 stars 149 forks source link

(yaml) Emit comments #36

Open lovasko opened 7 years ago

lovasko commented 7 years ago

Is it please possible to emit comment lines with the keys? I could imagine this being done with an annotation for the respective class member. Also, adding whitespace, e.g. empty line would be a nice feature as well.

If this is not currently possible, could anyone please point me to the code where this magic would be happening so that I can add this feature?

Many thanks! :)

cowtowncoder commented 7 years ago

There is no way to emit comments, in general, by Jackson, so it would be necessary to first figure out why this is something to add (use case, and where contents of comments should come from). After that it would be necessary to figure out how piping of comment contents should occur (what should added, based on metadata).

So... what is the use case? Why comments? Containing what?

IhToN commented 7 years ago

Using a YAML file as a user-setup file in which an user will be able to change some settings in an easy way, using a notepad.

If you want to clarify some of the options you use comments so the user will be able to understand what he should modify and what will cause any of the modifications.

cowtowncoder commented 7 years ago

I guess I am more used to people adding comments manually, and more trying to understand how a machine-processable generation would add comments. I am familiar with comment usage in XML, where generators may add sort of prefix comment containing diagnostics metadata (generation timestamp, system etc).

I think it would be possible to add something like writeComment in base class (JsonParser) and let formats implement it if available, or throw exception if not. Although there's further question of style of comments to use for formats that allow multiple styles (json does not have official comments, but there are 3 distinct styles that are sometimes used).

asomov commented 6 years ago

Even if Jackon finds a way to create API to emit comments, SnakeYAML explicitly does not support dumping comments. It is not supported by the YAML spec.

cowtowncoder commented 6 years ago

@asomov Hmmh. Ok, that would indeed be a blocker. I understand that retaining comments can be tricky (as they really do not play well with object models in general), wrt reading. But was assuming that there might be a way to generate them -- this was useful with XML, for example, adding comments as human-debuggable headers for things like "file generated by [tool name] on [date]" and such, instead of (or in addition to) actual machine-readable (meta)data.

asomov commented 6 years ago

@cowtowncoder : XML has comments as part of the spec. YAML explicitly ignores the comments.

cowtowncoder commented 6 years ago

@asomov I understand that. I just find this... unexpected. Thank you for pointing out point in specification.

Just one point however: while this clarifies that comments are essentially not to be exposed when reading YAML (or preserved within model), they clearly exist in textual representation. As such it would seem odd that generators could not produce them. Presumably that would mean that only humans with with text editors were to use them as... annotations?

Put another way: specification does not seem to have much to say on output/generation side, which is what this issue is about.

asomov commented 6 years ago

The spec says that the generation must be controlled by the composed Node tree. The comments are not part of the Node tree. You can see that as the white space. It is present and important but it is difficult to control while dumping the output. Of course, if a parser (jackson) could find a way to dump the comments it will be NOT against the spec.

cowtowncoder commented 6 years ago

@asomov Ok. I am not familiar enough with the specification there -- it is rather big spec, after all. Thank you for pointers so far.

Another way to view this would also be whether any other YAML tools allow emission or not. I tend to prefer working similarly to existing tools, and if (for example) such generation is not supported elsewhere, would not want to be introducing such non-compliant behavior.

At this point it does sound like adding such output may not be a good idea.

asomov commented 6 years ago

I do not know any tool which can generate comments.

GotoFinal commented 6 years ago

@asomov @cowtowncoder Are you sure that comments can't be included? http://yaml.org/spec/1.1/#id863676 There seems to be part of representation tree, even more, you can include whole styling here. For sure you can't read this, but you can include this. Note that you can include that outside the node, just pass additional argument to emitter with data about comments/style, in my projects I had special object to store comments mapped by path.

asomov commented 6 years ago

Where do you see that they are a part of the tree? This is what the spec says:

Comments are a presentation detail and must not have any effect on the serialization tree or representation graph. In particular, comments are not associated with a particular node.

" just pass additional argument to emitter with data about comments/style" - well, if you have a proposal how to implement that then feel free to share it with us.

GotoFinal commented 6 years ago

@asomov you ofc need own emitter then, as SnakeYaml does not support anything like that, so it is probably too complicated to include and support inside jackson

asomov commented 6 years ago
  1. this is the issue tracker for jackson
  2. SnakeYAML is not written in stone, it can be changed if we know how.
  3. You said there is already a project which emits comments. Why not to share with us the API ?
GotoFinal commented 6 years ago

@asomov this was my "private" project and internal code - so I didn't really care about being compilat with yaml specification, I just needed comments, so it was more like a hack. So it is not anything you would want to use in more popular and used project :)

asomov commented 6 years ago

Than I do not get the reason to start the conversation here (with the references to the spec).

GotoFinal commented 6 years ago

I just wanted to throw few ideas here (like passing that comments object next to nodes, as this seems to be fine with specification?), as once I also needed feature like that but needed to do it alone i a bit hacky way as there isn't any ready-to-use solution for generating nice configuration files with comments, but would be nice to finally have some better library for this. Sadly jackson can be problematic anyway with yaml, as after all yaml have many features that don't map well to json model, like object keys.

Sorry then :)

cowtowncoder commented 6 years ago

On plus side, I think there is agreement that:

  1. Support for writing comments, if any, makes most sense on SnakeYAML side -- it handles all low-level generation and decoding
  2. If such support is added, Jackson can probably be modified to take advantage: there are other data formats with support for comments (XML has them as legal thing; JSON has non-standard extension; CSV similarly has means to embed)

I think I will leave this open as a placeholder just in case someone else was looking for this, so it is easy to see our current thinking.

asomov commented 6 years ago

I just wanted to throw few ideas here (like passing that comments object next to nodes, as this seems to be fine with specification?), .

Sorry then :) Without any API proposal this idea is completely unclear for me. What is "that comments object" ?

GotoFinal commented 6 years ago

@asomov I don't know if there is any reason for me to describe this, as this was created for simple configuration files, I didn't need to care about many corner cases.

in my code I just created CommentsDocument object that was storing header, footer and comments data for each node described by path inside yaml structure, like: some.object.field with comment hello to get simple yaml:

some:
  object:
    # Hello
    field: ...

For lists I was just ignoring them, so it would also work for:

some:
  object:
  - 
      # Hello
      field: ...

And for nested objects in other maps I had a wildcard token: some.other.*.node

Additionally there was a class that was able to read comments from annotations over classes and include them to final document while/before serializing.

But my code was only designed for simple configuration files, and it just needed to work ;) PS: nice feature is to have option to only include comment once, like having this same comments in each list element might be pretty ugly, but again: I only did that for configuration files so I still think only about that case.

asomov commented 3 years ago

@cowtowncoder SnakeYAML Engine is going to support dumping comments. I wonder if we can somehow use it in Jackson. @lovasko @GotoFinal @IhToN can you try to marry the new feature in SnakeYAML with Jackson ?

cowtowncoder commented 3 years ago

Adding support may be tricky as a general feature, although adding methods in YAMLGenerator should be relatively easy. Not sure how this could be abstracted at higher levels though.

asomov commented 3 years ago

@cowtowncoder if no one wishes to contribute, than the feature is not that useful. We have done our part and we are ready to help the volunteers to make the dream come true.

cowtowncoder commented 3 years ago

@asomov I appreciate your adding functionality that makes it possible, and will be happy to at least add one mechanism on YAMLGenerator. Just wondering how this could be generalized: handling of comments in data formats is generally tricky, but also useful: XML format module would benefit from similar mechanism, for example.

Marking as "good first issue" in case someone has time to just do the simple thing first.

asomov commented 3 years ago

@cowtowncoder unfortunately I cannot help you. The support for comments in SnakeYAML ends on the Node level (before Java instances come into play). I am not sure it can be generalized to be used in Jackson. You can check the folder with tests to see how comments can be used: https://bitbucket.org/asomov/snakeyaml-engine/src/master/src/test/java/org/snakeyaml/engine/v2/comments/

Important: support for comments is not yet released ! I wanted to give it a try. May be we will need to change the API. This is exactly want I expected - somebody could try SnakeYAML before we release the new API

cowtowncoder commented 3 years ago

That level makes sense. If I understood it right with a quick look, emission of comments would need Jackson to associate comments with other node types to emit (like value that is written) next. That's probably quite doable. Still, would be good to know what users are sort of trying to achieve. For example, ability to add "header comments" is a common use case for at least xml -- add a note about time file was generated, for example.

jdimeo commented 3 years ago

I am very interested in this but am not sure I have the time to do a contribution. Thanks for keeping it alive.

Charlinjoeaht commented 2 years ago

Even today, I am looking for this solution. Let me know if anyone finds any solution to achieve this

nikhildigde commented 2 years ago

Hi All, Thank you for this informative discussion. I am too looking for this feature in snakeyaml & it is definitely a good thing to have. For example even microconfig uses comments to mark / annotate things in YAML. When there is merging of YAML files, it does create issues since comments are erased in the final outcome.

@asomov - I cannot access this https://bitbucket.org/asomov/snakeyaml-engine/pull-requests/1 .. Can you provide access to this ? Also you mentioned that snakeyaml is going to add support to dump comments - can you please point me to some reference related to this?

nikhildigde commented 2 years ago

Btw, I just found this -- https://bitbucket.org/snakeyaml/snakeyaml/issues/497/how-to-use-the-new-process-comments .. Havent tried it yet, but seems like something was done around this. Can you please confirm

asomov commented 2 years ago

@nikhildigde SnakeYAML Engine migrated to the new home The latest release supports comment. Soon there will be another release with minor improvements

cowtowncoder commented 2 years ago

I think I did notice that SnakeYAML engine now has means to write (and probably also read?) comments. That would be a required part of the solution.

But beyond that there'd still be the question of how to expose comments in Jackson content (f.ex as JsonNode). Comments are challenging since they typically can appear anywhere; formatting matters more (whitespace aspects around comments should ideally be retained) and so they do not fit well with many data models.

I think that conceptually comments should probably be associated as sort of metadata to actual content nodes but Jackson really has nothing to support such notion. Adding new value nodes would work for some comments but not nearly all.

Supporting comment reading/writing at low level (streaming parsers/generators) is simpler, of course, and that may be the first thing to allow. It won't be super useful for most users but would at least let toolmakers do something.

jdimeo commented 2 years ago

I am now using the underlying SnakeYAML to read/write commons (thanks @asomov !)

My use case involves marking a translation as being proofread/checked, or capturing other metadata about the value. For example this shows that translator SC has looked at the Spanish:

description:
  en-US: Injury Cause
  sw-TZ: Sababu ya Kuumiza
  id-ID: Penyebab Cedera
  es-SV: Causa de la lesión #SC Check

... and then these files are read via Jackson (thanks @cowtowncoder !) and become quite a comprehensive outcomes framework for monitoring & evaluation use cases.

But there are other things that are driving me to use the lower level, like preserving any indentation or style that the last human used and omitting fields with defaults (Jackson writes enabled: true, as it should, but that clutters the file for humans). So while I want comment handling in Jackson (so I don't bypass the APIs and go down to SnakeYAML APIs), but I don't really have a clear proposal- I see a similar dilemma about how to include them in the node tree...

nikhildigde commented 2 years ago

@nikhildigde SnakeYAML Engine migrated to the new home The latest release supports comment. Soon there will be another release with minor improvements

@asomov Yes I later had found this repo and did a local build of the master. Seems working great! Thanks! Just a question, why did you not enhance SnakeYaml 1.1 impl rather than creating a new impl (engine). Intent of asking is to not have a BWC break and same experience

asomov commented 2 years ago

@asomov ... Just a question, why did you not enhance SnakeYaml 1.1 impl rather than creating a new impl (engine). Intent of asking is to not have a BWC break and same experience

To be able to improve it breaking the backwards compatibility

kdubb commented 1 year ago

Just adding a real world use case here. RAML documents use a shebang like identifier #%RAML 1.0 to identify the YAML as RAML; I believe I have seen other formats do similar.

We're using Jackson's YAML merge functionality to build complete documents for OpenAPI and RAML. With RAML we need to "fixup" the merged YAML to include the RAML comment header. It would obviously be nice if comments could be preserved since we are only using JsonNodes during this process.

cowtowncoder commented 1 year ago

While (typed) comments would be nice, another possibility could be -- if SnakeYAML/-engine supports this -- to pass through "raw" content. Currently it is handled in rather odd way; basically you need POJONode, which then contains RawValue as "POJO" (basically, opaque value to embed directly in outgoing content). With JSON this would get embedded right in output as text; and same could theoretically at least work with YAML. The big "if" is wrt ability for YAML backend to write it via writeRawValue() (or just writeRaw(), depending).

The reason this would be slightly easier (excluding part about SnakeYAML support) is that no new types would be needed.

solonovamax commented 8 months ago

Hi, if I were interested in contributing to jackson to implement this (not just for yaml, but also for other formats as well), what would be the best way to go about starting this? Is there, for example, a matrix space I could join to discuss this?

cowtowncoder commented 8 months ago

@solonovamax I doubt there is any discussion outside of this issue (except for potentially discussion on XML backend issues, jackson-dataformat-xml).

Challenges here include:

  1. How tightly to integrate functionality: easiest way is to only add support to YAMLGenerator (method to write a comment) -- this is probably the starting point
  2. Much harder: somehow expose functionality through jackson-databind: there is no API for doing that, so not clear how it should be done. Not all formats support comments; and those that do generally allow them to appear about anywhere within data. Does not fit with many object models (f.ex Jackson JsonNode would be tricky)
  3. Even harder: expose comments on reader (deserialization) side, mostly to allow read-edit-write cycle

Still, tackling (1) would be the first step I think. It could potentially include adding writeComment(String) method in shared JsonGenerator (default implementation likely throwing exception saying format does not support comments), adding actual implementation for supported backends (YAML, XML, JSON (if and only if enabled with non-standard JsonWriteFeature), Ion (I think?) )

jdimeo commented 8 months ago

<joke> Use ChatGPT to infer the location of where I want the comments to be in the data </joke>

jdimeo commented 8 months ago

OK back to something actually useful: How crazy would it be to support something like @JsonComment even on JsonNode and then add annotations to POJOs and even dynamically at runtime via https://github.com/renatoathaydes/javanna (which uses proxies under the hood)

solonovamax commented 8 months ago

regarding the issues you raised, here's what I was thinking:

  1. How tightly to integrate functionality: easiest way is to only add support to YAMLGenerator (method to write a comment) -- this is probably the starting point

I'd ideally like to be able to implement global support across all generators for it.

  1. Much harder: somehow expose functionality through jackson-databind: there is no API for doing that, so not clear how it should be done. Not all formats support comments; and those that do generally allow them to appear about anywhere within data. Does not fit with many object models (f.ex Jackson JsonNode would be tricky)

I wasn't particularly considering databind as I would most likely not be using it for my specific usecase (kotlin delegate properties which store the name of their field) however, for databind, one solution would be to introduce smth like @Comment(String), which would write a comment before the field is written. it would not support writing comments after the field.

and, any formats which don't support comments would just ignore it.

and, imo, there should be zero support for deserializing the comments into a POJO. comments are not data, only a bit of metadata, generally for a human reading it.

  1. Even harder: expose comments on reader (deserialization) side, mostly to allow read-edit-write cycle

I don't particularly see them as being super useful to be exposed to the reader, as they're comments. the contents of them does not matter.

however, I do think that they should be stored in the parsed AST (JsonNode), that way if you deserialize -> serialize a config file (to a JsonNode. it would not work with databind, as the comments are ignored when deserializing to the POJO), the comments are preserved.

I was thinking smth like hasComments(): Boolean (defaults to return false in unsupported formats), and getComments(): List<String>, returning the comments that are above a particular node. additionally, to add comments to the JsonNode, you'd either call addComment(String) or replaceComments(String)/replaceComments(List<String>). the comments returned by getComments() would be a list, which each element in the list is the comment on one line. so, the (yaml) comments

# foo
# bar

would return a list of [" foo", " bar"] (note: leading & trailing whitespace is preserved.) addComment(String) would add a comment to the end of the list, replaceComments(String)/replaceComnents(List<String>) would replace the entire list of comments. both of them will accept multi-line strings, and will split the strings before it adds them to the list.

Still, tackling (1) would be the first step I think. It could potentially include adding writeComment(String) method in shared JsonGenerator (default implementation likely throwing exception saying format does not support comments), adding actual implementation for supported backends (YAML, XML, JSON (if and only if enabled with non-standard JsonWriteFeature), Ion (I think?) )

rather than hard-failing on attempting to write a comment in an unsupported format, it could soft-fail and just return without it being written. as, the comment is (technically) "optional metadata", so nothing has actually gone "wrong" (per-se) if it is not written.

cowtowncoder commented 8 months ago

Yeah; I think there are many limited ways to support comments that could be useful. Support for JsonNode would be quite difficult in a way that would not change immutability aspects of some types (BooleanNode, NullNode are stateless, immutable) and add memory overhead for all users, so not sure I would want to consider that.

But we should start with the streaming write (JsonGenerator) and maybe read (JsonParser).

solonovamax commented 8 months ago

Yeah; I think there are many limited ways to support comments that could be useful. Support for JsonNode would be quite difficult in a way that would not change immutability aspects of some types (BooleanNode, NullNode are stateless, immutable) and add memory overhead for all users, so not sure I would want to consider that.

ah, hmm, that is actually a good point.

cowtowncoder commented 8 months ago

Thinking out aloud, of course alternate direction wrt comments, JsonNode would be to have comment aspect as wrapper (CommentNode or maybe... AnnotatedNode that has comment and then real node it is associated with as sort of content). That wouldn't work well with existing code, but conceptually might make sense.