godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Proposal to add YAML support in addition to JSON #4354

Closed ghost closed 4 years ago

ghost commented 8 years ago

Would be nice if we had the option to use YAML as an alternative to JSON. More versatility would be great.

YAML can also do things JSON cannot, such as comments, extensible data types, and strings without quotation marks. And, according to Wikipedia, JSON can be parsed by YAML (version 1.2) parsers, making it a subset of YAML. So at second glance the two are interchangeable, and YAML seems easier to write and read, and more powerful than JSON.

Obviously this is a low-priority feature, so I will leave it up to you to decide.

Thanks. Keep up the good work 👍

hubbyist commented 8 years ago

Having comments would make it easier to temporarily add and remove data blocks in game data, such as scenarios and stats, for testing. Being able to add TODOs and programmer notes would also be a plus. As a stepping stone, https://hjson.org/ may be used as well, since there is a JSON parser readily available in Godot.

ghost commented 8 years ago

@hubbyist I would still prefer YAML, not just because it is readable but because it can also do weird things like define variable data types, extend data types, functions, etc. Essentially it can be used not just for data but for modding or scripting a game. But HJson is easier to read and would be very much preferred over regular JSON. I didn't know about it, thank you.

kingthrillgore commented 7 years ago

I think YAML adds value for config files strictly due to commenting. +1ing this.

Calinou commented 7 years ago

I think YAML adds value for config files strictly due to commenting. +1ing this.

Please do not needlessly bump issues when showing support for a feature request, use the 👍 reaction button on the original post instead.

naturally-intelligent commented 6 years ago

I'm also interested in an alternative to JSON. I'm wondering why a save format doesn't even support integers. For example, saving an array with integer indexes to JSON, then loading them, converts those indexes into strings. Leaving me to either: do convoluted loading reconstruction of arrays/dicts, or use strings everywhere instead of integers. Both are bizarre workarounds. If YAML supports integers, it should be a thing.

erodozer commented 6 years ago

Only keys and strings require quotes in proper json, arrays of ints should save/load just fine.

vnen commented 6 years ago

For example, saving an array with integer indexes to JSON, then loading them, converts those indexes into strings.

I'm not sure what you mean by that. Arrays in JSON don't store the indexes and Godot's arrays don't support string indexes (only dictionaries do), so I don't even know what to make of the sentence. Can you provide a simple example of those conversions so we can assess if it's a bug?

vaiktorg commented 6 years ago

What about https://msgpack.org? Looks great for network and data compression

erodozer commented 6 years ago

@Vaiktorg msgpack is for serializing messages in a compact binary-like format, this discussion has been more for supporting human readable and editable file formats that can be used as resources and loaded from the file system.

buckle2000 commented 6 years ago

YAML is much easier to write than JSON. Also, config files can be written in YAML.

ghost commented 6 years ago

does YAML have redundant keys like JSON or?

@Vaiktorg yeah, should be a separate thread. but msgpack is good for network stuff

buckle2000 commented 6 years ago

@gring what do you mean? YAML is a super-set of JSON.

ghost commented 6 years ago

@buckle2000 well, if the redundant keys are still there, unfortunately it's hard for me to justify using YAML over json then. i was thinking it was more of a CSV parser, which is much better for large data storage & import. which i am in favor for. faster, no redundant keys, etc. which align perfectly w/ game development :)

buckle2000 commented 6 years ago

@Gring YAML is really cool. For instance, it supports reference and binary data (base64-encoded). YAML is ergonomic (easy to write) and for versatile config files. It is not for large data storage.

ghost commented 6 years ago

@buckle2000 json is not for large data storage either, which is why i was trying to think of a reason to justify YAML over it. as a CSV parser would be the most beneficial for game devs. and have a real use case (importing large data, no redundant keys, faster, and a much lower filesize).

@naturally-intelligent For example, saving an array with integer indexes to JSON, then loading them, converts those indexes into strings

i actually just did some testing.

var json = """
{"my_keys": [1,2,3,4,5]}
"""

func _ready():
    var obj_literal = JSON.parse(json).result["my_keys"]
    for hi in obj_literal:
        print(typeof(hi) == TYPE_INT)

this actually prints False. that's what he was saying when he was referring to the indexes (@vnen).

but it can also do weird things like define variable data types

if YAML can define variable types, then i am willing to take back my original statement and say yes, it could have an advantage over regular json. the YAML gdscript parser will need to accommodate for the different types however

buckle2000 commented 6 years ago

@gring you can specify types explicitly in YAML. Please read the specs yourself. http://yaml.org/spec/1.2/spec.html#id2761292

vnen commented 6 years ago

@girng well, those aren't indexes, they are values. And this will always print false because they are imported as floats and never as ints (because the JSON spec only has a number type, it does not have any distinction between integers and real numbers). But they are not imported as strings, like @naturally-intelligent declared in the report.

Also, YAML is suggested as a replacement to JSON (since it's a super-set anyway). It's not a new format to fill some other gap if there's any. YAML is easier for a human to read and write than JSON, hence the proposal. CSV is not that good for large storage either and you can already import it using the File API. CSV is also mostly suitable for homogeneous data, while YAML/JSON works fine with ad-hoc structures.
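To illustrate the point about number types outside Godot, here is a small sketch using Python's standard `json` module (this is not Godot's parser). Python normally keeps integer literals as `int`, so `parse_int=float` is passed to mimic the behaviour described above, where every JSON number arrives as a float. The helper name `load_like_godot` is made up for this example.

```python
import json

TEXT = '{"my_keys": [1, 2, 3, 4, 5]}'

def load_like_godot(text):
    # Hypothetical helper: treat every JSON number as a float,
    # since JSON itself only defines a single "number" type.
    return json.loads(text, parse_int=float)

values = load_like_godot(TEXT)["my_keys"]
print([type(v).__name__ for v in values])  # floats, not ints and not strings

# Casting back is the caller's job when integers are expected.
as_ints = [int(v) for v in values]
print(as_ints)
```

The values come back as numbers, just not as integers, so a cast on load is enough to recover them.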

ghost commented 6 years ago

@vnen CSV is great for large storage because it doesn't use redundant keys. it also loads/parses much faster (which is essential for startup/load times). its also been proven in the industry with many games (some even AAA)

they are definitely values i agree, but some people refer to them as indexes too, which i was just trying to show what he meant

@vnen, YAML seems better then. especially if we can define custom types in there, and not have to worry about traversing after the data gets loaded, just to convert to the type it was. i have changed my vote, but i do urge you to look into a CSV.parse method (similar to JSON.parse) as it will have much more of a use case in the game dev world imo.

ghost commented 6 years ago

@girng Without trying to repeat what has been said before: YAML is a superset of JSON, and YAML parsers are also able to parse JSON. In other words, YAML does everything JSON can, plus extra. YAML also uses indentation instead of {}, much like Python and GDScript, so it fits way better with Godot's philosophy.

viktor-ferenczi commented 6 years ago

Yaml is more friendly to source control than JSON, which would make yaml ideal not only for configuration files, but all textual project files (.tscn, .tres, etc.).

Imagine a JSON dict serialized with sorted keys and indentation like:

"data": {
    "a": 1,
    "b": 2
}

Let's append item "c":

"data": {
    "a": 1,
    "b": 2,
    "c": 3
}

The diff will include two modified lines, since a comma must be added at the end of the "b" line. This does not seem like a big problem with these simple values, but imagine if "b" has a 1000 characters long string as value. It would pollute a patch with unrelated data just because of a single comma at the end not even visible on screen.

Always adding a comma after the last item would fix it, but that's not possible due to the JSON specification.

Same data serialized in yaml:

data:
  a: 1
  b: 2
  c: 3

Adding item "c" would show up as a single line of change in a diff (patch), which is more readable and includes information only on the change itself.
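The comparison above can be reproduced with a short Python sketch using only the standard library. It serializes the same dictionary before and after appending "c" and counts the lines a line-based diff marks as removed or added. `yaml_like_lines` is a hypothetical minimal emitter for a flat dictionary of scalars, written just for this example; it is not a real YAML serializer.

```python
import difflib
import json

def json_lines(data):
    # Sorted keys and indentation, as in the JSON example above.
    return json.dumps(data, indent=4, sort_keys=True).splitlines()

def yaml_like_lines(data):
    # Hypothetical emitter: flat dict of scalars only, not real YAML.
    return [f"{key}: {value}" for key, value in sorted(data.items())]

def changed_lines(before, after):
    # Count lines that the diff marks as removed ("- ") or added ("+ ").
    diff = difflib.ndiff(before, after)
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))

old = {"a": 1, "b": 2}
new = {"a": 1, "b": 2, "c": 3}

json_changes = changed_lines(json_lines(old), json_lines(new))
yaml_changes = changed_lines(yaml_like_lines(old), yaml_like_lines(new))
print(json_changes, yaml_changes)
```

For JSON the "b" line is reported as one removal plus one addition (the comma), on top of the new "c" line; for the YAML-like form only the new "c" line shows up.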

Considerations for maxing out source control friendly behavior by minimizing diffs:

ghost commented 6 years ago

Hey guys I don't want to needlessly bump this thread but I feel this feature is probably the most supported feature request right now. I hope devs will consider this at some point. I think YAML fits perfectly with GDScript's "pythonesque" tab based philosophy.

vnen commented 6 years ago

If you really want this I suggest not to depend on the main devs and instead try to make a pull request. I probably would make use of this too, but I don't need it enough to justify spending time on it right now.

OvermindDL1 commented 6 years ago

I'd highly vote on no yaml, it is a way overdesigned and broken spec. If it were to be added then which of its ambiguous setups would be used? I'm pretty sure TOML is taking over YAML's space, being conceptually similar and similar in syntax while being simple and shorter to parse without a broken spec.

(And as an aside, yaml not supporting tabs for indentation is wtfery, Tabs for Indentation, Spaces for Alignment... >.<)

Although I prefer the HOCON format from a readability perspective, it is fantastic for human-written things, though a bit 'interesting' to implement.

EDIT: Though as another thought, just using lua itself 'as' the configuration format I've seen done quite a bit lately...

naturally-intelligent commented 6 years ago

@gimg is right I meant to say dict keys, I was new to gdscript at the time. I get data like this when I serialize to json:

{"characters":{"1":{"approved":false,"denied":true,"id":1,"name":"Richard Fennel"},"2":{"approved":false,"denied":true,"id":2,"name":"Margaret Gerblignzer"}

And then when I'm loading I have to reconstruct the dict in annoying fashion, looping through the data structure again. I shouldn't have to do that, it should just rebuild it.

vnen commented 6 years ago

@naturally-intelligent that's normal because you have strings on the JSON (the numbers are surrounded by quotes). In JavaScript you'd get the same as GDScript, the difference is that JS doesn't care about the types and can mix strings with numbers without issues, while GDS won't let you add a string to a number.

That's not something that can be solved if we follow the JSON specification. You need to serialize the data properly or deal with type-casting when reading.
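As a rough illustration of the type-casting option, here is a Python sketch (standard `json` module, not GDScript) showing integer dictionary keys turning into strings on save, and a hypothetical `restore_int_keys` helper that converts them back on load. One assumption to be aware of: the helper converts every key that looks like an integer, so a key that was genuinely the string "1" would be converted too.

```python
import json
import re

def restore_int_keys(value):
    # Recursively turn keys such as "1" back into 1; other keys stay strings.
    if isinstance(value, dict):
        return {
            (int(k) if re.fullmatch(r"-?[0-9]+", k) else k): restore_int_keys(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [restore_int_keys(item) for item in value]
    return value

characters = {
    1: {"id": 1, "name": "Richard Fennel"},
    2: {"id": 2, "name": "Margaret Gerblignzer"},
}

text = json.dumps({"characters": characters})
loaded = json.loads(text)
print(list(loaded["characters"]))    # keys are strings after the round trip

restored = restore_int_keys(loaded)
print(list(restored["characters"]))  # integer keys again
```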

naturally-intelligent commented 6 years ago

@vnen yep I just wasn't expecting it. JSON may not be broken but it also means it isn't fully compatible with the datastructures of GDScript. I've stopped using integers as keys in GDScript because JSON doesn't support it. It's not worth the trouble when it comes time for save/load. I can just convert the string key to integers in code when I need to. But it feels hacky to not use a GDScript language feature because of JSON. Doesn't seem right to me. It's a "gotcha" that is not clear at all to the new developer

viktor-ferenczi commented 6 years ago

Attempting to collect the requirements:

Any suggested additions/changes?

After finalizing the requirements we can create a list of candidate formats with their pros/cons.

ghost commented 6 years ago

It's too bad @vnen is not on board with this idea, because he has a lot of sway among devs. I feel that since he won't support it, no matter how many likes it gets, it might never be implemented unless a contributor implements it.

@viktor-ferenczi I think the overall point here is yaml is much more in line with Python and GDScript. Json to me feels more like Java.

vnen commented 6 years ago

Well, everybody has some power of influence. The current proposals form is open, and there's one every month. If people add this as a suggestion and it is well voted, it'll certainly bring attention (while only patrons votes are "considered", everybody can vote and global vote stats are always shown, so it still makes an impact).

I didn't say I'm against this, I just said that I personally won't work on this anytime soon, since it's not a particular need of mine right now.

Goutte commented 5 years ago

Thanks @viktor-ferenczi for the excellent list of requirements !


Note that @Beliaar has made a YAML plugin for Godot.

GammaGames commented 5 years ago

Json to me feels more like Java.

Funny, because JSON stands for JavaScript Object Notation.

I also would appreciate this addition. I would like to export a web build of one of my projects, but I can't because GDNative doesn't work in html5 exports yet. This would solve that problem.

Calinou commented 5 years ago

Note that the ConfigFile format partially matches TOML, which makes it possible to parse and write simple TOML files. The main issue is that comments starting with # won't be parsed correctly, as only ; can be used for comments. We could solve this by also allowing # for comments in addition to ;.
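As an analogy only, Python's standard `configparser` exposes the same choice through its `comment_prefixes` parameter. The sketch below does not use Godot's ConfigFile and says nothing about how it is implemented; it only shows how an INI-style parser behaves when `#` is or is not accepted as a comment prefix. The `parse` helper and the sample text are made up for this example.

```python
import configparser

TEXT = """\
[player]
; semicolon comment
# hash comment
speed = 10
"""

def parse(text, prefixes):
    # Hypothetical helper: parse INI-style text with the given comment prefixes.
    parser = configparser.ConfigParser(comment_prefixes=prefixes)
    parser.read_string(text)
    return dict(parser["player"])

try:
    parse(TEXT, (";",))
    semicolon_only_ok = True
except configparser.ParsingError:
    # With only ";" allowed, the "# hash comment" line is not a valid entry.
    semicolon_only_ok = False

both = parse(TEXT, (";", "#"))
print(semicolon_only_ok, both)
```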

viktor-ferenczi commented 5 years ago

Main points of this change:

Anutrix commented 5 years ago

So is someone working on it? vnen seems to be ok with this. So how and what do we plan to do?

  1. Do we make all the changes from JSON to YAML or just make a json2yaml converter? Note: If latter, then we could have a toggle in settings.
  2. What files would need changes and which part/class of this should be implemented?
  3. Anybody have any more suggestions or plans?
  4. Anybody up to working on listing out the candidate formats as suggested by @viktor-ferenczi?

Calinou commented 5 years ago

@Anutrix Adding support for YAML likely means pulling in a YAML library, since it's a very complex format.

Also, we don't really use JSON much in the engine itself, except for storing things like editor feature templates. This kind of file is fine as JSON, since they're not meant to be edited directly by hand.

OvermindDL1 commented 5 years ago

And each YAML library parses slightly differently because the format is ambiguous, be careful about choosing one that works well and doesn't have security concerns that some have. YAML really is not a good format...

Goutte commented 5 years ago

Anybody have any more suggestions or plans?

Just dropping these here for inspiration, no plans (yet?) to implement them.

It's often useful to provide "variables" to configuration, such as paths for example, like Symfony does : %kernel_path%. Godot has some paths and config it could provide. Injecting (replacing) the variables could be the responsibility of a wrapper of the YAML component?

Also, a function for batch-loading multiple YAML files or directories, in order (last wins) into a single object would allow plugins and modules to expose YAML configuration that the user may override with its own.
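A minimal Python sketch of both ideas, assuming the YAML files have already been parsed into plain dictionaries (parsing itself is left out). All names here (`deep_merge`, `inject`, `load_layers`, `user_path`) are hypothetical and only serve the illustration.

```python
import re

def deep_merge(base, override):
    # Return a new dict where values from `override` win; nested dicts merge.
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def inject(value, variables):
    # Replace %name% placeholders in every string; unknown names stay as-is.
    if isinstance(value, str):
        return re.sub(
            r"%(\w+)%",
            lambda m: str(variables.get(m.group(1), m.group(0))),
            value,
        )
    if isinstance(value, dict):
        return {k: inject(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [inject(v, variables) for v in value]
    return value

def load_layers(layers, variables):
    # Merge layers in order (last wins), then substitute variables.
    config = {}
    for layer in layers:
        config = deep_merge(config, layer)
    return inject(config, variables)

plugin_defaults = {"save": {"dir": "%user_path%/saves", "slots": 3}}
user_overrides = {"save": {"slots": 10}}
config = load_layers([plugin_defaults, user_overrides], {"user_path": "/home/me/.game"})
print(config)
```

Merging before substitution means a user override can itself contain placeholders.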

Anutrix commented 5 years ago

Didn't know something like this existed. https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats I know Godot has its own requirements for this but it could be a start.

GammaGames commented 5 years ago

I think only supporting a simplified spec of yaml would be sufficient for 99% of use cases, something similar to StrictYAML. That is, if we don't just use an existing library.

They also have a note against TOML, but I don't only want yaml for config files. It's great as a basic format that does all the same stuff as JSON but is way cleaner to read and write.

hasahmed commented 5 years ago

I think only supporting a simplified spec of yaml would be sufficient for 99% of use cases, something similar to StrictYAML. That is, if we don't just use an existing library.

They also have a note against TOML, but I don't only want yaml for config files. It's great as a basic format that does all the same stuff as JSON but is way cleaner to read and write.

I like the idea of supporting a strict subset of YAML.

Links broken. Here is a github page https://github.com/crdoconnor/strictyaml

viktor-ferenczi commented 5 years ago

I agree that StrictYAML would be a good candidate.

Are there any libraries to work with StrictYAML in C++ (or plain C) code? That would be essential for Godot integration.

The implementation linked by @hasahmed above is a Python one. (Good for build system and scripting.)

bojidar-bg commented 5 years ago

That would be essential for Godot integration.

Not really, given that the JSON implementation in Godot is fully custom.

OvermindDL1 commented 5 years ago

Not really, given that the JSON implementation in Godot is fully custom.

JSON is also an extremely simple single-page spec, where YAML's spec is thousands of pages and still ambiguous...

Gnumaru commented 5 years ago

IMHO, the only problem with hand-editing JSON is that there is too much stiffness in the standard. Mandatory quotes for dictionary keys and string values, the absence of comments, and the inability to end arrays and dictionaries with a trailing comma are the main reasons JSON is harder to write than YAML. Of course this could be solved outside the engine using external tools to convert from yaml/hjson/whatever to JSON, but that would not solve it for inline YAML inside other files, or YAML generated at runtime or obtained from outside sources (via an HTTP request, for example).

Also, IMHO, YAML is a great data storage and transmission language, despite some problems here and there. Mainly for the better raw text readability, "writability" and ability to write comments, but also for other features like anchors and references, which are great. But writing a YAML parser from scratch would probably be impracticable, and adding a third-party parser may not be optimal.

Suppose someone had the time and interest to modify the JSON parsing logic at core/io/json.cpp so that it could parse single-line comments, trailing commas on arrays and dictionaries, and dictionary keys and strings without quotes. Would this be a viable alternative? It would be a JSON superset, like YAML or HJson, but with the absolute bare minimum of features to allow for better readability, "writability" and "diffability", which would be only the three abovementioned features. Even if it is a viable alternative, would it be preferable to supporting an industry-standard format with lots of features like YAML? With the bonus that, being a superset of JSON, the YAML parser could replace the JSON one? I know the JSON parser is custom made and it would take some work to get an external library and adapt it for Godot's needs, but it is still an option.
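To give a feel for how small such a superset could be, here is a Python sketch that preprocesses relaxed text and hands the result to the standard `json` module. It is not related to the code in core/io/json.cpp. It assumes `//` as the comment marker and supports unquoted keys only, not unquoted string values; `relaxed_loads` and `fix_plain` are hypothetical names.

```python
import json
import re

def fix_plain(chunk):
    # Applied only to text outside string literals.
    # Quote bare identifier keys, e.g. {name: ...} becomes {"name": ...}.
    chunk = re.sub(r"([{,]\s*)([A-Za-z_]\w*)(\s*:)", r'\1"\2"\3', chunk)
    # Drop a comma that directly precedes a closing bracket or brace.
    return re.sub(r",(\s*[}\]])", r"\1", chunk)

def relaxed_loads(text):
    parts, plain = [], []
    i, n = 0, len(text)
    while i < n:
        if text[i] == '"':
            # Copy a string literal verbatim, honoring backslash escapes.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            parts.append(fix_plain("".join(plain)))
            plain = []
            parts.append(text[i:j + 1])
            i = j + 1
        elif text.startswith("//", i):
            # Skip a single-line comment up to the end of the line.
            while i < n and text[i] != "\n":
                i += 1
        else:
            plain.append(text[i])
            i += 1
    parts.append(fix_plain("".join(plain)))
    return json.loads("".join(parts))

SAMPLE = '''
{
    // player stats
    name: "Hero // not a comment",
    "hp": 10,
    items: ["sword", "shield",],
}
'''
result = relaxed_loads(SAMPLE)
print(result)
```

Standard JSON passes through unchanged, which is what would let such a mode stay opt-in on top of a strict parser.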

Beliar83 commented 5 years ago

[...] Suppose someone had the time and interest to modify the json parsing logic at core/io/json.cpp so that it could parse single line comments, trailing commas on arrays and dictionaries and dictionary keys and strings without quotes. Would this be a viable alternative? [...]

I would be quite content with those things being added.

Calinou commented 5 years ago

@Beliaar The issue with doing this is that it would make the JSON parser non-standard. If this is to be done, it must be an opt-in setting, not the default.

Also, again, for hand editing, ConfigFile would be more suited overall :slightly_smiling_face:

viktor-ferenczi commented 5 years ago

ConfigFile:

GammaGames commented 5 years ago

Also, IMHO, yaml is a great data storage and transmission language, despite some problems here and there. Mainly for the better raw text readability, "writability" and ability to write comments

That is why I think it would be a good addition: writing JSON is tedious, and it would be nicer to store data in an easy-to-write format that doesn't need an external tool. For example, an item dictionary would be really simple for anyone to read and modify, whereas with JSON you would need to understand its format.

But writing a yaml parser from scratch would probably be impracticable

That's why I proposed strictyaml, only supporting a subset of features that would fit most use cases. I would argue that anchors shouldn't be allowed in a stricter syntax, because they decrease readability

Suppose someone had the time and interest to modify the json parsing logic at core/io/json.cpp so that [...] Would this be a viable alternative?

IMO it would not be. YAML has a couple extra features that aren't supported in JSON, one example being block scalars to break text into multiple lines.

Gnumaru commented 5 years ago

There seem to be no standard specification for the format, but it is reasonably simple

A standard specification is usually good, but what is more important IMHO are de facto standards and already widespread implementations. Suppose the most used 'dialect' of config file has, say, 50% of general use and the second most used has 20% or less; then the first dialect is good enough to be the one used and the only one supported.

The problem with, let's say, the Python configparser implementation is that it does not support nested structures, so it cannot represent object trees properly. Any non-scalar value is written to the file as a single-line string (a Python array or dictionary literal) and must be manually parsed in code after reading the config file. If this is the most widespread dialect of config file, then it would not be suitable for our purposes. One can manually put newlines inside the dictionary literal and still have it parsed in Python, as long as the following lines are indented with at least one space or tab, but there seems to be no pretty-print for dictionary literals in the standard configparser module.

Still talking about Python implementations, there is a package named configobj which implements a config file dialect that properly supports nested structures without having to rely on strings with dictionary literals. This dialect is much more suitable for our needs.

This is a test script for comparison test.py:

data = {}
data['anint'] = 1
data['afloat'] = 1.2
data['abool'] = True
data['anone'] = None
data['astring'] = 'asdf'
data['anarray'] = [1, 2, 3]
data['anotherarray'] = [',', ',', ',']
data['andict'] = {'a':1,'b':1.0,'c':True,'d':None,'e':'asdf','f':[1,2,3]}
data['nesteddict'] = {'j':1,'a':{'j':1,'b':{'j':1,'c':1}}}

import configobj # external pypi package: pip install configobj
config = configobj.ConfigObj()
config['root'] = data
config.filename = 'configobj.ini'
config.write()

import configparser # part of python standard library
config = configparser.ConfigParser()
config['root'] = {'key':data}
with open('configparser.ini', 'w') as configfile:
    config.write(configfile)

This is the output of the configobj module from the external package 'configobj' configobj.ini

[root]
anint = 1
afloat = 1.2
abool = True
anone = None
astring = asdf
anarray = 1, 2, 3
anotherarray = ",", ",", ","
[[andict]]
a = 1
b = 1.0
c = True
d = None
e = asdf
f = 1, 2, 3
[[nesteddict]]
j = 1
[[[a]]]
j = 1
[[[[b]]]]
j = 1
c = 1

And this is the output of the standard configparser module configparser.ini

[root]
key = {'anint': 1, 'afloat': 1.2, 'abool': True, 'anone': None, 'astring': 'asdf', 'anarray': [1, 2, 3], 'anotherarray': [',', ',', ','], 'andict': {'a': 1, 'b': 1.0, 'c': True, 'd': None, 'e': 'asdf', 'f': [1, 2, 3]}, 'nesteddict': {'j': 1, 'a': {'j': 1, 'b': {'j': 1, 'c': 1}}}}
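The manual parsing step mentioned above could look roughly like this with the standard library. This is a sketch under the assumption that the stored value is a plain Python literal, as in the output above; it uses an in-memory buffer instead of a file and a shortened `data` dictionary.

```python
import ast
import configparser
import io

data = {'anint': 1, 'anarray': [1, 2, 3], 'nesteddict': {'j': 1, 'a': {'j': 1}}}

writer = configparser.ConfigParser()
writer['root'] = {'key': data}      # the dict is stored as its str() form
buffer = io.StringIO()
writer.write(buffer)

reader = configparser.ConfigParser()
reader.read_string(buffer.getvalue())
raw = reader['root']['key']         # still a plain string at this point
restored = ast.literal_eval(raw)    # evaluates Python literals only, no code
print(type(raw).__name__, restored == data)
```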

And for comparison, this is how windows' registry editor exports nested structures

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\*\shell\removeproperties]
"ProgrammaticAccessOnly"="Apartment"

[HKEY_CLASSES_ROOT\*\shell\removeproperties\DropTarget]
"CLSID"="{09a28848-0e97-4cef-b950-cea037161155}"

Gnumaru commented 5 years ago

I should have also compared the TOML output for the same data. Note that TOML is the only dialect (in these specific implementations at least) where it is possible to put variables outside sections. TOML's nesting levels are also much more readable than configobj's.

This is the test script for comparison test.py:

data = {}
data['anint'] = 1
data['afloat'] = 1.2
data['abool'] = True
data['anone'] = None
data['astring'] = 'asdf'
data['anarray'] = [1, 2, 3]
data['anotherarray'] = [',', ',', ',']
data['andict'] = {'a':1,'b':1.0,'c':True,'d':None,'e':'asdf','f':[1,2,3]}
data['nesteddict'] = {'j':1,'a':{'j':1,'b':{'j':1,'c':1}}}

import configobj # external pypi package: pip install configobj
config = configobj.ConfigObj()
config['root'] = data
config.filename = 'configobj.ini'
config.write()

import configparser # part of python standard library
config = configparser.ConfigParser()
config['root'] = {'key':data}
with open('configparser.ini', 'w') as configfile:
    config.write(configfile)

import toml # external pypi package: pip install toml
with open('toml.ini', 'w') as configfile:
    toml.dump(data, configfile)

This is the output of the toml module from the external package 'toml' toml.ini

anint = 1
afloat = 1.2
abool = true
astring = "asdf"
anarray = [ 1, 2, 3,]
anotherarray = [ ",", ",", ",",]

[andict]
a = 1
b = 1.0
c = true
e = "asdf"
f = [ 1, 2, 3,]

[nesteddict]
j = 1

[nesteddict.a]
j = 1

[nesteddict.a.b]
j = 1
c = 1

This is the output of the configobj module from the external package 'configobj' configobj.ini

[root]
anint = 1
afloat = 1.2
abool = True
anone = None
astring = asdf
anarray = 1, 2, 3
anotherarray = ",", ",", ","
[[andict]]
a = 1
b = 1.0
c = True
d = None
e = asdf
f = 1, 2, 3
[[nesteddict]]
j = 1
[[[a]]]
j = 1
[[[[b]]]]
j = 1
c = 1

This is the output of the standard configparser module configparser.ini

[root]
key = {'anint': 1, 'afloat': 1.2, 'abool': True, 'anone': None, 'astring': 'asdf', 'anarray': [1, 2, 3], 'anotherarray': [',', ',', ','], 'andict': {'a': 1, 'b': 1.0, 'c': True, 'd': None, 'e': 'asdf', 'f': [1, 2, 3]}, 'nesteddict': {'j': 1, 'a': {'j': 1, 'b': {'j': 1, 'c': 1}}}}

And for comparison, this is how windows' registry editor exports nested structures

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\*\shell\removeproperties]
"ProgrammaticAccessOnly"="Apartment"

[HKEY_CLASSES_ROOT\*\shell\removeproperties\DropTarget]
"CLSID"="{09a28848-0e97-4cef-b950-cea037161155}"