TeamPorcupine / ProjectPorcupine

Project Porcupine: A Base-Building Game...in Space!
GNU General Public License v3.0
483 stars 280 forks source link

[Discussion] Switching our data format language. #1034

Closed koosemose closed 7 years ago

koosemose commented 7 years ago

PLEASE VOTE YOUR CHOICE IN STRAWPOLL Data Format Poll : http://www.strawpoll.me/11156965

Surprisingly, we've never had a discussion directly about if we want to continue using XML as our data format or switch to something else. We've only discussed it in passing ( in #25 and #21 ). Our primary other candidate would be JSON as Unity already has some degree of support for it and further support could be achieved through SimpleJSON.

We have added outside libraries in the past when we needed to, so I don't think that should be a primary concern. Quills only recently stated concern was that we use one language for it, and that someone would need to volunteer to make the needed changes, stated in #25 (I don't claim that this is the entirety of his opinion on the matter, only that it is the only semi-recent statement on the matter). I know I, for one, would be willing to assist on making the changeover, if that's what's decided, and I'm confident that if we choose to do it, others will help, so I also don't think that should be a primary concern.

With JSON (via SimpleJSON) we wouldn't need to loop over the data structure, which can be error prone, such as if you skip reading a line and try to readToNextSibling from the wrong depth, you instead just check if the data you want exists and use it to populate the objects you need populated.

If interest is shown in switching to JSON we can do further investigation into what exactly it will mean for our data loading and how it will work for us.

Thoughts and opinions for or against?

Edit: An informal count of what people's preferences seem to be as of right now. This is only to give people a general idea of how people are feeling, and there are several I couldn't discern their opinion if any on the matter.

JSON XML YAML
@koosemose @crafty-geek @NogginBops
@vogonistic @Grenkin1988 @mikejbrown
@Dormanil @gunthergun
@Mizipzor @sboigelot
@MANDAL0R3 @dusho
@TomMalbran @powli
@mikejbrown @kd7uiy
@crafty-geek
@Ohda
@mbstraus
@bjubes
vogonistic commented 7 years ago

I would love to switch to JSON!

kd7uiy commented 7 years ago

There's at least one additional alternative, which I used in the recent commit I made. Fundamentally, there are 2 very different methods for reading XML in C# built in, XmlReader, and XmlDocument. I very much prefer the document method, as it has many of the same advantages as JSON. Take a look at #1033 for an idea of how it can be used. For my personal project, it's pretty much the only thing that I use.

koosemose commented 7 years ago

@kd7uiy I wouldn't consider that an additional alternative, as it's still using xml, but rather a way to make the parsing side of xml more palatable.

I prefer JSON due to the actual structure of the data files, I've always found xml an awkward and unsuitable language for data, with a lot of excess text from what should be needed just to store and easily retrieve data. Aside from it being what we already use, and it being what Unity supports more fully natively, I don't see what benefits all that extra stuff brings.

Edit: However, if we choose to stay with xml, I do think we should look at the XmlDocument method you use, as I find the XmlReader method subpar at best.

kd7uiy commented 7 years ago

JSON is great if you want a really compact data format, which makes it ideal for web APIs, for instance. I find it's readability to be lacking somewhat, however. I do agree that XML is a bit wordy too, especially when parsing. Sigh, data formats are just a pain, no matter how you look at it, I suppose.

koosemose commented 7 years ago

I find that the verbosity of xml hinders readability more than JSON's brevity does as xml the data is mixed in the structural elements, and sometimes can be partially concealed by it. The data is either wrapped in the tag of the semantics of the data (So the name of a furniture, for example, is write in the middle of the stuff telling you that it is the name), or it's alternatively part of what's otherwise telling us the semantics of the data (such as the jobTime of BuildingJob in furniture being an attribute). It's often unclear as to what exactly the data is and what's the semantics... though that may be Quill's fault and ours for continuing to not have any guidelines on what should be the contents of a tag and what should be an attribute.

Ultimately it comes down to what is preferred by the community, brevity or verbosity.

TomMalbran commented 7 years ago

I would love to switch to JSON, reading and writing would be really easy. If you remove the attributes from xml, which is easy, as you make them elements with a single value, you end with a format similar JSON, but where you need to place all the end tags.

I can help if required, and we can do incremental updates.

vogonistic commented 7 years ago

Here is a quick-ish conversion of furniture into JSON as an example of how it could look. I took the liberty of optimizing and changing where I thought it was appropriate. The changes aren't required for us to change to JSON, but why not now?

  1. ObjectType is now a better internal code
  2. TypeTag is an array
  3. CanReplaceFurniture -> CanReplaceTypeTags
  4. Actions now contains other delegates, like IsEnterable and GetSpriteName
  5. Actions can only be registered once, if you want two methods to run on a particular action, call them internally.
  6. Changed ContextMenuActions into an Action. The reasoning is that context actions are evaluated rarely but has a lot to benefit from reflecting the state.
  7. Removed LocalizationCode and UnlocalizedDescription, they should be ObjectType and ${ObjectType}_desc respectively
{
  "ObjectType": "door_airlock",
  "Name": "Airlock Door",
  "Description": "An airlock door prevents air from leaving the base",
  "TypeTags": [ "Door" ],
  "MovementCost": 1,
  "PathfindingModifier": 3,
  "PathfindingWeight": 1,
  "Width": 1,
  "Height": 1,
  "LinksToNeighbours": false,
  "EnclosesRooms": true,
  "CanReplaceTypeTags": [ "Wall", "Door" ],

  "BuildingJobTime": 5,
  "BuildingJobRequirements": {
    "Steel Plate": 10,
  },

  "DeconstructJobTime": 1,
  "DeconstructJobProvides": {
    "Steel Plate": 7
  },

  "Params": {
    "openness": 0,
    "is_opening": 0,
    "thermal_diffusivity": 0.00001
  },

  "Actions": {
    "OnUpdate": "OnUpdate_AirlockDoor",
    "IsEnterable": "IsEnterable_Door",
    "GetSpriteName": "GetSpriteName_Airlock",
    "ContextMenuActions": "ContextMenuActions_AirlockDoor"
  }
}

There are lots of values here that could be removed since they are default values, but I didn't want to include that now.

Ohda commented 7 years ago

Does YAML is not an option here ? It like it for is readability.

Grenkin1988 commented 7 years ago

@vogonistic How <PowerConnection/> would look like with its attributes?

If we are going for this why than some strange SimpleJSON, why not to use Json.NET

koosemose commented 7 years ago

@Grenkin1988 I'm not so much suggesting that we definitely use SimpleJSON, more a possibility way to do it to cover something better than Unity's default JSON parser. If we go this route, there should definitely be discussion as to what we use to implement it.

@Ohda if there is a C# implementation we could consider it, can you say anything more in favor of it (particularly how it differs from JSON) the only examples I can find of it look less readable than JSON, with the obvious indicators of structure being the indentation, which unless that is strictly how it is defined have the potential to be lacking from some developer not paying attention, or just having formatting errors, whereas JSON has things bracketed which seems more obvious to me, otherwise they look basically the same (To me, as someone wholly unfamiliar with it)

vogonistic commented 7 years ago

@Ohda YAML could be an option, but I'm just more used to JSON personally.

@Grenkin1988 Let me look it up and give you an example of what it could be. For SimpleJSON vs Json.NET, I just have't looked at Json.NET yet. I think it's more interesting to have this discussion before we pick a parse.

TomMalbran commented 7 years ago

That looks good, but there are a couple of issues with the actions part. Right now the "OnUpdate" actions is different than as "InsEnterable" action and it is also different than a "ContextMenuAction".

  1. Actions like "IsEnterable" and "GetSpriteName" return a value, so we really just need 1 of each, it wouldn't make much sense to have 2, since the return value of the second action would overwrite the one of the first one.
  2. Actions like "OnUpdate" are saved into an "EventAction" instance, and there can be many with actions with that name.
  3. Actions like "ContextMenuActions" are saved as instanced of "ContextMenuLuaAction" into a list, and there can be many of them.

The JSON could look like:

{
  "ObjectType": "door_airlock",
  "Name": "Airlock Door",
  "Description": "An airlock door prevents air from leaving the base",
  "TypeTags": [ "Door" ],
  "MovementCost": 1,
  "PathfindingModifier": 3,
  "PathfindingWeight": 1,
  "Width": 1,
  "Height": 1,
  "LinksToNeighbours": false,
  "EnclosesRooms": true,
  "CanReplaceTypeTags": [ "Wall", "Door" ],

  "BuildingJobTime": 5,
  "BuildingJobRequirements": {
    "plate_steel": 10,
  },

  "DeconstructJobTime": 1,
  "DeconstructJobProvides": {
    "plate_steel": 7
  },

  "Params": {
    "openness": 0,
    "is_opening": 0,
    "thermal_diffusivity": 0.00001
  },

  "JobSpotOffset": {
    "X": 1,
    "Y": 0
  },
  "JobSpawnSpotOffset": {
    "X": 0,
    "Y": 0
  },
  "PowerConnection": {
    "InputRate": 0,
    "OutputRate": 3,
    "Capacity": 0
  },

  "IsEnterable": "IsEnterable_Door",
  "GetSpriteName": "GetSpriteName_Airlock",
  "EventActions": {
    "OnUpdate": [ "OnUpdate_AirlockDoor" ],
    "OnInstall": [ "OnInstall_AirlockDoor" ]
  },
  "ContextMenuActions": [
    {
      "FunctionName": "LandingPad_Test_ContextMenuAction_1",
      "Text": "Lua Test Deconstruct 1",
       "RequiereCharacterSelected": false
    },
    {
      "FunctionName": "LandingPad_Test_ContextMenuAction_2",
      "Text": "Lua Test Deconstruct 2",
       "RequiereCharacterSelected": true
    }
  ]
}
vogonistic commented 7 years ago

@Grenkin1988 I would probably do something like this:

"PowerConnection": {
    "inputRate": 0,
    "outputRate": 5,
    "capacity":0
}
Ohda commented 7 years ago

Same in YAML just to show you. It's more friendly to read and write for human than JSON. Even if we can easily make mistake with indention.

---
ObjectType: door_airlock
Name: Airlock Door
Description: An airlock door prevents air from leaving the base
TypeTags:
- Door
MovementCost: 1
PathfindingModifier: 3
PathfindingWeight: 1
Width: 1
Height: 1
LinksToNeighbours: false
EnclosesRooms: true
CanReplaceTypeTags:
- Wall
- Door
BuildingJobTime: 5
BuildingJobRequirements:
  Steel Plate: 10
DeconstructJobTime: 1
DeconstructJobProvides:
  Steel Plate: 7
Params:
  openness: 0
  is_opening: 0
  thermal_diffusivity: 1.0e-05
Actions:
  OnUpdate: OnUpdate_AirlockDoor
  IsEnterable: IsEnterable_Door
  GetSpriteName: GetSpriteName_Airlock
  ContextMenuActions: ContextMenuActions_AirlockDoor
vogonistic commented 7 years ago

@TomMalbran The format itself is probably a little bit off topic for this issue so if we have a lot to discuss, we might want to move it elsewhere.

The reason I mashed IsEnterable with the event actions might be wrong, but I was thinking that we will have to take each of those values and see if there is such an event for them to subscribe to and then add them to the list. For IsEnterable, GetSpriteName and ContextMenuActions, we'd use it as a lookup to see if this furniture has a custom implementation of those methods. We could split it into two, but I don't think the fact that they have different return values changes anything. What we get is just a string that gets executed in the lua context and we need intimate knowledge of the expected return value on each of them.

I tried to come up with any good reason why we should be able to register twice for the same event in the same furniture, and I can't think of one.

The ContextMenuActions function should return an array with values for the menu. It'd be the same but take state into account.

vogonistic commented 7 years ago

This:

  "BuildingJobTime": 5,
  "BuildingJobRequirements": {
    "plate_steel": 10,
  },

Should probably be:

  "BuildingJob": {
    "time": 5,
    "requirements": {
      "plate_steel": 10,
    },
  },
TomMalbran commented 7 years ago

@vogonistic I agree that the format can be discussed in an another topic, but just a simple reply, one reason for having more than 1 OnUpdate action might be for modability, if we allow the option to update the Furniture prototype instead of replacing it. I agree that inside a mod you only need 1 action, but if someone wants to create an updated version of any furniture, without replacing it, they could add a second "OnUpdate" action, and the prototype can save both. In any case, we wouldn't need an array of functions. The EventActions could be changed to allow functions with a return value. If we start having more of them, then that will work, with only 2, it might not be required. The Context menu actions need to be in a separate object, as those are different from the event actions.

The "BuildingJob" looks better like that, and the "DeconstructJob" can look similar.

jamzam90 commented 7 years ago

There is also LitJson which is what I've used in my other projects.

koosemose commented 7 years ago

Let's try to focus on data language choice first, we can look at options for parsers after we come to a decision. Though so far it seems to be leaning towards JSON, with one in favor of xml and one for Yaml, and everyone else (who has stated an opinion so far) seeming to favor JSON.

I will note, that one thing against JSON is lack of comments, which I feel are important for config files, but at least some parsers ignore that and allow them anyways.

Dormanil commented 7 years ago

I'd say JSON as well, but what I'd like to see even more is proper schema files so that we can easily validate and autocomplete them (because IntelliSense and similar things in other IDEs are lovely).

Grenkin1988 commented 7 years ago

@koosemose I actually like XML. But I do not mind to try something new :-) @Dormanil +100500 Already several times wanted to create schema for Furniture

koosemose commented 7 years ago

I wasn't actually counting you in any camp, but I also didn't give the specific number I was counting as in favor of JSON, so it wasn't clear who I was and wasn't counting.

I just want to make sure we get people's explicit opinions before anything is decided.

guntherwullaert commented 7 years ago

@koosemoose I think we should stick to xml and just change the parser. But Json does improve readability

alexanderfast commented 7 years ago

Long thread, didn't read through it, but I prefer json.

Tranberry commented 7 years ago

I prefer yaml.. It's less scary looking for new people even with the indent errors that will occur. But that is the only reason. Now someone who is not mainly a coder have commented 🐌

powli commented 7 years ago

I guess the data format language is personal taste, xml can be pretty readable too, when formatted/structured properly. It's just easier to do it with JSON. What is more important is to have a schema validation for the data files.

Raketfart commented 7 years ago

If someone loves JSON enough to rewrite all the files and writexml/loadxml code, then it's fine by me. I think JSON is easier to read, but I don't have a problem with keeping XML.

Tritaris commented 7 years ago

Personally I have no preference as I'm new to both working with XML and JSON. Either way I get to learn something new which is what I want. I have recently been cutting my teeth on a SpriteToXML tool for the art guys. If we decide to switch over to JSON I'll be up for it.

koosemose commented 7 years ago

There are also tools available to convert (I don't know about the quality of them as I only noticed in passing).

And it's not so much a love for JSON, as it is a dislike of xml for me at least.

Hooch0 commented 7 years ago

I vote for Json :)

dusho commented 7 years ago

I wonder why is there read/write xml code at all (is it because of backwards compatibility of files). Why not to create just pure classes with [Serializable]

[Serializable]
    public class Configuration
    {
           public List<Table> Tables { get; set; }
           public string OutputFolder { get; set; }
        }

and then just have common deserialization code (also make it generic):

public static Configuration Deserialize(string filename)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(Configuration));
            // Declare an object variable of the type to be deserialized.
            Configuration i;
            using (Stream reader = new FileStream(filename, FileMode.Open, FileAccess.Read))
            {
                // Call the Deserialize method to restore the object's state.
                i = (Configuration)serializer.Deserialize(reader);          
            }
            return i;
        }

later on you can add Json serialization or Protobuf or w/e.. Even have all at once..

sboigelot commented 7 years ago

Json is cool, but soo dangerous for file edition.

What about https://hjson.org/ ? there is a c# github implementation of the parser

(I'm thinking mostly about moders in this comment)

mikejbrown commented 7 years ago

<Sentence mood="sarcastic"><SpeakerDisposition bodyStance="ducking"/><Content>What could possibly be wrong with XML?</Content></Sentence>

I have no real skin in the game, but JSON and YAML are both fine for me and preferred to XML. Whatever is the decision we need modder accessible documentation of the file formats and eventually some kind of automatic validation, at least for the main game files.

I don't like XML, but if you want a data format that is a masochistic challenge for new programmers you could do worse than it. Why don't I see XLS in the discussion? :wink:

@sboigelot Why is JSON so dangerous for modders? I'm honestly curious because I haven't worked with it much.

Raketfart commented 7 years ago

I think all data formats are "dangerous", as they tolerate no errors. This is how it must be, and hopefully we can make parsing errors very loud and clear to modders. When encountering parsing errors we could link modders to an online xml/json validation tool. I really don't think we do anyone any favors by using less common formats as yaml and hjson.

Grenkin1988 commented 7 years ago

@dusho I think that's because we have different read and write functionality. One is for prototype read. Other is for save/load.

dusho commented 7 years ago

@Grenkin1988 ok, I see. Still shouldn't data that need to be persisted (or loaded) have just their own objects (and don't care how they are represented). This way objects can be either dumped as XML, JSON or streamed through network. Also this way it will be cleanly separated - game logic from load/save logic.

mikejbrown commented 7 years ago

@dusho Normally I would agree with that but I'm not sure what the extra level of abstraction would actually buy us in a game like this. One day we'll suddenly decide to change save formats because... reasons? As long as the format is open and text editor-able I don't see how that's really likely.

sboigelot commented 7 years ago

@mikejbrown @Raketfart hjson tolerate a lot of error. It is meant for human to read and write. It won't care if you forgot a ; or used a ' instead of a "

dusho commented 7 years ago

@mikejbrown fair enough. It's just a bit messy seeing in code where game logic is also methods ReadXml and WriteXml. I would then vote for keeping .xml. I think .json is really something for web and data transfer mostly, not for hand editing and stuff..

NogginBops commented 7 years ago

I like the look and the readability of YAML. XML is a lot of bloat and JSON just feels messy/dirty (All those double quotes!).

akomm commented 7 years ago

I know, it is not on the list, but: what about BSON (bsonspec.org) ? It would take the readable advantage of JSON thought. It allows much more types, which you would maybe need to base64-encode in JSON. Maybe thing about the scope of data you will store to make a decision?

Implementations: http://bsonspec.org/implementations.html

The BSON Impl. from MongoDB is a separate package (https://www.nuget.org/packages/MongoDB.Bson/) and seem not to have any .NET deps.

mbstraus commented 7 years ago

There are a few reasons why using XML would be a good idea, although some tweaks should be made to the current XML files to make them more usable. I am not familiar with YAML, so I can't speak to that, but I think JSON is a little too free-form for this use case, particularly due to the distributed nature of the development on this.

The changes to the XML files that I think would improve things:

So although I personally like JSON, I don't think it is the right choice for a project like this. However, XML can be made readable and validatable. So that would be my recommendation.

sboigelot commented 7 years ago

@mbstraus about the marshalling

I said that in another issue, but I don't find it anymore. I know using xml reader is more "flexible" but really the best way to serialize/deserialize data from xml (or json, or whatever) is to use the .net Serializer:


public class XmlSerializer
{
    public void Serialize<T>(Stream stream, T value) where T : class
    {
        using (var writer = new StreamWriter(stream))
        {
            var serializer = new XmlSerializer(value.GetType());
            serializer.Serialize(writer, value);
            writer.Flush();
        }
    }

    public void Serialize<T>(StreamWriter writer, T value) where T : class
    {
        var serializer = new XmlSerializer(value.GetType());
        serializer.Serialize(writer, value);
        writer.Flush();
    }

    public T DeSerialize<T>(Stream stream) where T : class
    {
        var serializer = new XmlSerializer(typeof (T));
        return serializer.Deserialize(stream) as T;
    }
}

Then we just need to respect the class structure in the xml file. This is all the code it needs.

Reading prototype (ie: furnitures) becomes as easy as

FurniturePrototypes = XmlSerializer.Deserialize<List<Prototype<Furniture>>>(myFileStream).ToDictionary(f=>f.ObjectType,f=>f)

mbstraus commented 7 years ago

If desired, I could possibly look into doing a quick prototype of using XmlSerializer... haven't messed with it so I am a little curious on how it works. Maybe convert over the furniture stuff to it or something like that.

sboigelot commented 7 years ago

@mbstraus I would love to see that :)

koosemose commented 7 years ago

@mbstraus I wouldn't complain about that. I would still prefer JSON, but if we stay with xml I'd much rather something better than xmlreader... I kind of hate xmlReader

dusho commented 7 years ago

already started using XmlSerializer and I think it's much readable and flexible with changes.. also later on, other tags can be added and object serialized into anything you want.

[Serializable]
public class FactoryInfo
{
    [Serializable]
    public class Item
    {
        [XmlAttribute("objectType")]
        public string ObjectType { get; set; }
        [XmlAttribute("amount")]
        public int Amount { get; set; }
    }

    [Serializable]
    public class ProductionChain
    {
        [XmlAttribute("name")]
        public string Name { get; set; }
        [XmlAttribute("processingTime")]
        public float ProcessingTime { get; set; }

        public List<Item> Input { get; set; }
        public List<Item> Output { get; set; }
    }

    [Serializable]
    public class IntVector2
    {
        [XmlAttribute("x")]
        public int X { get; set; }
        [XmlAttribute("y")]
        public int Y { get; set; }

        public IntVector2()
        { }

        public IntVector2(int x, int y)
        {
            X = x;
            Y = y;
        }

        public Vector2 AsVector2()
        {
            return new Vector2(X, Y);
        }
    }

    public List<ProductionChain> PossibleProductions { get; set; }

    [XmlElement(ElementName = "InputSlotPosition")]
    public List<IntVector2> InputSlots { get; set; }
    [XmlElement(ElementName = "OutputSlotPosition")]
    public List<IntVector2> OutputSlots { get; set; }
}

usage:

case "FactoryInfo":
                    // deserialize FActoryINfo into factoryData
                    XmlSerializer serializer = new XmlSerializer(typeof(FactoryInfo));
                    factoryData = (FactoryInfo)serializer.Deserialize(reader);

XML looks like this:

<FactoryInfo>
  <PossibleProductions>
    <ProductionChain name="Iron smelting" processingTime="1">
      <Input>
        <Item objectType="Raw Iron" amount="3" />
      </Input>
      <Output>
        <Item objectType="Steel Plate" amount="3" />
      </Output>
    </ProductionChain>
  </PossibleProductions>
  <InputSlotPosition x="1" y="1" />
  <OutputSlotPosition x="2" y="1" />
</FactoryInfo>
sboigelot commented 7 years ago

@dusho

I think we should pluralize collection identifier.

InputSlotPositions instead of InputSlotPosition etc..

Also, you don't need to define the [XmlAttribute("...")] all public property define it implicitly. the model would be much cleaner without them

dusho commented 7 years ago

in code it's pluralized.. in XML it is not because there is no encapsulating xml tag for those collections and 90% of factories will have just 1 input and 1 output slot XmlAttribute you actually have to specify explicitly , because implicitly it is XmlElement. And I also wanted to make attributes in xml non-capitalized as it seems like common guideline (looking at Furniture.xml)

sboigelot commented 7 years ago

@dusho, there is no convention yet for xml, see #664 I think we should stick with the default implementation of the .net serializer for capitalization if we go that route