TeamPorcupine / ProjectPorcupine

Project Porcupine: A Base-Building Game...in Space!
GNU General Public License v3.0

[Discussion] Switching our data format language. #1034

Closed koosemose closed 7 years ago

koosemose commented 8 years ago

PLEASE VOTE YOUR CHOICE IN STRAWPOLL Data Format Poll : http://www.strawpoll.me/11156965

Surprisingly, we've never had a discussion directly about whether we want to continue using XML as our data format or switch to something else. We've only discussed it in passing (in #25 and #21). Our primary other candidate would be JSON, as Unity already has some degree of support for it and further support could be achieved through SimpleJSON.

We have added outside libraries in the past when we needed to, so I don't think that should be a primary concern. Quill's only recently stated concern was that we use one language for it, and that someone would need to volunteer to make the needed changes, stated in #25 (I don't claim that this is the entirety of his opinion on the matter, only that it is the only semi-recent statement on the matter). I know I, for one, would be willing to assist in making the changeover, if that's what's decided, and I'm confident that if we choose to do it, others will help, so I also don't think that should be a primary concern.

With JSON (via SimpleJSON) we wouldn't need to loop over the data structure, which can be error prone: if you skip reading a line, for example, you may try to readToNextSibling from the wrong depth. Instead, you just check whether the data you want exists and use it to populate the objects you need populated.
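To make the "check if the data exists and use it" idea concrete, here is a minimal sketch. It is written in Python with the standard `json` module purely for illustration (the project itself is C#, and this is not SimpleJSON's API); the `load_furniture` helper and the default values are assumptions, not project code.

```python
import json

# Hypothetical furniture record; field names follow the style used in this thread.
RAW = '{"ObjectType": "door_airlock", "Name": "Airlock Door", "Width": 1}'

def load_furniture(text):
    """Populate a plain dict by looking fields up by key, not by read position."""
    data = json.loads(text)
    return {
        "object_type": data["ObjectType"],   # required: raises KeyError if absent
        "name": data.get("Name", ""),        # optional fields fall back to defaults
        "width": data.get("Width", 1),
        "height": data.get("Height", 1),     # missing in RAW, so the default applies
    }

furniture = load_furniture(RAW)
print(furniture)
```

Because each field is fetched by name, the order of keys in the file and any fields the loader does not know about have no effect on the result, which is the property being contrasted with sequential reading.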

If interest is shown in switching to JSON we can do further investigation into what exactly it will mean for our data loading and how it will work for us.

Thoughts and opinions for or against?

Edit: An informal count of what people's preferences seem to be as of right now. This is only to give a general idea of how people are feeling, and there are several people whose opinion, if any, I couldn't discern.

| JSON | XML | YAML |
| --- | --- | --- |
| @koosemose | @crafty-geek | @NogginBops |
| @vogonistic | @Grenkin1988 | @mikejbrown |
| @Dormanil | @gunthergun | |
| @Mizipzor | @sboigelot | |
| @MANDAL0R3 | @dusho | |
| @TomMalbran | @powli | |
| @mikejbrown | @kd7uiy | |
| @crafty-geek | | |
| @Ohda | | |
| @mbstraus | | |
| @bjubes | | |

kd7uiy commented 8 years ago

Slight suggestion. Can we keep this issue to deciding which language to use, and leave the implementation of how to use that language for a different issue?

mbstraus commented 8 years ago

Well, so as to avoid swirl on this further, I think someone leading the design decisions should probably step in at this point and make a decision... then start getting it applied.

kd7uiy commented 8 years ago

That brings a whole other question, as to who should be deciding this. It seems like there's a good variety of views, and I think the key thing is to pick one and stick to it. It's for cases like this that I think we need a "Lead Systems Architect". Perhaps @quill18 wants to weigh in?

koosemose commented 8 years ago

Currently my count of people's opinions is as follows. Many did not explicitly state their opinion, and I did not want to guess, so if you're not included or I got your choice wrong, please reply with a simple and direct statement of your choice.

| JSON | XML | YAML |
| --- | --- | --- |
| @koosemose | @mbstraus | @Ohda |
| @vogonistic | @Grenkin1988 | @mikejbrown |
| @Dormanil | @gunthergun | @NogginBops |
| @Mizipzor | @sboigelot | |
| @MANDAL0R3 | @dusho | |
| @TomMalbran | @powli | |
| @mikejbrown | @kd7uiy | |
| @crafty-geek | @crafty-geek | |

TomMalbran commented 8 years ago

I am in the JSON group. But if we stay with XML we need to change the read/write code; it is a pain to use the XmlReader.

dusho commented 8 years ago

Guess I'm in the XML group, and would slowly start using serializable objects that can initially be used with XmlSerializer and later with any other serializer. For now I would keep the old read/write (XmlReader) code as it is, but move it to a partial class, e.g. Furniture.IO.cs

powli commented 8 years ago

I'm also in the XML group. I'm also with @dusho on separating everything into an abstraction layer. But that probably is discussion stuff for another issue.

sboigelot commented 8 years ago

To make my opinion clear, I'm in the XML group + XmlSerializer for now.

(and if we use Json, let's use a Serializer as well, like Json.net or Newtonsoft, or, any, no manual parsing)

mikejbrown commented 8 years ago

@koosemose I said

JSON and YAML are both fine for me and preferred to XML

koosemose commented 8 years ago

Apparently I had copying errors in my process.

kd7uiy commented 8 years ago

I will vote XML, only because we already have quite a few things in XML, and it seems quite a bit of work for relatively little gain to change everything.

koosemose commented 8 years ago

The gain, for those of us who think JSON is a better choice, is in readability, maintenance, and general usability.

Pretty much the exact same benefits as the whole StyleCop saturday thing, just with a smaller scope, only the data files and reading/writing code.

Tranberry commented 8 years ago

@mikejbrown doesn't that imply that you prefer any of the alternatives to XML?

But I'd like to revoke my vote, as I don't think I'll have any problem working with any of these alternatives and I want people who actually have experience to be more influential in this (I've dabbled with all mentioned languages but not more than that).

mikejbrown commented 8 years ago

@Tranberry yeah pretty much

Grenkin1988 commented 8 years ago

@TomMalbran @vogonistic How this:

{
  "ObjectType": "door_airlock",
  "Name": "Airlock Door",
  "Description": "An airlock door prevents air from leaving the base",
  "TypeTags": [ "Door" ],
  "MovementCost": 1,
  "PathfindingModifier": 3,
  "PathfindingWeight": 1,
  "Width": 1,
  "Height": 1,
  "LinksToNeighbours": false,
  "EnclosesRooms": true,
  "CanReplaceTypeTags": [ "Wall", "Door" ],

  "BuildingJobTime": 5,
  "BuildingJobRequirements": {
    "plate_steel": 10
  },

  "DeconstructJobTime": 1,
  "DeconstructJobProvides": {
    "plate_steel": 7
  },

  "Params": {
    "openness": 0,
    "is_opening": 0,
    "thermal_diffusivity": 0.00001
  },

  "JobSpotOffset": {
    "X": 1,
    "Y": 0
  },
  "JobSpawnSpotOffset": {
    "X": 0,
    "Y": 0
  },
  "PowerConnection": {
    "InputRate": 0,
    "OutputRate": 3,
    "Capacity": 0
  },

  "IsEnterable": "IsEnterable_Door",
  "GetSpriteName": "GetSpriteName_Airlock",
  "EventActions": {
    "OnUpdate": [ "OnUpdate_AirlockDoor" ],
    "OnInstall": [ "OnInstall_AirlockDoor" ]
  },
  "ContextMenuActions": [
    {
      "FunctionName": "LandingPad_Test_ContextMenuAction_1",
      "Text": "Lua Test Deconstruct 1",
      "RequiereCharacterSelected": false
    },
    {
      "FunctionName": "LandingPad_Test_ContextMenuAction_2",
      "Text": "Lua Test Deconstruct 2",
      "RequiereCharacterSelected": true
    }
  ]
}

is more readable than this:

<?xml version="1.0" encoding="utf-8" ?>
<Furniture ObjectType="door_airlock" Name="Airlock Door" Width="1" Height="1">
  <Description>An airlock door prevents air from leaving the base</Description>
  <TypeTags>
    <TypeTag type="Door"/>
  </TypeTags>
  <Movement Cost="1" PathfindingModifier="3" PathfindingWeight="1"/>
  <CanReplaceTypeTags>
    <TypeTag type="Wall"/>
    <TypeTag type="Door"/>
  </CanReplaceTypeTags>
  <Construction LinksToNeighbours="false" EnclosesRooms="true">
    <BuildingJob Time="5">
      <Requirement item="plate_steel" amount ="10"/>
    </BuildingJob>
    <DeconstructJob Time="1">
      <Requirement item="plate_steel" amount ="7"/>
    </DeconstructJob>
  </Construction>
  <PowerConnection inputRate="0" outputRate="5" capacity="0"/>
  <Params>
    <Param name="openness" value="0"/>
    <Param name="is_opening" value="0"/>
    <Param name="thermal_diffusivity" value="00001"/>
  </Params>
  <JobSpotOffset X="1" Y="0"/>
  <JobSpawnSpotOffset X="0" Y="0"/>
  <LuaFunctions>
    <Function key="IsEnterable" name="IsEnterable_Door"/>
    <Function key="GetSpriteName" name="GetSpriteName_Airlock"/>
    <Function key="OnUpdate" name="OnUpdate_AirlockDoor"/>
    <Function key="OnInstall" name="OnInstall_AirlockDoor"/>
  </LuaFunctions>
  <ContextMenuActions>
    <ContextMenuAction text="Lua Test Deconstruct 1" function="LandingPad_Test_ContextMenuAction_1" RequiereCharacterSelected="false"/>
    <ContextMenuAction text="Lua Test Deconstruct 2" function="LandingPad_Test_ContextMenuAction_2" RequiereCharacterSelected="true"/>
  </ContextMenuActions>
</Furniture>

?

koosemose commented 8 years ago

Because the data is more separated from the semantics.

Dormanil commented 8 years ago

Just because a file is shorter doesn't mean it is better structured.

Grenkin1988 commented 8 years ago

How? "ObjectType": "door_airlock", <Furniture ObjectType="door_airlock"/> semantics - data

dusho commented 8 years ago

I feel that JSON gets messy if there are many deeper levels.. then you get lost in brackets unless you have some smart text editor... XML is readable most of the time.. and the extra start/end tags make it clearer also.. For XML we can generate an .xsd from the .cs classes (xsd.exe) and give it to modders to prevent spelling errors and the like in larger objects. Haven't seen anything like that for JSON.. is there something similar?

koosemose commented 8 years ago

Pretty much any non-self closing tag.

Dormanil commented 8 years ago

JSON does in fact support schemas, and there are tools to create them from .NET/C#. I'd refer to Newtonsoft's Json.NET, which includes schema validation as well.
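As a rough illustration of the "prevent spelling errors" use case, the sketch below checks a parsed record against a table of allowed keys and types. It is a hand-rolled Python stand-in, not JSON Schema and not Json.NET's validator; `FURNITURE_SCHEMA`, `REQUIRED`, and `validate` are hypothetical names made up for this example.

```python
import json

# Hypothetical mini-schema: allowed key -> expected Python type(s).
FURNITURE_SCHEMA = {
    "ObjectType": str,
    "Name": str,
    "MovementCost": (int, float),
    "LinksToNeighbours": bool,
}
REQUIRED = {"ObjectType"}

def validate(record, schema=FURNITURE_SCHEMA, required=REQUIRED):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for key in sorted(required - set(record)):
        problems.append(f"missing required key: {key}")
    for key, value in record.items():
        if key not in schema:
            problems.append(f"unknown key: {key}")
        elif not isinstance(value, schema[key]):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems

# "MovmentCost" is a deliberate misspelling of "MovementCost".
record = json.loads('{"ObjectType": "door_airlock", "MovmentCost": 1}')
errors = validate(record)
print(errors)
```

A real schema validator covers far more (nesting, ranges, enumerations), but even this level of checking would flag a misspelled key at load time instead of silently ignoring it.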

And you really think that giant one-liners like <ContextMenuAction text="Lua Test Deconstruct 2" function="LandingPad_Test_ContextMenuAction_2" RequireCharacterSelected="true"/> are more readable than JSON's way, which automatically treats those attributes in a way that allows new lines more easily? And I'd honestly rather have brackets than obnoxiously long closing tags which add no value whatsoever from a machine standpoint and are just extra bloat.

mbstraus commented 8 years ago

@dusho I was gonna bring up XSDs as well.

@Dormanil Wasn't aware of that library, seems pretty nice (haven't really done much in C# or .NET). However, it looks like their schema validator isn't free (the free version has limited uses per hour, not sure what that restriction really means). But I would think that we would want to build in the schema validation into the code as a fail-safe when reading the file... so we should be able to run validation against a schema at will. The serializing seems pretty nice at first glance, though.

Dormanil commented 8 years ago

@mbstraus, their open-source JSON serializer includes a schema validator, it is just not as fast as the new one they released.

sboigelot commented 8 years ago

From experience, I would say that it's not a good idea to make a large change if the large majority is not following it.

We could support both too. I mean, we just need 2 serializers, one for XML and one for JSON, and we will see with time which is the preferred format. It will be easy enough to deserialize from one data format and serialize the data back to another one if we want to switch later.
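A minimal sketch of the "two serializers, one data model" idea, again in Python for illustration only. It handles flat records with string values; the `Furniture` element name and the four helper functions are assumptions for this example, not the project's actual format.

```python
import json
import xml.etree.ElementTree as ET

def to_json(record):
    return json.dumps(record, sort_keys=True)

def from_json(text):
    return json.loads(text)

def to_xml(record):
    # Flat records only: every value becomes a string attribute on one element.
    attributes = {key: str(value) for key, value in sorted(record.items())}
    return ET.tostring(ET.Element("Furniture", attributes), encoding="unicode")

def from_xml(text):
    # Attributes come back as strings; a real loader would need type information.
    return dict(ET.fromstring(text).attrib)

record = {"ObjectType": "door_airlock", "Name": "Airlock Door"}

# Deserialize from one format and serialize back through the other.
xml_text = to_xml(record)
round_tripped = from_json(to_json(from_xml(xml_text)))
print(round_tripped)
```

The round trip only stays lossless here because every value is a string. Numbers and booleans would come back from XML attributes as text, which is one place where the two formats are not interchangeable without extra type information.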

TomMalbran commented 8 years ago

@Grenkin1988 Because when I see the JSON file I can easily find all the data and the values. When I see the XML I have to read each line to find it because of the use of attributes being all in the same line as the tag. Each line ends up with too much content, being all the attributes or having a value in between an opening and closing tag without spaces. You can read it fast when you add color to tags, attributes and values. JSON doesn't need colors to be readable.

XML without attributes and the need of closing tags is the same as JSON, and since you don't need to close the tags, you can find the value really fast (it is always at the end of the line); you don't need to find it in between the opening and closing tag. You can also tell if the value is a number, a string or a boolean. And with proper indentation, you know where you are in the structure.

vogonistic commented 8 years ago

Several of you are talking about automated serialization in XML. How do you control which fields are serialized, and how are properties with implemented getters and setters handled? I've yet to see a system like that which works well on objects like ours. What do we do with saves if we decide to refactor the classes?

XML end tags are nice if the start tag is far away, but a vast majority of the time it's on the same line. Readability suffers even more for simple data structures like arrays where the repeated tags are usually taking up more space than the data.

What we are doing is just storing structured data, but XML suffers because that's not the primary use case. JSON and YAML are made to represent common data structures in file and do it well.

Grenkin1988 commented 8 years ago

@vogonistic For properties with logic you just need to store their backing field and ignore the property.

For me, the issue with automated serialization is how to split prototype/save logic.
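The "store the backing field, ignore the property" approach can be sketched as follows. This is a Python illustration under stated assumptions: the `Door` class, its `SERIALIZED_FIELDS` list, and the clamping rule are all hypothetical and only stand in for a C# class with a property that has logic in its setter.

```python
import json

class Door:
    # Only these backing fields are written to disk; computed properties are skipped.
    SERIALIZED_FIELDS = ("_openness",)

    def __init__(self, openness=0.0):
        self._openness = openness

    @property
    def openness(self):
        return self._openness

    @openness.setter
    def openness(self, value):
        # Setter logic (clamping) runs in game code, never during load.
        self._openness = max(0.0, min(1.0, value))

    @property
    def is_open(self):
        # Derived value: recomputed on demand, so it is not stored.
        return self._openness >= 1.0

    def to_json(self):
        return json.dumps({name: getattr(self, name) for name in self.SERIALIZED_FIELDS})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        door = cls()
        for name in cls.SERIALIZED_FIELDS:
            setattr(door, name, data[name])   # writes the field directly, bypassing setters
        return door

door = Door()
door.openness = 1.7   # clamped to 1.0 by the setter
restored = Door.from_json(door.to_json())
print(door.to_json(), restored.is_open)
```

Listing the stored fields explicitly answers the "which fields are serialized" question, but it does not by itself solve the prototype/save split raised above.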

crafty-geek commented 8 years ago

Format: Fine with XML or JSON. JSON is certainly more readable, but human readability is not a critical concern for a data format.

Parsing/writing: Revamp to use auto-serialization/deserialization wherever feasible. Why reinvent the wheel, as they say?

Human readability/writability concerns: Endeavor to build a standalone translator/verifier tool (for the sake of discussion, assume XML is the default format) where you could develop in JSON and auto-port to XML (and catch glitches without having to run the main game). Perhaps plan to expand this into a furniture-development modding IDE: imagine being able to write a mod by just creating furniture and item specifications in VERY human-readable dialogs (complete with artwork displays, sound management, etc.) and only having to touch code for the end products of your production chains.

Wanna develop an uber-futuristic tracking laser turret that uses rare elements? Define your rare elements as BaseMaterials (with graphics, spawn conditions/frequencies/variabilities...), define crafting recipes that use preexisting or modder-defined factories to create a variety of IntermediateMaterials (items that aren't spawned, and only have a function as a crafting component - Minecraft's stick, for example), Furniture, and Items (non-furniture that nonetheless have functions other than crafting components - eg the actual weapon part of your laser turret can be a character-wielded weapon as well as a crafting component), until you can support the end crafting recipe for your gun turret; the only code you then need to write yourself is the gun turret targeting logic/animation, and hit/dmg logic and animations.

koosemose commented 8 years ago

@crafty-geek I would say readability is somewhat more important as it is intended for the data files to also be able to be used for modding.

dusho commented 8 years ago

Somehow I think people have more of a problem with how read/write is currently implemented (using XmlReader) than with the format itself. Having classes that purely hold the data is a normal and good way to have things.. especially if you decide later on to add multiplayer and start sending those objects serialized as something network friendly. I honestly don't even mind having Json.. but I don't want to write a loader and a saver.. I want to construct an object and serialize it (e.g. in UT) to see what it looks like, then use it as a template to populate the real file and just adjust the params. @vogonistic you can check my WIP commit to see XmlSerializer at work.. Also with UT to generate .xml

Ohda commented 8 years ago

I vote for JSON. I'm trying to find the majority here so that this topic can end :)

mbstraus commented 8 years ago

I will change my vote to JSON as well, assuming we also pull in a library for properly serializing / deserializing JSON. Using SimpleJSON will not suffice... we would need to bring in a JSON serializing package (which SimpleJSON is not) and a JSON schema validator.

sboigelot commented 8 years ago

I don't mind JSON either, as long as we use a good serializing method.

bjubes commented 8 years ago

Would it be possible to switch all furniture and data objects defined in streaming assets to JSON while keeping the save and load in XML? I think when you have a stack of tiles in a file, XML does a better and clearer job of showing a hierarchy, but when it comes to defining an object with many properties, JSON thrives.

quill18 commented 8 years ago

JSON is probably my personal favorite data format to work in -- though a lot of that is because I used to work with it as part of web application development, and it was very natural there.

XML had the advantage of the built-in, automated serialization -- but there's no reason we can't make our own Serialization interface that works with JSON.

I have zero objection to a switch over if people are really in favour of it and someone is willing to put in the work.

fre-ber commented 8 years ago

I don't really mind what format will be chosen, but I think that the lack of comments in JSON is a huge drawback. Writing comments in the file makes it a lot easier to read regardless of the format for the data and is especially useful in a modding context. I suppose that we could define a special data element named "_comment" or something that the parser will ignore, but that feels a bit clunky and possibly confusing to new modders.
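For reference, the "_comment" workaround could look roughly like this. It is a Python sketch for illustration only; `strip_comments` and the sample record are hypothetical, and the file stays valid JSON because the comments are ordinary string values.

```python
import json

def strip_comments(node, marker="_comment"):
    """Recursively drop every key named `marker` from parsed JSON data."""
    if isinstance(node, dict):
        return {key: strip_comments(value, marker)
                for key, value in node.items() if key != marker}
    if isinstance(node, list):
        return [strip_comments(item, marker) for item in node]
    return node

RAW = """
{
  "_comment": "Doors block air but not colonists.",
  "ObjectType": "door_airlock",
  "Params": { "_comment": "0 = closed, 1 = open", "openness": 0 }
}
"""

cleaned = strip_comments(json.loads(RAW))
print(cleaned)
```

One limitation that supports the "clunky" concern: JSON objects cannot portably repeat a key, so this allows at most one "_comment" per object, and nothing ties the comment to the specific field it describes.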

koosemose commented 8 years ago

@fre-ber One of the JSON implementations those of us who prefer JSON are investigating is JSON.net which does allow comments (despite not technically being proper JSON)

fre-ber commented 8 years ago

@koosemose Right, that would be fine then, I guess.

Found this:

Json.NET only supports reading multi-line JavaScript comments, i.e. /* comment */. Update: Json.NET 6.0 supports single line comments

and that looks good to me - assuming that it is explained somewhere in the documentation that we aren't using proper json that can't be parsed by any json parser without special treatment. Some people might want to run our files through other tools during production, but that might fail or at best strip all the comments away and this would be bad.

dusho commented 8 years ago

Ah ye.. Forgot about Json and comments.. If for some reason this Json.net 6 can't be used with Unity and e.g. notepad++ can't handle json comments, I would strongly recommend keeping the current Xml and using XmlSerializer instead of parsing with XmlReader. Also, this change should only be done once 0.1 is reached. I see people holding off on further coding until agreement is found here..

vogonistic commented 8 years ago

How do we handle changes to the classes with XmlSerializer without breaking saves?

bjubes commented 8 years ago

Can't we just change them all at once in a PR? It would make old saves incompatible, but that's not really a worry at this point.

kd7uiy commented 8 years ago

Okay, so it seems like it's desired to make this change. A few steps I would propose:

  1. Let's convert a single function to use Json, as a demonstration. Don't care what it is, but it should be pretty minor if possible.
  2. Let's divide out the major sections, including Save/Load, Furniture, and everything else, where 1 person is in charge of each of them. I think that'll make this go faster, while not creating chaos.
  3. We might want a separate thread to discuss how to make that transition, as this one is kind of long...

sboigelot commented 8 years ago

@vogonistic

How do we handle changes to the classes with XmlSerializer without breaking saves?

I'm not sure about JSON serializer but if it works like the XML serializer from .net:

vogonistic commented 8 years ago

@sboigelot That's the same as breaking all old saves, which is my concern for any auto-serializer.

Grenkin1988 commented 8 years ago

@vogonistic @sboigelot We would only begin breaking anything after the first "release". Right now this is just a development installation where every day shit could happen :-)

sboigelot commented 8 years ago

It has been proposed to change everything but the save to a serializer. We could keep the save/load feature on the reader.

fre-ber commented 8 years ago

@dusho Sorry, my quote was a bit out of context. As I understand it, json.Net 6.0 is only required for the single line comments, i.e. // This comment.

The regular /* block comments */ should be fine with earlier versions of json.Net.

dusho commented 8 years ago

@vogonistic so what is the current strategy to keep the ability to load old saves? Is there some versioning of save files? With XmlSerialization (or any automatic serialization) I remember seeing a strategy where, when you introduce a breaking change into the save file format, you keep your old serialization class and name it accordingly, e.g. Furniture.SaveLoad_v0_1. The loader then detects the version you want to load (there is a version in the save file, so v0_1), uses the older Furniture.SaveLoad_v0_1 deserialization class, and a SaveFilePorter extracts (ports) the relevant information from the old deserialized object into the current one.
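The versioned-loader strategy described here can be sketched like this. It is a Python illustration using plain dicts instead of per-version deserialization classes; the version numbers, the `size` to `width`/`height` change, and the `PORTERS` table are invented for the example and do not reflect the project's save format.

```python
import json

def port_v0_1_to_v0_2(old):
    """Hypothetical porter: v0.1 stored 'size' as one number, v0.2 splits it."""
    new = {key: value for key, value in old.items() if key != "size"}
    new["width"] = old["size"]
    new["height"] = old["size"]
    new["version"] = "0.2"
    return new

# Each entry upgrades a save by exactly one version step.
PORTERS = {"0.1": port_v0_1_to_v0_2}
CURRENT_VERSION = "0.2"

def load_save(text):
    data = json.loads(text)
    # Saves without a version field are treated as the oldest known format.
    while data.get("version", "0.1") != CURRENT_VERSION:
        version = data.get("version", "0.1")
        if version not in PORTERS:
            raise ValueError(f"no porter for save version {version}")
        data = PORTERS[version](data)
    return data

old_save = '{"version": "0.1", "type": "door_airlock", "size": 1}'
loaded = load_save(old_save)
print(loaded)
```

Chaining one-step porters means each breaking change only needs a single new function, at the cost of keeping every old step around for as long as those saves should remain loadable.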

kd7uiy commented 8 years ago

I say let's just make the change, and not care if saves are lost, for this one time. Let's face it, there isn't much to be gained to save at this point anyways. In the future, however, we need to have a set of regression tests, where we have several save files from each version that we ensure can be loaded with the newest save loader. But as early as we are now, I don't think it matters.

mikejbrown commented 8 years ago

How many players have a bunch of old saves that they really need to keep around? PP hasn't even reached the v0.1 milestone yet, for crying out loud! Players should expect frequent breaking changes!