More Revision-Control-friendly file format

otoomet commented 4 years ago

Steps to reproduce

create a new file
add a few simple objects
save it as .xmap file

Actual behaviour

The file will be large, and apparently includes undo information, zoom level, view and such. Could not find a description of the file format.

Expected behaviour

Option to save strictly the map data only (and preserve the order and formatting of objects saved). The undo/view information interferes with RC systems such as git and creates a lot of spurious differences even when the actual map has not changed.

I would like to keep some of the maps under RC and keep the diffs as simple as possible. Ideally there will also be some sort of graphical diff-merge tool (see #741) on top of the text diffs.

Configuration

Mapper Version: 0.8.4
Operating System: ubuntu 16

dg0yt commented 4 years ago

The unstable version already creates additional linebreaks in the regular, compact .omap files, for better use in version control.
The 'non-map' data is another story. At least undo history can be cleaned manually before saving for version control, via Edit > Clear undo/redo history. This should mitigate the most serious sub-issue.
Side effects of color or symbol changes are probably the biggest show-stopper for this type of version control at the moment.

dg0yt commented 4 years ago

FTR the request for XML format documentation is tracked in #182.

otoomet commented 4 years ago

Thanks.

Side effects of color or symbol changes are probably the biggest show-stopper for this type of version control at the moment.

I am not sure I understand what this means...

As a side note, #1097 asks pretty much the opposite.

dg0yt commented 4 years ago

When you add/remove a color, it will trigger changes in a number of symbols. When you add/remove a symbol, or change the symbol order, it will trigger changes in a number of objects.

I don't see a conflict with #1097, because it will affect objects only when they are modified anyway. It addresses revision control at an object level, while git and Co. can only do revision control on a file level. The question is how to implement #1097 without adding too much extra data.

jmacura commented 4 years ago

For the record: I was trying to write a conversion script from .omap to SVG, but found the file structure not really friendly to 3rd party tools. It contains the "noise" (undos & redos, map view), weirdly named elements ("barrier") and, despite being verbose in general, the coordinates themselves are not atomic and are hard to parse...

dg0yt commented 4 years ago

@jmacura If you want to process the XML, you are better off using the .xmap variant, due to its verbosity. The .omap variant is optimized for fast saving and loading, and for small size. That is the reason why the coordinates are not explicit in this variant. Note that SVG uses a similar approach.

The barrier elements are used when a file format change would cause serious trouble (crash, misrepresentation) with slightly older versions of the software. This is easy to handle when reading an XML stream, but could be more difficult with an XML tree.

"Noise" isn't a fair assessment outside of the revision control perspective, at least as long as users expect to continue their work where they left off.

otoomet commented 4 years ago

I think this kind of metadata (like open files, views, zoom settings) is typically stored in a separate file by IDEs and such. Not sure if this might apply to Mapper.

dg0yt commented 4 years ago

The barrier elements are used when a file format change would cause serious trouble (crash, misrepresentation) with slightly older versions of the software. This is easy to handle when reading an XML stream, but could be more difficult with an XML tree.

At the moment, the barrier element wraps elements which cause problems in older versions of Mapper. I would propose to modify this as follows:

Add an optional attribute action to <barrier>.
If the action attribute has the value skip, readers shall skip the following element if they don't meet the version requirements expressed in the <barrier>.
If there are multiple elements to be skipped, now wrapped by a single common <barrier>, each element gets an individual preceding <barrier action="skip">.

Once implemented in Mapper, this would be enough to safeguard current versions against misbehaviour with files written by future versions. Other software could handle it as needed, including just ignoring it.

If this change is implemented for map reading now, it may become the standard for writing in a later version.
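
To make the proposed reader behaviour concrete, here is a minimal Python sketch, not Mapper's actual C++ reader. It assumes that the barrier expresses its requirement as an integer in a `version` attribute and that the reader knows its own format version; both names are hypothetical. Barriers without `action="skip"` are passed through unchanged, so the existing wrapping form would still need its current handling.

```python
import xml.etree.ElementTree as ET

READER_VERSION = 6  # hypothetical format version this reader understands


def visible_children(parent, reader_version=READER_VERSION):
    """Yield child elements, honouring <barrier action="skip"> markers.

    Assumption: the requirement is an integer in a "version" attribute;
    the real attribute name and version scheme may differ.
    """
    skip_next = False
    for child in parent:
        if child.tag == "barrier" and child.get("action") == "skip":
            required = int(child.get("version", "0"))
            skip_next = reader_version < required
            continue  # the marker itself is never passed on
        if skip_next:
            skip_next = False  # each barrier guards exactly one element
            continue
        yield child


doc = ET.fromstring(
    "<map>"
    '<barrier action="skip" version="9"/><fancy/>'
    '<barrier action="skip" version="5"/><object/>'
    "<notes/>"
    "</map>"
)
tags = [el.tag for el in visible_children(doc)]
print(tags)
```

Because every skipped element has its own preceding marker, the same loop works on a stream or on a tree; software that does not care about versions can simply ignore the barrier elements.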

dg0yt commented 4 years ago

I think this kind of metadata (like open files, views, zoom settings) is typically stored in a separate file by IDEs and such. Not sure if this might apply to Mapper.

My first thought was that Mapper is more like a document editor (e.g. Word, PowerPoint), less like an IDE. (How many map makers know what an IDE is?) However, when looking at the collection of templates used with a map, the perspective changes. Still, I'm not convinced that yet another file is really desirable.

dg0yt commented 4 years ago

The file will be large, and apparently includes undo information, zoom level, view and such. Could not find a description of the file format.

Saving undo information can be turned off in the settings.

aberlol commented 3 years ago

Saving the same map file (without changes) in OOM on Linux and Android (same version number 0.9.4) does not produce exactly the same .omap file content.

It seems that some object/tags child nodes are swapping places. This adds unnecessary changes to the commit history. It would be nice not to reorder the tags.
The template path-attribute is changed, even though the relpath is the same. Is (the absolute) path really necessary to save when we have a working relative path?

dg0yt commented 3 years ago

It seems that some object/tags child nodes are swapping places. This adds unnecessary changes to the commit history. It would be nice not to reorder the tags.

Indeed. This is a QHash structure, and we don't care enough about the order of the keys.

The template path-attribute is changed, even though the relpath is the same. Is (the absolute) path really necessary to save when we have a working relative path?

Unsure. You are looking at Linux and Android which do have a common root directory /. But on Windows, there are drive letters e.g. C:. If you just save the map on a different drive, there is no relative path. OTOH, I already try to avoid absolute paths in test maps, example maps and symbol sets. The absolute directory layout may give hints about operating system, user name, or map project which might not be meant to be disclosed in some cases.

aberlol commented 3 years ago

It seems that some object/tags child nodes are swapping places. This adds unnecessary changes to the commit history. It would be nice not to reorder the tags.

Indeed. This is a QHash structure, and we don't care enough about the order of the keys.

I see! Then a quick fix would be to order the tags alphabetically every time you save. I understand that the order is unimportant to the application... but it would really improve the revision-controllability at a low cost.

(In my case I just removed all tags with regex from the file since I didn't really need them anyway.)

The template path-attribute is changed, even though the relpath is the same. Is (the absolute) path really necessary to save when we have a working relative path?

Unsure. You are looking at Linux and Android which do have a common root directory /. But on Windows, there are drive letters e.g. C:. If you just save the map on a different drive, there is no relative path. OTOH, I already try to avoid absolute paths in test maps, example maps and symbol sets. The absolute directory layout may give hints about operating system, user name, or map project which might not be meant to be disclosed in some cases.

True, I see your point.

I like the idea posted earlier by @otoomet about creating a metadata-file that contains user specific information (even template settings). This would be a great solution in my opinion.

Addition: The metadata file could then either be gitignored or shared between users if current behavior is desired.

dg0yt commented 3 years ago

Then a quick fix would be to order the tags alphabetically every time you save.

Sorting every time you save is too expensive. For now, I switched away from QHash to a flat list, assuming that a) the number of items is low, and b) searching for particular tags happens only occasionally (e.g. on import with symbol assignment).

The new data structure is unit-tested, but I would appreciate it if this could be complemented by some real-world testing of yesterday's master release.
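
For readers unfamiliar with the trade-off, the idea of a flat list can be sketched in a few lines of Python. This is an illustration only, not the C++ structure used in Mapper; the class name `TagList` and the sample tags are made up. Lookup is linear, which is acceptable under the two assumptions above, and the saved order is simply the insertion order.

```python
class TagList:
    """Key/value tags kept in insertion order, with linear lookup."""

    def __init__(self):
        self._items = []  # list of (key, value) tuples

    def set(self, key, value):
        for i, (k, _) in enumerate(self._items):
            if k == key:
                self._items[i] = (key, value)  # update in place, keep position
                return
        self._items.append((key, value))

    def get(self, key, default=None):
        for k, v in self._items:
            if k == key:
                return v
        return default

    def items(self):
        return list(self._items)


tags = TagList()
tags.set("name", "Boulder")
tags.set("height", "2")
tags.set("name", "Big boulder")  # changing a value does not move the tag
print(tags.items())
```

Since nothing is rehashed or sorted, saving an unchanged object writes its tags in the same order every time, on every platform.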

FelixFrog commented 3 years ago

Ok so I've done some testing with git, branching, merging. There are two major things that are holding this back:

The count attribute creates lots of merge conflicts. It isn't a problem if you only commit on the same branch, but as soon as you switch and, for example, you add a node to an object, there are merge conflicts to solve that are just a count="35" that conflicts with a count="36". So are they absolutely necessary? I think that as of now it's not that hard to teach OOM to read files that do not contain those attributes.
Colors are a pain, since if you change the order of a color all of the priority attributes have to change. A quick solution would be to be able to set priorities/z-index in a range from 0 to 65535 in the GUI (as most software does), which would also give more flexibility. Another solution would be to add an id attribute to them, and, for example, when a color is deleted, make it appear as missing in the color menu so that the user can replace it easily. This would also be a quality of life improvement.

As a long term solution, I think that the idea of a multi-file format is not that bad. One such idea would be to store everything in a folder

map-folder
├── colors.xml
├── index.json
├── metadata.json
├── parts
│   ├── part1.xml
│   └── part2.xml
├── symbols.xml
└── templates
    ├── template1.jpg
    └── template2.png

(Qt has good JSON support, see the Qt docs, and in our case it would drastically reduce the size as well as improve human readability.)

We could then gzip/bzip the folder into something like "Mymap.map" and teach OOM to open both the compressed folder and the uncompressed one from index.json, so that it still remains one single file for beginners and it would also reduce size (teams could just store the uncompressed folders). This surely opens lots of possibilities, such as storing templates for teamwork, and it would be an amazing tool for teams, not just for those using git. This is nothing more than an idea but I think it's worth considering, since it could be an extremely powerful solution in a far away future.

dg0yt commented 3 years ago

The count attribute creates lots of merge conflicts. It isn't a problem if you only commit on the same branch, but as soon as you switch and, for example, you add a node to an object, there are merge conflicts to solve that are just a count="35" that conflicts with a count="36". So are they absolutely necessary? I think that as of now it's not that hard to teach OOM to read files that do not contain those attributes.

This attribute is not mandatory. It is used to pre-allocate memory in an informed way. With regard to versioning, this information might be moved to a sidecar file which is not under version control. If the file is missing or not consistent with the actual map file, there would be only a one-time performance impact.
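
Until such a sidecar file exists, one possible workaround is to drop the attribute before committing, for example from a git clean filter. The Python sketch below shows only the text transformation. The element name `coords` and the sample content are made up, a regular expression is a crude tool for XML, and whether a given Mapper version loads such a file without problems should be checked before relying on it.

```python
import re

# Matches a count="<digits>" attribute together with its leading whitespace.
_COUNT_ATTR = re.compile(r'\s+count="\d+"')


def strip_counts(xml_text):
    """Return xml_text with every count="N" attribute removed."""
    return _COUNT_ATTR.sub("", xml_text)


ours = '<coords count="35">1 2;3 4;</coords>'
theirs = '<coords count="36">1 2;3 4;</coords>'

print(strip_counts(ours))
print(strip_counts(ours) == strip_counts(theirs))
```

With the attribute gone, two branches that each add a node no longer collide on a line that differs only in its count value.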

Colors are a pain, since if you change the order of a color all of the priority attributes have to change.

It is not just colors. There are multiple spots where the current index in a list is used to refer to elements, instead of using a proper ID.
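
A small Python sketch of why index-based references are hostile to diffs. The color and symbol names and the `c1`-style IDs are invented for illustration and do not reflect the real file format.

```python
def index_refs(colors, symbol_colors):
    """Serialise each symbol's color as its position in the color list."""
    return {sym: colors.index(col) for sym, col in symbol_colors.items()}


def id_refs(color_ids, symbol_colors):
    """Serialise each symbol's color as a stable ID."""
    return {sym: color_ids[col] for sym, col in symbol_colors.items()}


symbol_colors = {"contour": "brown", "stream": "blue"}

# Deleting the unused color "black" shifts every later index.
before = index_refs(["black", "brown", "blue"], symbol_colors)
after = index_refs(["brown", "blue"], symbol_colors)
changed_by_index = [s for s in symbol_colors if before[s] != after[s]]

# With stable IDs the same deletion leaves the symbols untouched.
ids_before = id_refs({"black": "c1", "brown": "c2", "blue": "c3"}, symbol_colors)
ids_after = id_refs({"brown": "c2", "blue": "c3"}, symbol_colors)
changed_by_id = [s for s in symbol_colors if ids_before[s] != ids_after[s]]

print(changed_by_index, changed_by_id)
```

In the index-based variant a one-line change to the color list rewrites every symbol that uses a later color; in the ID-based variant the diff stays confined to the color list itself.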

I'm not sure if it will be really feasible to leave merging forked work only to git and notepad.

I'm not willing to rush into a new/more complex format quickly. There are also other options to explore (for humans or revision control: YAML; binary: CBOR). However, these formats, including the current XML, seem to share the common idea of having a tree of objects, arrays, and values. So it should be feasible to explore this step by step.

In any case, we must be aware of the special case of loading and saving under memory/deadline pressure on Android. Updating few small files (only when modified) might even help, in contrast to updating a single big file.

dg0yt commented 3 years ago

json support

What I continue to trip over with JSON is its lack of tolerance for trailing commas. This is invalid:

{
  "A": "aa",
  "BB": "b",
}

Similarly, when removing the whole BB line, the comma from the A line has to be removed, too. This fails the goal of being more revision-control-friendly.
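
This is easy to check with any strict parser. The sketch below uses Python's standard `json` module as a stand-in; the helper name `parses` is made up.

```python
import json


def parses(text):
    """Return True if text is accepted by Python's strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


with_trailing_comma = '{\n  "A": "aa",\n  "BB": "b",\n}'
without_trailing_comma = '{\n  "A": "aa",\n  "BB": "b"\n}'
bb_line_removed = '{\n  "A": "aa",\n}'  # BB line deleted, A line untouched

print(parses(with_trailing_comma))
print(parses(without_trailing_comma))
print(parses(bb_line_removed))
```

So deleting the last entry always touches two lines, the deleted one and its predecessor, and a line-based merge that combines two such edits can easily leave an invalid file behind.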

jmacura commented 3 years ago

json support

What I continue to trip over with JSON is its lack of tolerance for trailing commas.

That is a known limitation of JSON. One of multiple. One option is to use a more permissive format, like JSON5 or YAML, which improves usability but notably reduces interoperability. Yet I have a strong feeling that a regular developer should not have to bother with the specifics of the syntax, as the parser/reader should handle it.

dg0yt commented 3 years ago

Yet I have a strong feeling that a regular developer should not have to bother with the specifics of the syntax, as the parser/reader should handle it.

However, the context of discussion is automagic merge in git which is not a JSON parser. That's why I want to record it here explicitly. (I mentioned YAML in https://github.com/OpenOrienteering/mapper/issues/1290#issuecomment-808711002.)

otoomet commented 3 years ago

Why do you want to have json format, @jmacura ? From git perspective, xml is good enough (given formatting does not change). I agree that json is somewhat more human readable but I am not sure it matters in case of larger projects (like a map).

jmacura commented 3 years ago

However, the context of discussion is automagic merge in git which is not a JSON parser. That's why I want to record it here explicitly.

Good point.

Why do you want to have json format, @jmacura ?

Actually, I am not the one who came up with JSON in this thread.

FelixFrog commented 3 years ago

Actually, I am not the one who came up with JSON in this thread.

Yeah I did mention JSON but it was no more than an idea 😅. The reality is that the files in which we would store the actual map parts don't need to be super human readable, just git friendly (and possibly fast to parse/storage efficient). Ideally, they could just be a CSV table with an object per line (with an id and the coords, kind of what is stored in a .omap file apart from the id) if it wasn't for other small things for which we need a little bit of structure. This would give best speed, file size, ease in parsing (no need for external libs) and git compatibility, but would require an external "data" file for little things that do not fit in the CSV table (pattern tags, text, etc...). I am not proposing any definitive solution here, just giving out ideas.

I actually proposed JSON (and for that matter YAML/TOML/XML) for other small things such as map metadata (undos, redos), an index file, symbols, colors, etc..., but that is definitely less important and depends a lot on Qt library availability, ease in parsing, git compatibility and is certainly one of the last things to worry about (if anyone is interested this is a good resource to read).

In general at this point I think that the most sensible thing to do is to understand whether creating a completely new format is worth the effort or if the current .omap/.xmap format is flexible enough to give good git compatibility.

dg0yt commented 3 years ago

Thanks for the clarification. IMO there are two separate goals with regard to revision control:

Improve revision-control friendliness of the existing single-file format.
Study the feasibility of a revision-control-friendly multi-file format.

For the latter, I would suggest starting with thorough research into existing open GIS formats, instead of re-inventing the wheel again. But probably most GIS simply follow a more centralized approach (central database), while git et al. are successful due to their distributed, decentralized approach.
