Improve mechanics of merging mods and graphics packs

Currently, mods and graphics can be used together but there are some compatibility problems - specifically that mods which remove large chunks of the raws can confuse the diff functions and cause problems.

Some of this may be inevitable (without ~~the holy grail~~ full raw parsing), but with graphics as a special case it may also be possible to mitigate the issue with custom logic.

Instead of merging mods into graphics raws, how about merging mods into vanilla raws and finally applying <>?

Magic would happen to identify graphics-specific changes between the pack and baseline vanilla; then more magic would be used to find the locations on which to apply the change. Perhaps regexps? It might also be possible to store these instructions in .json form, making graphics packs even smaller than they are now.
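A very rough sketch of the "store the instructions in .json form" idea, under loud assumptions: the sets `OBJECT_TAGS` and `GRAPHICS_TAGS` below are hypothetical stand-ins for whatever actually defines an object or a graphics tag, and repeated graphics tags inside one object (or quoted `':'` literals) are not handled. It only records, per object, the graphics tag values in the pack that differ from vanilla.

```python
import json
import re

TAG_RE = re.compile(r'\[([^\[\]]+)\]')

# Hypothetical: which tag names start an object or count as "graphics".
OBJECT_TAGS = {'CREATURE', 'PLANT', 'INORGANIC'}
GRAPHICS_TAGS = {'CREATURE_TILE', 'TILE', 'COLOR'}


def graphics_values(raw_text):
    """Map (object id, tag name) -> list of tag arguments."""
    found = {}
    current = None
    for match in TAG_RE.finditer(raw_text):
        parts = match.group(1).split(':')
        name, args = parts[0], parts[1:]
        if name in OBJECT_TAGS and args:
            current = '%s:%s' % (name, args[0])
        elif name in GRAPHICS_TAGS and current is not None:
            found[(current, name)] = args
    return found


def graphics_patch(vanilla_text, pack_text):
    """Return a JSON string listing graphics tags the pack changes."""
    vanilla = graphics_values(vanilla_text)
    pack = graphics_values(pack_text)
    patch = []
    for (obj, tag), args in sorted(pack.items()):
        if vanilla.get((obj, tag)) != args:
            patch.append({'object': obj, 'tag': tag, 'args': args})
    return json.dumps(patch, indent=2)
```

Applying such a patch to merged vanilla+mod raws would then be a lookup by object and tag name rather than a line-based diff.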

[Issue created by PeridexisErrant: 2015-03-13] [Last updated on bitbucket: 2016-08-29]

[Comment created by PeridexisErrant: 2016-08-29] Technically yes, but maintaining this setup would be really tricky. We'd be better off supporting a switch to "Use Rubble for mods and graphics" (including the Rubble web UI) and then disabling the PyLNP functionality.

[Comment created by pinemach: 2015-06-22] I understand your rationale but I'm unwilling to compromise. PyDwarf is my personal project and I'm not interested in abandoning my standards and my reasoning for yours.

[Comment created by Pidgeot: 2015-06-22]

> A Python 2.7 interpreter could be bundled with PyDwarf just the same if users can't be trusted to set up Python correctly.

The only way to integrate into the mod system would then be to write code which:

checks that PyDwarf is available to use (and it can launch)
filters out PyDwarf mods if it isn't
knows to launch PyDwarf correctly if a PyDwarf mod is added
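The three steps above might look roughly like this. Everything here is a sketch, not PyLNP code: the function names and the `kind` key on a mod are hypothetical, and it assumes that `manager.py --help` exits with status 0 and that `sys.executable` is a usable interpreter (which would not hold for the bundled binaries).

```python
import os
import subprocess
import sys


def pydwarf_can_launch(manager_path, python_exe=sys.executable):
    """Return True if manager.py exists and exits cleanly for --help."""
    if not os.path.isfile(manager_path):
        return False
    try:
        with open(os.devnull, 'w') as devnull:
            status = subprocess.call(
                [python_exe, manager_path, '--help'],
                stdout=devnull, stderr=devnull)
    except OSError:
        # The interpreter itself could not be started.
        return False
    return status == 0


def usable_mods(mods, manager_path):
    """Drop PyDwarf mods from the list when PyDwarf cannot be launched."""
    if pydwarf_can_launch(manager_path):
        return list(mods)
    # 'kind' is a hypothetical marker for how a mod is applied.
    return [mod for mod in mods if mod.get('kind') != 'pydwarf']
```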

You'd also need to document how to make it work in the readme. (None of this requires UI changes, by the way; a "full" integration wouldn't require it either - it'd only be a question of changing files in core.)

Alternatively, you could just act as a plain utility, but then there's no integration involved (because I don't want to write launcher windows for specific utilities, and I doubt you'd be interested in users having to type in the full set of parameters manually). That, however, would require some form of UI on your end (which might be preferable for people who don't use PyLNP).

> This is PyDwarf's scripts directory, containing numerous examples.

I based my knowledge on what I'd previously seen in your forum thread. Looking through the specific examples in your repo, I can see a few patterns that would indeed need changing (most notably: xrange -> range; iteritems -> items, and a couple of print statements in stal/armouryentities.py), and a few more in the ones you wrote yourself (basestring no longer exists, callability needs to be checked differently) - but not really anything else, and that's basically what I expected. There is very little here that has any relation to the Python version, and for the few things that are there now, just running 2to3 on them fixes nearly everything to work on both (the basestring rewrite is the only potential issue I can see, since the default 2to3 fix will not allow unicode string literals in Python 2).
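For illustration, the patterns mentioned above (range, items, print as a function, basestring, callability) can be written so the same file runs on both versions. This is a generic sketch, not taken from any PyDwarf script; `string_types` and `summarize` are made-up names.

```python
from __future__ import print_function

try:
    string_types = (basestring,)  # Python 2 only
except NameError:
    string_types = (str,)         # Python 3: basestring no longer exists


def summarize(settings):
    """Return one 'key: kind' line per setting."""
    lines = []
    # items() exists on both versions; iteritems() is Python 2 only.
    for key, value in sorted(settings.items()):
        if isinstance(value, string_types):
            kind = 'string'
        elif callable(value):  # available on Python 2 and on 3.2+
            kind = 'callable'
        else:
            kind = 'other'
        lines.append('%s: %s' % (key, kind))
    return lines


# range() works on both; xrange() is Python 2 only.
for index in range(2):
    print(index)
```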

I would expect that the people who aren't proficient with Python are going to go by your examples, so making those 2/3-compatible has a decent chance of being sufficient - and at least with the current examples, none of them use anything that would make a Python 3-compatible version incompatible with Python 2 (so it's "just" a matter of writing Python 3 scripts).

[Comment created by pinemach: 2015-06-22] PyDwarf scripts are Python modules. If a script uses some syntax or package available in Python 3 but not 2, or vice versa, then there's trouble. From my understanding most of those who mod Dwarf Fortress aren't proficient with Python. It would be impractical to expect those modders to know how to write scripts compatible with both. A Python 2.7 interpreter could be bundled with PyDwarf just the same if users can't be trusted to set up Python correctly.

This is PyDwarf's scripts directory, containing numerous examples. I strongly recommend looking at the contents of the PyDwarf repo if you're trying to understand how it works. https://github.com/pineapplemachine/PyDwarf/tree/develop/scripts

[Comment created by Pidgeot: 2015-06-22] The Python 2/3 compatibility is a firm requirement, because users who run from source might only have one of them installed. Users who use the binaries don't need either one, so for them (which is the vast majority of PyLNP users), there might not be any interpreter other than the bundled one (which cannot be repurposed for other scripts anyway).

Consequently, we can't really implement any solution which relies on PyDwarf as an external program - the best we could do is run the script within the active interpreter, but then you're back with the interpreter issue.

Having said that... are you sure the compatibility is really going to be a problem for mods? It was my understanding that they wouldn't really do a whole lot outside of calling into the PyDwarf code; they wouldn't really be doing anything that had compatibility issues. (I may be wrong, of course.)

[Comment created by pinemach: 2015-06-22] I'm not sure I agree on all those points. Most significant is the compatibility with both Python 2 and 3. The same would not be possible to guarantee of mods for PyDwarf, and regardless I believe it would be too much to insist that those who write PyDwarf scripts jump through all the hoops necessary for making their mod work for both versions. I can certainly make it so for PyDwarf itself, but that only goes so far.

I think the ideal is not importing PyDwarf as a module or making it an integrated part of PyLNP, but treating it as a standalone and executing it with arguments according to the configuration specified by the user. Really, apart from the interface, all that should need to be done is to include some release of PyDwarf somewhere in the repository then from PyLNP run python path/to/pydwarf/manager.py <arguments>. (And I think UI coding is the pits so I'll patiently wait for somebody else to want PyDwarf integration bad enough to do it themselves.) From the tutorial, here's a description of the arguments which the manager accepts.

-i or --input: Specifies raws input directory.
-o or --output: Specifies raws output directory.
-b or --backup: Specifies raws backup directory.
-ver or --version: Specifies Dwarf Fortress version.
-hdir or --dfhackdir: Specifies DFHack directory.
-hver or --dfhackver: Specifies DFHack version.
-s or --scripts: The list of scripts to run. (Only names and namespaces may be specified in this way, not dictionaries.)
-p or --packages: The list of Python packages to import.
-c or --config: Reads configuration from the json file given by the path. Can also refer to a Python file or package, which will be imported and used for configuration. See config_override.py for an example.
-v or --verbose: Sets the logging level for standard output to DEBUG. (By default, fully verbose logs are written to the logs/ directory regardless of this flag.)
--log: Specifies the log file path.
--list: Lists registered scripts in alphabetical order.
--meta: When given names of scripts as arguments, shows each script's metadata in a readable format. When given no arguments, metadata for all registered scripts is displayed.
--jscripts: More complicated alternative to --scripts which accepts a json array just like the scripts attribute in config.json.
-h or --help: Shows a summary of each argument's purpose.

The moral of the story being that PyLNP might do something like this for working with PyDwarf. Super simple, I would think?

python27 path/to/PyDwarf/manager.py -i path/to/df/raw -b path/to/df/rawbak -ver auto -hdir auto -hver auto -p path/to/PyDwarf/scripts path/to/PyLNP/even/more/scripts --jscripts '["some.script", "another.script", {"name": "look.this.script.has.arguments", "args": {"key": "value", "answer": 42}}]'
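From Python, the same invocation could be assembled as an argument list, which sidesteps shell quoting of the JSON. This is only a sketch: `build_pydwarf_command` is a hypothetical helper, and it uses only the flags shown in the command above.

```python
import json


def build_pydwarf_command(python_exe, manager_path, raw_dir, backup_dir,
                          package_dirs, scripts):
    """Build the argument list for running PyDwarf's manager.py."""
    command = [
        python_exe, manager_path,
        '-i', raw_dir,
        '-b', backup_dir,
        '-ver', 'auto',
        '-hdir', 'auto',
        '-hver', 'auto',
        '-p',
    ]
    command.extend(package_dirs)
    # --jscripts takes a JSON array, so serialize the list as-is.
    command.extend(['--jscripts', json.dumps(scripts)])
    return command
```

The resulting list could then be handed to `subprocess.call(command)`; nothing is executed by the helper itself.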

[Comment created by Pidgeot: 2015-06-22] @pinemach: Well, you'd need to make a pull request :P

As I see it, this is basically how it would have to go. (Feel free to ask questions if you need to.)

Location

Since PyDwarf is spread out over multiple files, you'd likely want to make a submodule pydwarf in the code module and store your code there.

Adaptation

PyDwarf needs to conform to the coding standards for PyLNP. That means:

Python 2 and 3 compatibility (e.g. from io import open)
Full compatibility on Windows, Linux and OS X (as long as you use os.path.join and such, you're likely okay)
Avoid dependencies outside the standard library, if possible
All files need this preamble:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""<brief module description>""" # For otherwise blank __init__.py files, only this line is needed.
from __future__ import print_function, unicode_literals, absolute_import

Run PyLint and minimize the number of warnings (either by changing the code, or - if you have a good reason to not do that - disabling the warning locally)

I have not checked how well your code conforms to this list already, so some of these points may not actually be relevant.

Where possible, existing code should be reused. For PyDwarf, I believe the only real "issue" here is your raw file handling, which I'd prefer to have moved over to use the code in DFRaw (with any extensions to it that might be necessary). However, since you're using raw Python as input, that might not be so easy in practice... in that case, you can either look at replacing DFRaw outright, or writing some kind of adapter that provides the same interface as your current code, but leaves the heavy lifting with DFRaw. You can leave your current code unmodified, but it seems silly to have more than one raw parser, so I'd want that to only be a temporary solution.

Integration

Essentially, PyLNP would need to support two different kinds of mods: PyDwarf scripts and the current "distribute changed file" methods. This is necessary because it's going to look terrible if we have separate interfaces for each type of mod.

To do that, core/mods.py needs to be changed such that it will detect both kinds, and (in merge_a_mod) call out to the correct part of the system. If I'm reading the PyDwarf tutorial correctly, you'd need to provide the configuration directly from the code, rather than trying to read a file.

For knowing which approach to use, I believe the best choice would be to use the manifest system to flag PyDwarf updates. I'd suggest adding a key updateMethod with possible values detectChanges (the default) and pydwarf. (Exact names could differ; these are just ideas). If you believe PyDwarf's merging system is good enough to handle the current format, you could just replace that outright with yours (but for the sake of a possible future integration with BAMM, it might be best to add updateMethod anyway).
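A minimal sketch of that dispatch, assuming the manifest has already been loaded into a dict. The key name and its two values are the ones suggested above; `dispatch_merge` and the handler callables are hypothetical and do not exist in core/mods.py.

```python
DEFAULT_UPDATE_METHOD = 'detectChanges'


def update_method(manifest):
    """Return the manifest's updateMethod, falling back to the default."""
    return manifest.get('updateMethod', DEFAULT_UPDATE_METHOD)


def dispatch_merge(manifest, handlers):
    """Call the handler registered for this mod's update method."""
    method = update_method(manifest)
    try:
        handler = handlers[method]
    except KeyError:
        raise ValueError('Unknown updateMethod: %r' % method)
    return handler(manifest)
```

Keeping the handlers in a dict would leave room for a later method (BAMM, say) without touching the dispatch itself.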

[Comment created by pinemach: 2015-06-22] PyDwarf was just updated to 1.0.1, which is mostly a bugfix release with improvements to documentation. https://github.com/pineapplemachine/PyDwarf/releases

Any news on how PyDwarf might be integrated into PyLNP?

[Comment created by PeridexisErrant: 2015-04-13] It may also be worth studying the old DF Mod Merger - without changing to a launcher-specific format like it did, we could probably still use the more advanced merging logic. An elegant way to set up dependency-free or reliably rebuildable metamods continues to elude me, so I suggest we stick to primary mods only for now.

[Comment created by Pidgeot: 2015-06-09] Apologies for the delays in writing here... at the moment, I have very limited time for PyLNP.

First of all, I have no objections against having PyDwarf and/or BAMM integrated, but they'd most likely need some rewriting to adapt to the codebase that's already here, so they can leverage the stuff we currently have (possibly also augmenting the existing code to accommodate their needs).

So, the basic idea I had in mind for patching was this:

Calculate the differences between only the top-level tags (e.g. CREATURE:FOO) to determine added and removed objects.
For each object o in common between vanilla and mod: Calculate the difference between the children of o.

This approach should be very resilient, and if my intuition is correct, it would only really be able to fail when an entire object gets rewritten and tags are re-organized every which way - but those are also the cases that aren't really suitable for automation anyway. For a patch file format, you'd use the top-level tags and immediately preceding tags as context to find the right place to insert/remove tags. (DFRaw might need some changes to make that easy.)
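The two-step comparison could be sketched as follows. This does not use DFRaw; it is a standalone approximation that finds tags with a regular expression, and it assumes the caller supplies the set of tag names that start a top-level object (`object_tags`, a hypothetical parameter).

```python
import difflib
import re

TAG_RE = re.compile(r'\[[^\[\]]*\]')


def split_objects(raw_text, object_tags):
    """Group each tag under the top-level tag that precedes it."""
    objects = {}
    current = None
    for tag in TAG_RE.findall(raw_text):
        name = tag[1:-1].split(':', 1)[0]
        if name in object_tags:
            current = tag
            objects[current] = []
        elif current is not None:
            objects[current].append(tag)
    return objects


def diff_raws(vanilla_text, mod_text, object_tags):
    """Return (added objects, removed objects, per-object child changes)."""
    vanilla = split_objects(vanilla_text, object_tags)
    mod = split_objects(mod_text, object_tags)
    # Step 1: compare only the top-level tags.
    added = sorted(set(mod) - set(vanilla))
    removed = sorted(set(vanilla) - set(mod))
    # Step 2: for objects in common, diff their children.
    changed = {}
    for key in sorted(set(vanilla) & set(mod)):
        matcher = difflib.SequenceMatcher(None, vanilla[key], mod[key])
        ops = [(op, vanilla[key][i1:i2], mod[key][j1:j2])
               for op, i1, i2, j1, j2 in matcher.get_opcodes()
               if op != 'equal']
        if ops:
            changed[key] = ops
    return added, removed, changed
```

Each entry in `changed` pairs an operation with the vanilla tags it replaces and the mod tags it introduces, which is roughly the context a patch format would need.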

Of course, as I already mentioned, my time is very limited these days, and it's likely to get worse soon - in about 1½ months, I'm moving to another country to start a new job, and it is possible I'll end up with a couple of months of complete downtime after the move because I might not have access to my main PC (which I'd need to do stuff like builds - some coding could possibly still be done from my laptop, but no promises, of course).

@jmorgensen: It was at least the intention that DFRaw could provide this, but I have not looked sufficiently at the code of either PyDwarf or BAMM to know if they need further functionality.

@pinemach: The approach to only consider top-level tags is the best one I've been able to come up with, precisely because it's so difficult to handle anything more complicated - there's no good way to build a complete tree, because the exact semantics are unknown. Language files are a possible exception to that; here it might also be worthwhile to have one more level.

[Comment created by pinemach: 2015-06-08] Given a UI it should be easy enough to interface with PyDwarf since manager.py accepts any command line arguments that should be necessary in getting things working like they ought to. (And I can always add more, or help in entirely different ways, if compatibility becomes an issue.)

On the subject of diffs based on disparate DF versions: I'm not sure that's practical. If it were just a tree then diff'ing with different baselines might be more tenable but in fact it's very difficult to make more than very basic abstractions regarding the way the raws are formatted. And trying to make sense of normal, sequential diffs with different baselines just sounds like trouble to me.

To elaborate on why I believe it's difficult to make abstractions, I can think of no reliable and version-agnostic way to handle all of DF's raws weirdness: Hierarchical tokens (like OBJECT:CREATURE -> CREATURE:ID -> CASTE:ID -> etc.), database query-ish tokens (like SELECT_TISSUE_LAYER, PLUS_TISSUE_LAYER), template-ish tokens (like USE_MATERIAL_TEMPLATE, COPY_TOKENS_FROM).

Maybe it would be possible to do something much simpler: look for diffs as whole added objects and do some special handling for PERMITTED_REACTION and the like? It wouldn't work for mods that make changes to vanilla objects but it might be better than nothing.

[Comment created by PeridexisErrant: 2015-06-07]

> If PyDwarf were to be included with PyLNP I'd hope for it to expose the full range of PyDwarf's functionality and not just the one script.

I'd be very happy with this - the PyLNP philosophy is very much to support everything. I just really want to avoid a situation where PyLNP can't usefully handle mods not made for it - and based on the diff-script, that shouldn't be a problem!

Eventually (separate issue here) we should be able to recognize when the starting diff should be to a previous version of DF, get that baseline, and continue.

[Comment created by pinemach: 2015-06-07] Hey, I'm the PyDwarf person here to discuss inclusion within PyLNP.

There is a PyDwarf script which accepts inputs/outputs like typical mergers. (That script is here: https://github.com/pineapplemachine/PyDwarf/blob/master/scripts/pineapple/pydwarf.diff.py) I can't really vouch for its effectiveness compared to other merge-based managers, as I haven't done much stress testing, but it does seem to work well. PyDwarf's real strength, though, is in modifying raws programmatically, as Button pointed out. If PyDwarf were to be included with PyLNP I'd hope for it to expose the full range of PyDwarf's functionality and not just the one script.

[Comment created by BoomButton: 2015-05-27] @Pidgeot I do open raw files as codepage 437, in all the places I remembered to, anyway. The ASCII conversions file is used by parsing.escape_problematic_literals. If a tileset defines a certain plant graphic as ':', I don't want any future operations to think that that colon is a separator, so I replace ':' with the string 58 before it goes to the logic.
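As an illustration only (this is not BAMM's actual `parsing.escape_problematic_literals`), reading a raw file as codepage 437 and replacing a quoted colon with its character code might look like this. The helper names are made up, and only the `':'` case described above is handled.

```python
import io
import re

# A single quoted character, e.g. ':' or 'a', as written inside a raw tag.
CHAR_LITERAL_RE = re.compile(r"'(.)'")


def escape_char_literals(line):
    """Replace a quoted colon with its numeric code so it is not a separator."""
    def replace(match):
        if match.group(1) == ':':
            return str(ord(':'))  # ord(':') is 58
        return match.group(0)
    return CHAR_LITERAL_RE.sub(replace, line)


def read_raw_lines(path):
    """Read a raw file as codepage 437 and escape each line."""
    with io.open(path, encoding='cp437') as raw_file:
        return [escape_char_literals(line) for line in raw_file]
```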

By the way, what I said about PyDwarf in the previous post is already outdated. She does have a mod merging util now, which seems to be a tag-based diff.

[Comment created by jmorgensen: 2015-05-27] Quick idea: would there be an audience for a proper AST library for the raws? E.g. so all these myriad projects could use the same base?

[Comment created by Pidgeot: 2015-05-27] Button's approach seems to basically be what I imagined would be done with DFRaw to improve the mod system, but I've not had time to work on that part myself. DFRaw already models the files as a tree (although it tends to be a pretty flat tree; top-level tags are basically the only ones handled now).

@BoomButton: I haven't had time to look much at the code, but I did notice you're loading some kind of ASCII conversions? Why not just open the files as codepage 437 (what DF uses)?

[Comment created by BoomButton: 2015-05-27] BAMM is doing nothing like PyDwarf. PyDwarf is a library for modders to use to programmatically apply their own scripts. BAMM is much more specialized at the moment, only doing graphics, but doesn't require any programming knowledge to use, and should be able to convert raws from any tileset to any other tileset.

The magic is a config file containing information about what tags define objects and sub-objects; what tags are graphics tags and what objects they're allowed to belong to; and which tokens in these tags contain identifiers, graphics, or non-graphic information. Then it reads in the target & graphics raws as trees of objects with graphics info as the leaves, matches the target & graphics trees against each other, merges them together at a tag-by-tag level, and prints out the results into the output directory.
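A much-simplified sketch of a tag-level merge in this spirit, not BAMM's code: it assumes the config boils down to two sets of tag names (`object_tags` and `graphics_tags`, both hypothetical parameters), treats each graphics tag as occurring at most once per object, and ignores sub-objects entirely.

```python
import re

TAG_RE = re.compile(r'\[([^\[\]]+)\]')


def merge_graphics(target_text, graphics_text, object_tags, graphics_tags):
    """Copy graphics tags from graphics raws into matching target objects."""
    # Pass 1: collect graphics tags from the graphics raws, keyed by object.
    values = {}
    current = None
    for match in TAG_RE.finditer(graphics_text):
        name = match.group(1).partition(':')[0]
        if name in object_tags:
            current = match.group(1)
        elif name in graphics_tags and current is not None:
            values[(current, name)] = match.group(0)

    # Pass 2: rewrite the target, swapping in graphics tags where known.
    state = {'current': None}

    def replace(match):
        name = match.group(1).partition(':')[0]
        if name in object_tags:
            state['current'] = match.group(1)
            return match.group(0)
        return values.get((state['current'], name), match.group(0))

    return TAG_RE.sub(replace, target_text)
```

Non-graphics tags in the target are never looked up successfully, so modded names and properties pass through unchanged.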

A work thing came up so I haven't been able to expand on it much, but that's what's there so far.

[Comment created by PeridexisErrant: 2015-05-24] PyDwarfManager is a pure-python mod manager, with fairly smart merging.

Button's Auto Mod Merger is doing something fairly similar. I haven't looked into either in any real depth, but they're both worth investigating further.

Posting here so I've got some record when free time is a thing again...

[Comment created by Pidgeot: 2015-04-05] I've been working on a little something which is fairly close to getting pushed (a couple of days, maybe?), and which should be helpful towards this ticket.

I'll just leave this here...

#!python

>>> from core import dfraw
>>> r=dfraw.DFRaw(r'..\df_31_01\raw\objects\item_toy.txt')
>>> def test(node, indent=0):
...     if node.is_tag: print(' '*indent + node.text)
...     for c in node.children: test(c,indent+4)
...
>>> test(r)
    [OBJECT:ITEM]
    [ITEM_TOY:ITEM_TOY_PUZZLEBOX]
        [NAME:puzzlebox:puzzleboxes]
    [ITEM_TOY:ITEM_TOY_BOAT]
        [NAME:toy boat:toy boats]
        [HARD_MAT]
    [ITEM_TOY:ITEM_TOY_HAMMER]
        [NAME:toy hammer:toy hammers]
        [HARD_MAT]
    [ITEM_TOY:ITEM_TOY_AXE]
        [NAME:toy axe:toy axes]
        [HARD_MAT]
    [ITEM_TOY:ITEM_TOY_MINIFORGE]
        [NAME:mini-forge:mini-forges]
        [HARD_MAT]
>>> r=dfraw.DFRaw(r'..\df_28_181_40d\raw\graphics\graphics_example.txt')
>>> test(r)
    [OBJECT:GRAPHICS]
    [TILE_PAGE:DWARVES]
        [FILE:example/dwarves.bmp]
        [TILE_DIM:16:16]
        [PAGE_DIM:2:1]
    [CREATURE_GRAPHICS:DWARF]
        [DEFAULT:DWARVES:0:0:ADD_COLOR]
        [MINER:DWARVES:1:0:AS_IS:DEFAULT]

[Comment created by jmorgensen: 2015-05-25] PyDwarf looks basically like another take on DFRaw, with a better interface. The code hurts my eyes though!

Edit: Is it that hard to write an intelligible parser?

[Comment created by jecowa: 2016-08-28] Can Python control a terminal app? The Rubble mod merger tool has a command-line interface. Rubble works with Windows, Linux, and Mac.

With Rubble, the mods have to be converted into a special format called a template, but I think the template format saves them from having to be updated for Dwarf Fortress releases. As long as Rubble gets updated, the old templates should work with the latest Dwarf Fortress. The source code of Rubble is included in the download link, so if the author ever stops updating, then someone else will be able to take over.

Rubble forum page - http://www.bay12forums.com/smf/index.php?topic=154304.0

Pidgeot / python-lnp