INCATools / ontology-development-kit

Bootstrap an OBO Library ontology
http://incatools.github.io/ontology-development-kit/
BSD 3-Clause "New" or "Revised" License

Updating .gitignore when updating a repository #847

Open gouttegd opened 1 year ago

gouttegd commented 1 year ago

Currently, when initialising a new repository the seeding process generates a .gitignore file, but when updating an existing repository the .gitignore file is not updated – instead, migrating it is left as an exercise to the reader.

The rationale for not updating it automatically is that users might have added their own exclusion rules, which we do not want to erase. But the downside is then that any new exclusion rule added to the ODK template will not be added to existing repositories, unless users take the time to manually update their .gitignore file – I strongly suspect nobody has ever bothered to do that.

There would be at least two ways of allowing automatic update of the ODK exclusion rules while preserving any user-specified rules.

Using a distinct file for ODK rules and for user-specified rules

This is basically the same approach as the one used for Makefile rules: have one file that is entirely ODK-managed (src/ontology/Makefile) and therefore can be overwritten when updating, and a separate file that is entirely user-managed (src/ontology/$ONT.Makefile) and therefore is never overwritten.

This can be achieved by setting the Git configuration option core.excludesFile to, say, .gitignore-odk (this can be done when initialising the repository; when updating, we can check whether that configuration option has been set, and set it if needed). We can then put all ODK-managed rules in that file (which we can overwrite when updating). User-managed custom rules would still go into the standard .gitignore file, which we would never touch.

The downside is that users may already rely on a system-wide (or rather, “user session”-wide) excludes file (which by default is $XDG_CONFIG_HOME/git/ignore). That file would no longer be used in ODK repositories where core.excludesFile is set by the ODK, which may surprise those users.
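As a rough illustration of the "check whether the option is set, and set it if needed" step, here is a minimal Python sketch. The function name ensure_odk_excludes and the injectable run parameter are hypothetical, and it stores an absolute path to avoid making any assumption about how Git resolves a relative core.excludesFile value. It is not what update_repo.sh currently does.

```python
import subprocess
from pathlib import Path


def ensure_odk_excludes(repo_dir, filename=".gitignore-odk", run=subprocess.run):
    """Set core.excludesFile for this repository unless it is already set.

    Returns True if this call set the option, False if it was left alone.
    `filename` and `run` are illustrative parameters, not ODK options.
    """
    repo = Path(repo_dir).resolve()
    base = ["git", "-C", str(repo), "config", "--local"]

    # `git config --get` exits with a non-zero status when the key is unset.
    current = run(base + ["--get", "core.excludesFile"],
                  capture_output=True, text=True)
    if current.returncode == 0 and current.stdout.strip():
        # Already configured in this repository: do not override it.
        return False

    # Assumption: an absolute path is stored, to sidestep path resolution.
    run(base + ["core.excludesFile", str(repo / filename)], check=True)
    return True
```

Note that this only looks at the repository-local configuration, so it does not detect (and would silently shadow) a user-level excludes file, which is exactly the downside mentioned above.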

Carrying over custom exclusion rules when updating

The update_repo.sh script could do something like the following:

  1. compare the existing .gitignore file with the one generated by the seeding step, and extract all rules that are not present in the ODK-generated file (those rules are assumed to be user-written rules);
  2. concatenate the ODK-generated file with the rules extracted in step 1.

The downside is that this would make it impossible to remove rules from the ODK template (removed rules would be assumed to be user-written rules and therefore would always be carried over).
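The two steps above could be sketched as follows in Python. The function name carry_over_rules is hypothetical, and the sketch assumes that blank lines and comment lines from the old file are not worth carrying over, which is a simplification rather than something decided here.

```python
def carry_over_rules(existing: str, generated: str) -> str:
    """Append to the ODK-generated .gitignore every rule of the existing
    file that the generated file does not already contain."""
    generated_lines = generated.splitlines()
    known = set(generated_lines)

    custom = []
    for line in existing.splitlines():
        # Assumption: blank lines and comments are dropped, not carried over.
        if not line.strip() or line.startswith("#"):
            continue
        # Step 1: anything not in the generated file is treated as user-written.
        if line not in known and line not in custom:
            custom.append(line)

    # Step 2: generated rules first, then the carried-over rules.
    return "\n".join(generated_lines + custom) + "\n"


# "old-odk-rule" stands for a rule dropped from the template: it survives
# the update anyway, which is the downside described above.
old = "*.tmp\n# my notes\nmy-secret.txt\nold-odk-rule\n"
new = "*.tmp\n*.log\n"
merged = carry_over_rules(old, new)
```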

Have commented section markers in the .gitignore file

A slightly more refined variant would be to use dedicated “section markers” in the .gitignore file, such as:

# ODK exclusion rules, do not modify
[all ODK-managed rules...]
# End of ODK exclusion rules
# You may add your own exclusion rules below that line

The update script could then use those markers to find which rules need to be carried over and which rules can be entirely replaced by the new ODK-generated file.

Compared to the previous solution, this fixes the issue of rules removal, but raises the question of what to do when we first update a repository where those markers do not exist yet.
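A minimal sketch of the marker-based replacement, in Python. The marker strings are the ones from the example above; the function name replace_odk_section is hypothetical. The sketch deliberately does not answer the open question of the first update: when the markers are missing it simply raises an error.

```python
BEGIN = "# ODK exclusion rules, do not modify"
END = "# End of ODK exclusion rules"


def replace_odk_section(existing: str, odk_rules: str) -> str:
    """Replace everything between the two markers with the new ODK rules,
    leaving all lines outside the markers untouched."""
    lines = existing.splitlines()
    try:
        start = lines.index(BEGIN)
        end = lines.index(END, start + 1)
    except ValueError:
        # First update of a repository without markers: left undecided here.
        raise ValueError("ODK section markers not found") from None

    new_block = [BEGIN, *odk_rules.splitlines(), END]
    return "\n".join(lines[:start] + new_block + lines[end + 1:]) + "\n"
```

Because the ODK block is replaced wholesale, a rule removed from the template disappears from the updated file, while anything written below the end marker is preserved verbatim.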

matentzn commented 1 year ago

I would like solution 2, even if it is a bit hacky, because there are other places where the general approach could be used:

gouttegd commented 1 year ago

Solution 2 can be applied nicely to .gitignore because .gitignore is a line-oriented format, so it’s very easy to isolate the user-written rules and append them to another file.

It would be much trickier to apply to XML or YAML files.

matentzn commented 1 year ago

That's true. For the catalog one you can probably get by with string hacking, but for the mkdocs one, it will require something like https://mikefarah.gitbook.io/yq/v/v2.x/merge

gouttegd commented 1 year ago

Actually, for the XML catalog it would be better to use a nextCatalog directive.

That is, we could have an ODK-managed catalog-v001.xml file that contains everything the ODK needs, and a second catalog file specifically intended for user-written catalog entries (say, user-catalog.xml), with the latter being referenced in the former:

<nextCatalog catalog="user-catalog.xml" />

We can then overwrite catalog-v001.xml during an update without worrying about user-written entries.
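To make the layout concrete, here is a small Python sketch that generates such an ODK-managed catalog with a trailing nextCatalog entry. The function name build_catalog and the example entry are hypothetical; the namespace is the one defined by the OASIS XML Catalogs specification. Whether a given tool actually honours nextCatalog is a separate question.

```python
import xml.etree.ElementTree as ET

# Namespace of the OASIS XML Catalogs format.
NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(uri_entries, user_catalog="user-catalog.xml"):
    """Return the ODK-managed catalog as a string: one <uri> element per
    (name, uri) pair, then a <nextCatalog> pointing at the user-managed file."""
    ET.register_namespace("", NS)  # serialise with a default namespace
    root = ET.Element(f"{{{NS}}}catalog")
    for name, uri in uri_entries:
        ET.SubElement(root, f"{{{NS}}}uri", {"name": name, "uri": uri})
    # User-written entries live in a separate file that is never overwritten.
    ET.SubElement(root, f"{{{NS}}}nextCatalog", {"catalog": user_catalog})
    return ET.tostring(root, encoding="unicode")


# Hypothetical entry, for illustration only.
catalog_xml = build_catalog(
    [("http://example.org/imports/foo_import.owl", "imports/foo_import.owl")]
)
```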

That’s exactly the same principle as the one we use for the Make rules, with Makefile which includes $ONT.Makefile.

(And that’s also what I would love to use for the .gitignore file, but that’s impossible because the .gitignore syntax does not allow including another file.)

matentzn commented 1 year ago

That catalog suggestion sounds awesome!! Does robot / protege support this?

gouttegd commented 1 year ago

Protégé does – it seemingly has full support for the XML catalog spec.

Robot does not, unfortunately – it uses its own SAX-based parser, which only supports the uri element and nothing else. :(

matentzn commented 1 year ago

That is very unfortunate :( For this to work, the solution should work for both..

gouttegd commented 1 year ago

Agree. I’ll see if I can add support for nextCatalog in the Robot implementation, but in the meantime, we should avoid using it.