Legilibre / SedLex

SedLex is a frontend generator for French bills compiled using DuraLex.
GNU Affero General Public License v3.0

Generate the exact diff of the amendment #3

Closed Seb35 closed 5 years ago

Seb35 commented 5 years ago

In the visitor AddDiffVisitor, the final text is computed first and the diff is then computed with Python's difflib library. This is fine as a first implementation, but it could (and imho should) be done by creating the exact diff of the amendment, built only from each change the amendment makes. E.g. when a word is replaced, create that word-level diff on the fly rather than in bulk with an external differ at the end.

Obviously, in the big picture the two resulting diffs will be similar, but at a small scale there can be differences. E.g. an alinea is replaced by another and, by chance, some words are the same in the old and new versions: the differ will probably show them as common text, which could confuse some users because the whole alinea was in fact replaced.
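
To make the difference concrete, here is a minimal sketch, not SedLex's actual code. The two sentences are invented, and the `exact` record is a hypothetical representation of "the amendment replaces the whole alinea". Only `difflib.SequenceMatcher` is a real library call.

```python
import difflib

# Invented sample: the amendment replaces the whole alinea.
old = "Le maire peut autoriser la vente."
new = "Le préfet peut interdire la vente."

old_words = old.split()
new_words = new.split()

# External differ: it only sees the two final texts, so words that
# happen to be shared are reported as unchanged.
matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
external = [
    (tag, old_words[i1:i2], new_words[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
]

# Exact diff (hypothetical representation): one operation, because the
# amendment itself says "replace this alinea by that one".
exact = [("replace", old_words, new_words)]

for tag, removed, added in external:
    print(tag, removed, added)
```

The external differ splits the change into alternating equal and replace chunks, while the exact diff keeps the single replacement that the amendment describes.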

Imho there should be a mode, possibly the default mode, where the generated diff is the exact diff, directly generated by the amendment without an external differ.

Seb35 commented 5 years ago

That said, there could also be a second (advanced) mode where the exact diff is refined to drop its large unchanged parts, because it happens that entire articles are rewritten while only a few words are actually changed. For instance in this amendment (see text in force), although it seems that the entire article is rewritten, only some parts of the sentences are changed (mainly the initial "Sous réserve des exemptions prévues à l'article L. 622-4,"). This second mode is a bonus and should be handled in a separate issue.
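
One possible refinement for such a mode, sketched under assumptions: strip the prefix and suffix shared by the old and new text so that only the changed core remains. The function name `trim_common` is hypothetical, and the sample sentence is invented apart from the quoted opening words.

```python
def trim_common(old: str, new: str) -> tuple[int, str, str]:
    """Return (offset, removed, added) after dropping the shared prefix and suffix.

    `offset` is the 0-based character position where the texts start to differ.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # The suffix must not overlap the prefix already consumed.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, old[prefix:len(old) - suffix], new[prefix:len(new) - suffix]


# Invented article text; only the inserted opening comes from the example above.
old_article = "Toute personne sera punie."
new_article = (
    "Sous réserve des exemptions prévues à l'article L. 622-4, "
    "toute personne sera punie."
)
print(trim_common(old_article, new_article))
```

Here the whole-article replacement shrinks to the capital "T" being replaced by the new opening clause plus a lowercase "t". A real refinement would probably work on words rather than characters.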

Seb35 commented 5 years ago

This is now available in SedLex under the 'exactDiff' keys, which use a diff-like format with some adaptations described below. This exact diff is really what the amendment changes, not a diff recreated a posteriori by other means. Its advantage is that it describes with full precision what the amendment changes, although, as said before, an external diff can sometimes give a better understanding, mainly when an amendment rewrites an entire article but on closer inspection only a few words actually change.

The diff-like format differs from the standard unified diff in that it is character-based. Hence the indexes and lengths count characters (not bytes). I kept the initial index as "1" for compatibility (even though I would prefer "0"), and a newline inside the text is still rendered as a newline followed by "-" or "+", depending on whether the sentence is removed or added.

A very important side effect/advantage is that these exact character-based diffs are much easier to merge because:
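
As an illustration only, here is a sketch of what a character-based hunk could look like. The `@@ -start,length +start,length @@` header is borrowed from unified diff as an assumption; the thread does not show the exact SedLex output, and `char_hunk` is a hypothetical helper.

```python
def char_hunk(old_text: str, offset: int, removed: str, added: str) -> str:
    """Render one replacement as a unified-diff-like hunk counted in characters.

    `offset` is the 0-based character position of `removed` in `old_text`;
    the header shows it as a 1-based index. The header layout is an assumption.
    """
    if old_text[offset:offset + len(removed)] != removed:
        raise ValueError("removed text not found at the given offset")
    # len() on a Python str counts characters (code points), not bytes.
    lines = [f"@@ -{offset + 1},{len(removed)} +{offset + 1},{len(added)} @@"]
    if removed:
        # Every line of the removed text starts with "-".
        lines += ["-" + part for part in removed.split("\n")]
    if added:
        # Every line of the added text starts with "+".
        lines += ["+" + part for part in added.split("\n")]
    return "\n".join(lines)


text = "Le préfet peut interdire la vente."  # invented sample
hunk = char_hunk(text, text.index("préfet"), "préfet", "maire")
print(hunk)
print(len("préfet"), len("préfet".encode("utf-8")))
```

The word "préfet" is 6 characters but 7 bytes in UTF-8, which is why counting characters rather than bytes matters for French text.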

Seb35 commented 5 years ago

This is mostly solved. There are still some minor issues with the typography (spaces added or removed at the beginning/end of modified texts), and sometimes new alineas are added at the top of the article instead of elsewhere. These should be opened as separate issues.

Seb35 commented 5 years ago

The two minor issues mentioned in the previous comment are solved in 32eb105 and 75756f0.