bundestag / gesetze-tools

Scripts to maintain German law git repository
GNU Lesser General Public License v3.0

Normalize Markdown syntax #3

Open nichtich opened 12 years ago

nichtich commented 12 years ago

Markdown allows alternative syntax variants, for instance for headings, lists, and whitespace. Unless we agree on one normalized form, there will be many forms of exactly the same document, leading to diffs and commits that originate only from changes in Markdown syntax. Luckily there is an easy way to normalize via pandoc:

pandoc -f markdown -t markdown index.md

With normalization, one can also create a hash of the actual text instead of a hash of one particular syntactic form of the text.
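A minimal sketch of that hashing idea, assuming pandoc is installed and on the PATH. The function names `normalize_with_pandoc` and `content_hash` are hypothetical and not part of gesetze-tools; SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import subprocess
from typing import Callable


def normalize_with_pandoc(markdown: str) -> str:
    """Pipe the text through `pandoc -f markdown -t markdown`.

    Assumes a `pandoc` executable is available on the PATH.
    """
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "markdown"],
        input=markdown,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if pandoc fails
    )
    return result.stdout


def content_hash(
    markdown: str,
    normalize: Callable[[str], str] = normalize_with_pandoc,
) -> str:
    """Return the SHA-256 hex digest of the normalized form of the text."""
    normalized = normalize(markdown)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two files that differ only in Markdown syntax should then yield the same digest. The normalized output may differ between pandoc versions, so hashes are probably only comparable when produced with the same version.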

See also issue #4 about removing metadata from the laws (right now the metadata is interpreted as a Markdown table).

stefanw commented 12 years ago

Good idea. Ideally the Markdown will be generated from the XML in a canonical format. But the XML contains stylistic changes, such as inserted line breaks, that could lead to unnecessary diffs. Something like collapsing superfluous line breaks would definitely be nice.

darkdragon-001 commented 3 years ago

Maybe we should agree on some Markdown source format. Are the original line breaks important for the laws? If not, what about putting every sentence on a separate line in the Markdown file? This way, diffs are reproducible and the text can still be rendered according to the viewer size.
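A rough sketch of what one sentence per line could look like, assuming a deliberately naive boundary rule: a sentence ends at `.`, `!` or `?` followed by whitespace and an uppercase letter. The function name `one_sentence_per_line` is hypothetical. The rule will mis-split abbreviations that are followed by a capitalized word (for example "z. B. Personen"), so a real implementation would need an abbreviation list or a proper sentence tokenizer.

```python
import re

# Naive boundary: sentence-ending punctuation, then whitespace,
# then an uppercase letter (including German umlauts).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])")


def one_sentence_per_line(paragraph: str) -> str:
    """Reflow a single paragraph so that each sentence is on its own line."""
    # Discard the original line breaks and collapse runs of whitespace.
    flat = " ".join(paragraph.split())
    # Split at the naive sentence boundaries and rejoin with newlines.
    return "\n".join(_SENTENCE_BOUNDARY.split(flat))
```

Because a single newline inside a paragraph is rendered as a space in Markdown, the reflowed text should render the same as before, while a change to one sentence touches only one line in the diff.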