CatalaLang / catleg

Development tools for Catala programming in the context of French legislative texts.
https://catleg.readthedocs.io/
Apache License 2.0

Add options to ignore links and process escapements in `catleg diff` #55

Closed rprimet closed 1 year ago

rprimet commented 1 year ago

After trying `catleg diff` on an extract of the IR repo, I noticed a few possible improvements (i.e. reported diffs that are likely spurious and should not be flagged):

 [200 C](/affichCodeArticle.do?cidTexte=LEGITEXT000006069577&idArticle=LEGIARTI000037943290&dateTexte=&categorieLien=cid)

(by the way, this is not a complete/valid URL, @denismerigoux @AltGr is there some specific convention?)

denismerigoux commented 1 year ago

> Markdown sometimes requires escaping, for instance for square brackets, so escaped brackets ( [2°] ) or stars (*) should not be reported. I am not sure what the best way of addressing this is: converting the reference text to Markdown, or some other approach?

If I understand correctly, in `catleg diff` you parse the Catala source code text as Markdown, right? If so, does your Markdown parsing library offer a way to render Markdown as plain text, thereby unescaping the brackets, stars, etc.?

> Links are sometimes inserted between articles, so links might be handled specially (we could ignore links, or check that they point to the right target...).

I would go with simply ignoring all link targets for catleg diff.

rprimet commented 1 year ago

> If I understand correctly, in `catleg diff` you parse the Catala source code text as Markdown, right? If so, does your Markdown parsing library offer a way to render Markdown as plain text, thereby unescaping the brackets, stars, etc.?

Yep, should do that, makes more sense than going the other way round :)
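For illustration, here is a minimal sketch of the unescaping step discussed above. It is not taken from catleg and does not use any particular Markdown library; it only assumes the CommonMark rule that a backslash before an ASCII punctuation character stands for that literal character. The function name `unescape_markdown` is hypothetical.

```python
import re
import string

# CommonMark allows a backslash escape before any ASCII punctuation character.
# string.punctuation lists exactly those 32 characters.
_ESCAPE_RE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def unescape_markdown(text: str) -> str:
    """Drop the backslash from backslash-escaped punctuation (hypothetical helper)."""
    return _ESCAPE_RE.sub(r"\1", text)


if __name__ == "__main__":
    # Escaped brackets and stars come out as the literal characters.
    print(unescape_markdown(r"\[2°\] \*"))
```

Normalizing the Catala source this way before comparing should keep escaped brackets and stars from showing up as differences against the reference text. A real Markdown renderer would also handle code spans, where backslashes are literal; this sketch does not.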

rprimet commented 1 year ago

Actually, looking at links, maybe ones like [200 C](/affichCodeArticle.do?cidTexte=LEGITEXT000006069577&idArticle=LEGIARTI000037943290&dateTexte=&categorieLien=cid) are spurious and were inserted by the skeleton generator?

denismerigoux commented 1 year ago

Perhaps. In that case they should be removed (just keep the text of the link).

rprimet commented 1 year ago

Yep, I think so. I'll check what the skeleton generator does by default, but keeping the link text and stripping the link target seems like a good default behavior.
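As a rough sketch of that default (keep the link text, drop the target), assuming only simple inline links of the form `[text](target)` with no nested brackets, titles, or parentheses in the target. This is a regex-based illustration, not catleg's actual implementation, and `strip_link_targets` is a hypothetical name.

```python
import re

# Matches [text](target) where text has no brackets and target has no
# whitespace or parentheses. The lookbehind skips images: ![alt](src).
_INLINE_LINK_RE = re.compile(r"(?<!!)\[([^\[\]]+)\]\(([^()\s]*)\)")


def strip_link_targets(text: str) -> str:
    """Replace each simple inline link with its link text (hypothetical helper)."""
    return _INLINE_LINK_RE.sub(r"\1", text)


if __name__ == "__main__":
    sample = (
        "article [200 C](/affichCodeArticle.do?cidTexte=LEGITEXT000006069577"
        "&idArticle=LEGIARTI000037943290&dateTexte=&categorieLien=cid) du code"
    )
    print(strip_link_targets(sample))
```

Reference-style links and link targets containing parentheses would need a proper Markdown parser rather than a regular expression.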

rprimet commented 1 year ago

I think we can close this when #58 is merged