Open RouxRC opened 6 years ago
There were 2 additional issues with this text once the Constitution case was handled. Regarding the Constitution specifically, I find it more sensible to handle it in the parse_code_reference method since, like a code, the Constitution is a (natively) consolidated text and has no ID number. I added the following code to parse_code_reference:
# de la Constitution
elif i + 4 < len(tokens) and tokens[i + 4].lower() == u'constitution':
    node['id'] = 'constitution du 4 octobre 1958'
    i = alinea_lexer.skip_to_token(tokens, i, tokens[i+4]) + 2
# la Constitution
elif i + 2 < len(tokens) and tokens[i + 2].lower() == u'constitution':
    node['id'] = 'constitution du 4 octobre 1958'
    i = alinea_lexer.skip_to_token(tokens, i, tokens[i+2]) + 2
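For anyone reading these two branches out of context, here is a minimal standalone sketch of the same check. It assumes whitespace is tokenised as separate tokens (which is what the offsets 4 and 2 suggest). skip_to_token below is a hypothetical stand-in for alinea_lexer.skip_to_token, and match_constitution is an illustrative name, not a DuraLex function.

```python
CONSTITUTION_ID = u'constitution du 4 octobre 1958'


def skip_to_token(tokens, i, token):
    """Hypothetical stand-in: advance i until tokens[i] equals token."""
    while i < len(tokens) and tokens[i] != token:
        i += 1
    return i


def match_constitution(tokens, i):
    """Return (node, new_index) if a Constitution reference starts at i,
    otherwise (None, i)."""
    # Offset 4 covers "de la Constitution", offset 2 covers "la Constitution",
    # assuming each space is a token of its own.
    for offset in (4, 2):
        if i + offset < len(tokens) and tokens[i + offset].lower() == u'constitution':
            node = {'id': CONSTITUTION_ID}
            # +2 skips the word itself and the whitespace token after it.
            new_i = skip_to_token(tokens, i, tokens[i + offset]) + 2
            return node, new_i
    return None, i


tokens = [u'de', u' ', u'la', u' ', u'Constitution', u' ', u'est', u' ', u'modifié']
node, next_i = match_constitution(tokens, 0)
```

With the sample tokens above, next_i ends up on the token that follows "Constitution" and its trailing space.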
A dedicated case for the Constitution could also be added, with a new constant in duralex/tree.py, for instance TYPE_CONSTITUTION_REFERENCE = u'constitution-reference'.
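A rough sketch of what that alternative could look like, assuming nodes are plain dicts with 'type' and 'children' keys. The constant name is the one suggested above; make_constitution_reference and is_constitution_reference are hypothetical helpers for illustration, not existing DuraLex functions.

```python
TYPE_CONSTITUTION_REFERENCE = u'constitution-reference'
CONSTITUTION_ID = u'constitution du 4 octobre 1958'


def make_constitution_reference(parent):
    """Attach a dedicated Constitution node to parent and return it."""
    node = {
        'type': TYPE_CONSTITUTION_REFERENCE,
        'id': CONSTITUTION_ID,
        'children': [],
    }
    parent.setdefault('children', []).append(node)
    return node


def is_constitution_reference(node):
    """Let downstream code dispatch on the dedicated type."""
    return node.get('type') == TYPE_CONSTITUTION_REFERENCE
```

The advantage over reusing the code-reference type would be that later passes can tell the Constitution apart without comparing the 'id' string.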
The other issues probably come from certain sentence forms:
As of now I don’t understand exactly why these unhandled cases lead to infinite recursion.
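For illustration only, one common way an unhandled case turns into unbounded recursion in a recursive parser is a sub-parser that returns without consuming any token, so the caller recurses on the same index. This is not necessarily what happened in DuraLex; parse_known, parse_all and ParseError are hypothetical names in this sketch.

```python
class ParseError(Exception):
    pass


def parse_known(tokens, i):
    """Hypothetical sub-parser: consume one token if it is recognised,
    otherwise return i unchanged."""
    if tokens[i] in (u'article', u'alinéa'):
        return i + 1
    return i


def parse_all(tokens, i=0):
    """Recursively consume tokens, refusing to recurse without progress."""
    if i >= len(tokens):
        return i
    j = parse_known(tokens, i)
    if j == i:
        # Without this guard, the call below would repeat with the same
        # index until Python's maximum recursion depth is exceeded.
        raise ParseError('unhandled token %r at index %d' % (tokens[i], i))
    return parse_all(tokens, j)
```

A progress check like this turns the recursion error into a message that names the offending token.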
The infinite recursion is fixed by 46b6e66. Some syntaxes are not recognised yet, work in progress; as I write this, 8 of the 18 articles of the pjl parse completely.
The commit for the Constitution was the previous one, 6ffe532. As of now, this pjl is recognised at 11/18 (we work in the git branch 'peg').
Now 13/18
I copy here the details of correct and incorrect articles in DuraLex:
✓ article 1
✓ article 2
✓ article 3 - fun fact: this article was removed during commission work https://twitter.com/ApffelArnaud/status/1012607263914242048
✓ article 4
✓ article 5
article 6 (1° and 2° are correct, issue on 3°)
article 7 (1° is correct, issue on 2°)
✓ article 8
✓ article 9
✓ article 10
article 11 (expression "au dernier alinéa de l’article 88-6" unrecognised + no word-reference)
✓ article 12
article 13 (expression "articles 68-1 à 68-3" unrecognised)
✓ article 14
✓ article 15
✓ article 16
article 17 (a lot of issues)
✓ article 18
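To make the article 13 case concrete, here is a small standalone sketch of how a range such as "articles 68-1 à 68-3" could be expanded into individual article numbers. It uses a regular expression rather than DuraLex's own parser, and it assumes both ends share the same prefix and that the suffixes are consecutive integers, which will not hold for every legal text. expand_article_range is a hypothetical helper.

```python
# -*- coding: utf-8 -*-
import re

# Matches "article(s) P-a à P-b", e.g. "articles 68-1 à 68-3".
ARTICLE_RANGE = re.compile(
    u"articles?\\s+(\\d+)-(\\d+)\\s+à\\s+(\\d+)-(\\d+)",
    re.IGNORECASE | re.UNICODE,
)


def expand_article_range(text):
    """Return the list of article numbers covered by the first range
    found in text, or [] if none is found or the range is not simple."""
    match = ARTICLE_RANGE.search(text)
    if match is None:
        return []
    prefix_a, start, prefix_b, end = (int(g) for g in match.groups())
    # Only handle the simple case: same prefix, increasing suffix.
    if prefix_a != prefix_b or end < start:
        return []
    return [u'%d-%d' % (prefix_a, n) for n in range(start, end + 1)]


articles = expand_article_range(u"Les articles 68-1 à 68-3 sont abrogés.")
```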
I’ve just tested the diffs SedLex generates from DuraLex output, article by article (SedLex hits fatal errors if you run it on the entire DuraLex output). With a small SedLex patch applied (and pushed as 46cff7b), there are 8 perfect articles, 5 partially perfect articles and 2 half-good articles. Once the bad DuraLex outputs are set aside, there are 5 articles whose issues are SedLex-related.
By the way, Archéo Lex seems to have trouble creating a one-article-per-file repository, so I used an old Archéo Lex version, and to match SedLex conventions I renamed the generated folder to 'constitution'.
Perfect diff in SedLex:
Partially correct diff in SedLex:
Not working at all in SedLex:
I improved some parts of SedLex. I will update this comment when SedLex is further improved (to keep the original state).
Perfect diff in SedLex (12 perfect + 2 partially perfect):
Partially correct diff in SedLex (2):
Not working at all in SedLex (4):
Good job, nearly there! ;)
SedLex now fails gracefully (and partially) instead of triggering an exception, so the DuraLex tree for this constitutional law can be tested with SedLex without splitting it by article. Fatal errors are replaced by an 'error' property in the node.
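As a rough illustration of that behaviour (not SedLex's actual code), the sketch below walks a tree of dict nodes, records a failure as an 'error' property on the node where it happened, and keeps going with the rest of the tree. apply_handler and demo_handler are hypothetical names, and the node layout ('type', 'children') is an assumption.

```python
def apply_handler(node, handler):
    """Apply handler to node and all its descendants.

    A failure is stored in node['error'] instead of being propagated.
    Returns the number of nodes that failed.
    """
    failures = 0
    try:
        handler(node)
    except Exception as exc:
        node['error'] = str(exc)
        failures += 1
    for child in node.get('children', []):
        failures += apply_handler(child, handler)
    return failures


def demo_handler(node):
    """Hypothetical handler that rejects one node type."""
    if node.get('type') == 'unsupported':
        raise ValueError('unsupported node type')
    node['done'] = True


tree = {
    'type': 'edit',
    'children': [
        {'type': 'unsupported'},
        {'type': 'article-reference'},
    ],
}
failures = apply_handler(tree, demo_handler)
```

After the call, only the failing node carries an 'error' property and its sibling has still been processed, which is what makes it possible to run the whole tree in one go.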
Hello, I'm trying to run DuraLex on the constitutional reform:
But "Constitution" was not yet handled as a detected law/code, so I tried to fix it by adding the following to duralex/alinea_parser.py at line 239:
Debugging indicates it does the trick for a few articles; unfortunately it then breaks on a maximum recursion depth exception, which I do not understand :(