Legilibre / DuraLex

DuraLex is a French bill compiler.
GNU Affero General Public License v3.0

Constitutional reforms not handled #7

Open RouxRC opened 6 years ago

RouxRC commented 6 years ago

Hello, I'm trying to run DuraLex on the constitutional reform:

    duralex --url http://www.assemblee-nationale.fr/15/projets/pl0911.asp

But "Constitution" was not yet handled as a detected law or code, so I tried to fix it by adding the following to duralex/alinea_parser.py at line 239:

    # de la Constitution
    elif i + 4 < len(tokens) and tokens[i + 4].lower() == u'constitution':
        i += 4
        node['lawType'] = 'constitution'
        node['id'] = 'constitution_du_4_octobre_1958' 

Debugging shows that it does the trick for a few articles. Unfortunately, it then fails with a maximum recursion depth exception, which I do not understand :(

Traceback (most recent call last):
  File "/home/roux/.pyenv/versions/duralex/bin/duralex", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/home/roux/dev/DuraLex/bin/duralex", line 118, in <module>
    sys.exit(main())
  File "/home/roux/dev/DuraLex/bin/duralex", line 107, in main
    handle_data(data, args)
  File "/home/roux/dev/DuraLex/bin/duralex", line 77, in handle_data
    SwapDefinitionAndReferenceVisitor().visit(tree)
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 93, in visit
    self.visit_node(node)
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 87, in visit_node
    self.visit_node(child)
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 87, in visit_node
    self.visit_node(child)
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 87, in visit_node
    self.visit_node(child)
  [Previous line repeated 990 more times]
  File "/home/roux/dev/DuraLex/bin/../duralex/AbstractVisitor.py", line 83, in visit_node
    self.visitors[node['type']](node, False)
RecursionError: maximum recursion depth exceeded
Seb35 commented 6 years ago

There were 2 additional issues with this text once the Constitution case was handled. Regarding the Constitution specifically, I find it more sensible to handle it in the parse_code_reference method since, like a code, the Constitution is a (natively) consolidated text and has no ID number. I added the following code to parse_code_reference:

    # de la Constitution
    elif i + 4 < len(tokens) and tokens[i + 4].lower() == u'constitution':
        node['id'] = 'constitution du 4 octobre 1958'
        i = alinea_lexer.skip_to_token(tokens, i, tokens[i+4]) + 2
    # la Constitution
    elif i + 2 < len(tokens) and tokens[i + 2].lower() == u'constitution':
        node['id'] = 'constitution du 4 octobre 1958'
        i = alinea_lexer.skip_to_token(tokens, i, tokens[i+2]) + 2

A specific case could also be added for the Constitution, with a new constant in duralex/tree.py, for instance TYPE_CONSTITUTION_REFERENCE = u'constitution-reference'.
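A minimal, self-contained sketch of the lookahead above, for readers who want to try it outside DuraLex. It assumes a tokenization where words and whitespace runs are separate tokens, which is what the i + 2 / i + 4 offsets suggest ("de", " ", "la", " ", "Constitution"). `tokenize`, `skip_to_token` and `parse_constitution_reference` are hypothetical stand-ins, not DuraLex's actual `alinea_lexer` functions, and the node type uses the constant suggested above.

```python
import re

# Constant suggested in the thread for a dedicated node type (not in DuraLex here).
TYPE_CONSTITUTION_REFERENCE = u'constitution-reference'
CONSTITUTION_ID = u'constitution du 4 octobre 1958'


def tokenize(text):
    # Assumed tokenization: words and whitespace runs are separate tokens.
    return re.findall(r"\s+|[^\s]+", text)


def skip_to_token(tokens, i, token):
    # Hypothetical stand-in: index of the first occurrence of `token` at or after i.
    while i < len(tokens) and tokens[i] != token:
        i += 1
    return i


def parse_constitution_reference(tokens, i, node):
    # "de la Constitution": "Constitution" is 4 tokens ahead of "de"
    if i + 4 < len(tokens) and tokens[i + 4].lower() == u'constitution':
        node['type'] = TYPE_CONSTITUTION_REFERENCE
        node['id'] = CONSTITUTION_ID
        # skip past "Constitution" and the whitespace token after it
        i = skip_to_token(tokens, i, tokens[i + 4]) + 2
    # "la Constitution": "Constitution" is 2 tokens ahead of "la"
    elif i + 2 < len(tokens) and tokens[i + 2].lower() == u'constitution':
        node['type'] = TYPE_CONSTITUTION_REFERENCE
        node['id'] = CONSTITUTION_ID
        i = skip_to_token(tokens, i, tokens[i + 2]) + 2
    return i


tokens = tokenize(u"de la Constitution est modifié")
node = {}
i = parse_constitution_reference(tokens, 0, node)
print(node['id'], tokens[i])  # the parser resumes on "est"
```

Returning the new index (rather than mutating a shared cursor) mirrors how the snippets above reassign `i`.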


The other issues probably come from sentence forms that are not recognized yet:

  1. In article 11 of the constitutional law, the passage "Au sixième alinéa de l'article 16, à l'article 54, au deuxième alinéa de l'article 61, et au dernier alinéa de l'article 88-6 de la Constitution" is not handled; it works with a single article, e.g. "Au sixième alinéa de l'article 16 de la Constitution".
  2. In article 13 of the constitutional law, the passage "Les articles 68-1 à 68-3 de la Constitution" is not handled; it works with a single article.
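Since single-article references already work, one way to see the two failing forms is as shorthand for several single references. The sketch below normalizes them with regular expressions; it is a hypothetical illustration, not how DuraLex's parser works, and the helper names, the regexes, and the assumption that both range bounds share the same prefix (68-…) are all made up for the example.

```python
import re

# Hypothetical: "68-1 à 68-3" -> ["68-1", "68-2", "68-3"].
# Assumes both bounds share the same number before the hyphen (\1 backreference).
RANGE_RE = re.compile(r"(\d+)-(\d+)\s+à\s+\1-(\d+)")
SUFFIX = " de la Constitution"


def expand_article_range(text):
    match = RANGE_RE.search(text)
    if match is None:
        return []
    prefix, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return ["%s-%d" % (prefix, n) for n in range(start, end + 1)]


def split_enumeration(text):
    # Hypothetical: split "A, B, et C de la Constitution" on commas and "et",
    # then re-attach the shared "de la Constitution" suffix to every part.
    body = text[:-len(SUFFIX)] if text.endswith(SUFFIX) else text
    parts = re.split(r",\s*(?:et\s+)?|\s+et\s+", body)
    return [part.strip() + SUFFIX for part in parts if part.strip()]


print(expand_article_range("Les articles 68-1 à 68-3 de la Constitution"))
for ref in split_enumeration(
        "Au sixième alinéa de l'article 16, à l'article 54, au deuxième alinéa "
        "de l'article 61, et au dernier alinéa de l'article 88-6 de la Constitution"):
    print(ref)
```

A real fix would rather live in the grammar (the thread mentions work in a PEG branch), but the expansion shows what the tree would need to contain.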

As of now, I don’t understand exactly why these unhandled cases cause infinite recursion.
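For context on the traceback: CPython's default recursion limit is 1000 frames, and `visit_node` recursed about 990 times into children. The thread does not say what caused it here; one common cause for a recursive tree visitor is a cycle in the tree (a node reachable from its own descendants), which makes the recursion unbounded. The hypothetical sketch below, not DuraLex's `AbstractVisitor`, shows a depth-first visitor that tracks the current path and reports such a cycle instead of overflowing the stack.

```python
class CycleError(Exception):
    pass


def visit_node(node, visit, on_path=None):
    # Depth-first visit of a dict-based tree whose children are in node['children'].
    # `on_path` holds ids of the nodes on the current root-to-node path, so a node
    # that reappears among its own descendants is reported instead of recursing forever.
    if on_path is None:
        on_path = set()
    if id(node) in on_path:
        raise CycleError("node of type %r is its own descendant" % node.get('type'))
    on_path.add(id(node))
    visit(node)
    for child in node.get('children', []):
        visit_node(child, visit, on_path)
    on_path.remove(id(node))


# Tiny tree where the root accidentally becomes a child of its own child.
root = {'type': 'article', 'children': []}
child = {'type': 'alinea', 'children': []}
root['children'].append(child)
child['children'].append(root)  # the cycle

try:
    visit_node(root, lambda n: None)
except CycleError as e:
    print("cycle detected:", e)
```

Tracking the current path rather than every visited node means a node legitimately shared by two branches is not flagged; only true cycles are.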

Seb35 commented 6 years ago

The infinite recursion is fixed by 46b6e66. Some syntaxes are not recognized yet; work is in progress. As I write this, 8 of the 18 articles of the pjl (projet de loi) pass entirely.

Seb35 commented 6 years ago

The commit for the Constitution was the earlier one, 6ffe532. As of now, 11 of the 18 articles of this pjl are recognized (we are working in the git branch 'peg').

Seb35 commented 6 years ago

Now 13/18

Seb35 commented 6 years ago

Here are the details of the correct and incorrect articles in DuraLex:

✓ article 1
✓ article 2
✓ article 3 (fun fact: this article was removed during committee work, https://twitter.com/ApffelArnaud/status/1012607263914242048)
✓ article 4
✓ article 5
✗ article 6 (1° and 2° are correct, issue on 3°)
✗ article 7 (1° is correct, issue on 2°)
✓ article 8
✓ article 9
✓ article 10
✗ article 11 (expression "au dernier alinéa de l’article 88-6" unrecognized, plus no word-reference)
✓ article 12
✗ article 13 (expression "articles 68-1 à 68-3" unrecognized)
✓ article 14
✓ article 15
✓ article 16
✗ article 17 (many issues)
✓ article 18

Seb35 commented 6 years ago

I’ve just tested the diffs SedLex generates from the DuraLex output, article by article (SedLex hits some fatal errors if you run it on the entire DuraLex output at once). With a small SedLex patch applied (pushed as 46cff7b), there are 8 perfect articles, 5 partially perfect articles and 2 half-good articles. Once the bad DuraLex outputs are excluded, 5 articles have SedLex-related issues.

By the way, Archéo Lex seems to have an issue creating a one-article-per-file repository, so I used an old Archéo Lex version; to match SedLex conventions, I renamed the generated folder to 'constitution'.

Perfect diff in SedLex:

Partially correct diff in SedLex:

Not working at all in SedLex:

Seb35 commented 6 years ago

I improved some parts of SedLex. I will update this comment as SedLex is further improved (the previous comment keeps the original state).

Perfect diff in SedLex (12 perfect + 2 partially perfect):

Partially correct diff in SedLex (2):

Not working at all in SedLex (4):

RouxRC commented 6 years ago

Good job, nearly there! ;)

Seb35 commented 6 years ago

SedLex now fails gracefully (and partially) instead of raising an exception, so the DuraLex tree for this constitutional law can be tested with SedLex as a whole, without splitting it by article. Fatal errors are replaced by an 'error' property on the node.
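A minimal sketch of that pattern, assuming dict-based nodes with a `children` list: each node's handler runs inside a try/except, and a failure is stored in the node's `error` property instead of aborting the whole run. `apply_edits` and `apply_node` are hypothetical names and the node types are illustrative; this is not SedLex's actual code.

```python
def apply_node(node):
    # Hypothetical per-node handler that fails on node types it does not support.
    if node.get('type') not in ('article-reference', 'edit'):
        raise ValueError("unsupported node type: %r" % node.get('type'))


def apply_edits(node, handler=apply_node):
    # Run the handler on every node; record failures in node['error'] instead of
    # raising, so the remaining nodes are still processed.
    try:
        handler(node)
    except Exception as exc:
        node['error'] = str(exc)
    for child in node.get('children', []):
        apply_edits(child, handler)
    return node


tree = {'type': 'edit', 'children': [
    {'type': 'article-reference'},
    {'type': 'unknown-reference'},
]}
apply_edits(tree)
print([child.get('error') for child in tree['children']])
```

Here the children of a failed node are still visited; whether a failure should skip the subtree instead is a design choice that depends on how independent the edits are.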