Cyberjusticelab / JusticeAI

JusticeAI (ProceZeus) is a web chat bot that aims to facilitate access to judicial proceedings involving Quebec tenant/landlord law
https://cyberjusticelab.github.io/JusticeAI/docs/rendered/
MIT License
21 stars, 16 forks

BUG: Fix Date Regex #425

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

Description

Fix the regexes used by the machine learning container so that they match the intended sentences more reliably.

Scope of Work

Bugs found

1. The regex at https://github.com/Cyberjusticelab/JusticeAI/blob/759b92171d116c7a1b12b53f9f53dec96dc3d323/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L10 needs to include "de" for months that start with a consonant, so that it properly matches sentences such as the one at https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L423

2. Please also investigate sentences like the one at https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L423: the verb "n'a pas paye" is missing there even though it is included in similar patterns elsewhere, which raises a red flag for me.

3. In https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L431, "de" should not be included, since DATE_REGEX should take care of it; otherwise the regex will attempt to match "de d'aout".

4. At https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L7 and https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L8, the regex matches the plural forms of the nouns locateurs/locatrices/locataires, but throughout the file the third-person plural verb conjugations are often missing. Take https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L427 as an example: "Les locataires n'ont pas paye le loyer d'octobre" ("The tenants did not pay October's rent") will never be matched, because that pattern hardcodes "n'a pas", which is the third-person singular.
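Items 1 and 3 amount to one rule: the date pattern should own its preposition ("de" before months starting with a consonant, the elided "d'" before months starting with a vowel), and sentence patterns should not add their own "de". Below is a minimal Python sketch of that rule. DATE_REGEX, RENT_FOR_MONTH and RENT_FOR_MONTH_BUGGY are hypothetical stand-ins, not the definitions in regex_lib.py, and the sketch assumes the text has already been lowercased with accents stripped (hence "aout" and "fevrier").

```python
import re

# Assumption: input is lowercased and accent-stripped ("aout", "fevrier").
CONSONANT_MONTHS = ["janvier", "fevrier", "mars", "mai", "juin", "juillet",
                    "septembre", "novembre", "decembre"]
VOWEL_MONTHS = ["avril", "aout", "octobre"]

# Hypothetical DATE_REGEX that includes the preposition itself:
# "de" + consonant month, or "d'" (straight or curly apostrophe) + vowel month.
DATE_REGEX = (
    r"(?:de\s+(?:" + "|".join(CONSONANT_MONTHS) + r")"
    r"|d['\u2019](?:" + "|".join(VOWEL_MONTHS) + r"))\b"
)

# Sentence pattern that relies on DATE_REGEX for "de"/"d'".
RENT_FOR_MONTH = re.compile(r"\ble\s+loyer\s+" + DATE_REGEX)

# Shape described in item 3: an extra "de" before DATE_REGEX means a vowel
# month can only match as nonsense such as "le loyer de d'aout".
RENT_FOR_MONTH_BUGGY = re.compile(r"\ble\s+loyer\s+de\s+" + DATE_REGEX)

print(bool(RENT_FOR_MONTH.search("n'a pas paye le loyer d'aout")))        # True
print(bool(RENT_FOR_MONTH_BUGGY.search("n'a pas paye le loyer d'aout")))  # False
```

Keeping the preposition inside the date pattern means every sentence pattern that embeds it gets the elision right for free, instead of each one having to special-case vowel months.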

This mistake is often repeated throughout the regex file.
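For item 4, a fix could be an alternation on the auxiliary verb. The Python sketch below contrasts the singular-only shape with one that also accepts the third-person plural "n'ont pas". TENANT_REGEX here is a simplified, hypothetical stand-in for the one in regex_lib.py, and the input is assumed to be lowercased and accent-stripped ("paye" rather than "payé").

```python
import re

# Hypothetical, simplified TENANT_REGEX: article + singular or plural noun.
TENANT_REGEX = r"\b(?:les|le|la)\s+locataires?"

# Shape described in item 4: the verb is fixed in the third-person
# singular ("n'a pas"), so a plural subject can never match.
NOT_PAID_SINGULAR_ONLY = re.compile(TENANT_REGEX + r"\s+n'a\s+pas\s+paye\b")

# Sketch of a fix: accept both singular "n'a pas" and plural "n'ont pas".
NOT_PAID = re.compile(TENANT_REGEX + r"\s+n'(?:a|ont)\s+pas\s+paye\b")

sentence = "les locataires n'ont pas paye le loyer d'octobre"
print(NOT_PAID_SINGULAR_ONLY.search(sentence) is None)  # True: missed
print(NOT_PAID.search(sentence) is not None)            # True: matched
```

This does not enforce subject-verb agreement ("le locataire n'ont pas" would also match); for pulling facts out of written decisions that looseness is probably harmless, but it is a trade-off worth noting.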

  1. https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L492 is missing the feminine form "a la locatrice".

  2. https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L539 is missing the "la TENANT_REGEX" alternative.

  3. https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L544 hardcodes "aux locateurs" instead of using LANDLORD_REGEX with "le"/"les"/"a la".

  4. https://github.com/Cyberjusticelab/JusticeAI/blob/1f56afdb8c4786b723777649f1b74c004f1267a5/src/ml_service/feature_extraction/post_processing/regex/regex_lib.py#L592: in "a preprendre possesion du lieux", the singular form "lieu" of "lieux" is missing. I understand that when "du" is used the text will most likely say "logement", but this is a "just in case" regex.
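The sub-items above share a cause: a noun or article is hardcoded in one gender or number. One way to avoid repeating that mistake, sketched here in Python assuming lowercased, accent-stripped input, is to build the article and noun alternations once and reuse them everywhere. ARTICLE, LANDLORD_RE, TENANT_RE and REPOSSESSION_RE are hypothetical names; the project's actual LANDLORD_REGEX and TENANT_REGEX may be structured differently.

```python
import re

# Hypothetical building blocks; assumes lowercased, accent-stripped input.
# Articles and contractions: le, la, les, au (a + le), aux (a + les), a la.
ARTICLE = r"\b(?:les|le|la|aux|au|a\s+la)\s+"

# Masculine/feminine and singular/plural: locateur(s), locatrice(s).
LANDLORD_NOUN = r"locat(?:eur|rice)s?\b"
# "locataire" has the same form in both genders; only the number varies.
TENANT_NOUN = r"locataires?\b"

LANDLORD_RE = re.compile(ARTICLE + LANDLORD_NOUN)
TENANT_RE = re.compile(ARTICLE + TENANT_NOUN)

# "lieux?" accepts both "lieu" and "lieux"; "logement" kept as an alternative.
REPOSSESSION_RE = re.compile(
    r"reprendre\s+possession\s+(?:du|des)\s+(?:lieux?|logement)\b"
)

# Prints True for each phrase.
for phrase in ["a la locatrice", "aux locateurs", "au locateur", "les locatrices"]:
    print(phrase, bool(LANDLORD_RE.search(phrase)))
```

Like the verb alternation, this accepts some ungrammatical combinations (for example "le locatrice"); it trades strict agreement for not silently missing valid variants.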

TaimoorRana commented 6 years ago

@mihaiqc, can you please take another look at the regex_lib file to see if there are any other obvious mistakes?

ghost commented 6 years ago

Understood, I'll try to spot all the possible mistakes.