Assessment of the ML approach

adnaneh commented 3 years ago

@LamDang @Ryosaeba8 @syrinecheriaa @quentinheinrich Hello everyone, I have a few concerns regarding the ML approach. At the same time, I'm becoming fairly confident that a specialized approach based on regular expressions and expert rules can be more efficient for these documents. Here are some reasons:

The documents seem to use the same kinds of expressions to state information, which makes it easier and more sensible to write rules.
A machine learning approach seems to carry a high risk as to how good the results will be. Its CO2 emissions would also likely be significantly higher.
Finally, settling on one approach would leave us more time to work on the equally important sustainability side of the challenge.

I would propose that we focus on the regexp/rules methodology instead and quickly move on to the sustainability side. I don't know if this has been done yet, but we could use AllenNLP to get an indication of what the ML approach would achieve, using the selected sentences from the explained Excel file. Let me know your thoughts on this.

As a side note, I have my defence on Friday, so tomorrow I will start by working on my slides, but I expect to rejoin the hackathon at some point in the evening.

adnaneh commented 3 years ago

Above is the result obtained with regexps only, on the binary questions. (I haven't done the countries yet, but that is not rocket science.) We gain in accuracy, but of course we have higher CO2 emissions. Another point: we can retrieve the sentences that led to the answer to each question. We can assign priorities to the rules so as to pick the best justification; I haven't done that yet. By developing the rules further, and especially by adding the countries, we should see a significant gain. I posted this to shed some light on what I said in the first comment, so I stand by my proposal to set the ML aside. Over to you for your opinions.
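To make the idea of prioritized rules with a justification concrete, here is a minimal sketch, not the code actually used in the repository. The rule patterns, the `Rule` type, and the function names `split_sentences` and `answer_binary` are all hypothetical and chosen only for illustration; real rules would be written per question from the documents.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    pattern: "re.Pattern[str]"  # compiled regular expression to search for
    answer: bool                # answer to the binary question if it matches
    priority: int               # higher priority wins when several rules match


# Hypothetical rules for one binary question (illustration only).
RULES: List[Rule] = [
    Rule(re.compile(r"\bdoes not (?:operate|have)\b", re.I), False, 2),
    Rule(re.compile(r"\b(?:operates|has (?:a )?subsidiar(?:y|ies))\b", re.I), True, 1),
]


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def answer_binary(text: str) -> Optional[Tuple[bool, str]]:
    """Return (answer, justification sentence), or None if no rule matches.

    Among all matching (sentence, rule) pairs, the one with the highest
    priority is kept; ties keep the earliest match in the document.
    """
    best: Optional[Tuple[int, bool, str]] = None
    for sentence in split_sentences(text):
        for rule in RULES:
            if rule.pattern.search(sentence):
                if best is None or rule.priority > best[0]:
                    best = (rule.priority, rule.answer, sentence)
    if best is None:
        return None
    return best[1], best[2]
```

The returned sentence is the justification for the prediction, and the `priority` field is one simple way to decide which matching sentence to report when several rules fire.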

LamDang commented 3 years ago

@adnaneh I totally agree that keyword-based approaches are a good baseline. It would be interesting to finalize the countries part as well and then analyze its performance.

quentinheinrich commented 3 years ago

I totally agree as well. We should push the approach that works best as far as it will go. As long as it also lets you return the justification with each prediction, as requested, you're good! That said, if an ML/DL approach later lets you reach even better scores, that is also very interesting.
