ProjetPP / MathRecognizer

A little module to differentiate math from other questions
MIT License

Usefulness #1

Open rcharron opened 9 years ago

rcharron commented 9 years ago

Is this module useful for the PPP?

Ezibenroc commented 9 years ago

What does this module do? Judging by the name of the repository, I guess that you expect it to say whether a request is a math request or not. Am I right?

rcharron commented 9 years ago

Yes, you are; see also the description, "A little module to differentiate math from other questions". I suppose it is only of interest to the core module, to avoid useless calls to other modules. Anyway, it is only a heuristic.

robocop commented 9 years ago

Maybe we could reuse the NLP-ML-standalone module (or the future Java implementation) with your dataset, to avoid reimplementing a classifier?

rcharron commented 9 years ago

That's not the question. This classifier is already implemented and trained.


Raphaël Charrondière

ENS de Lyon

On 2015-02-18 09:52, Quentin C. wrote:

Maybe we could reuse the NLP-ML-standalone module (or the future Java implementation) with your dataset, to avoid reimplementing a classifier?

robocop commented 9 years ago

Okay, but maybe you can improve your feature extraction (converting a string to a vector is an important part). I think you should at least try to tokenize the questions and use a lookup table, which is what my module does.
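To make the suggestion concrete, here is a minimal sketch of word-level tokenization with a lookup table. It is not the code of NLP-ML-standalone; the function names `tokenize`, `build_lookup` and `vectorize`, and the tokenization regex, are assumptions made for illustration.

```python
import re


def tokenize(question):
    """Split a question into lowercase word, number and symbol tokens."""
    # Assumed rule: runs of letters, runs of digits, or any single
    # non-space character (operators, punctuation).
    return re.findall(r"[a-z]+|\d+|\S", question.lower())


def build_lookup(questions):
    """Map each distinct token to an index, in order of first appearance."""
    lookup = {}
    for question in questions:
        for token in tokenize(question):
            lookup.setdefault(token, len(lookup))
    return lookup


def vectorize(question, lookup):
    """Token-count vector; tokens missing from the lookup are ignored."""
    vector = [0] * len(lookup)
    for token in tokenize(question):
        if token in lookup:
            vector[lookup[token]] += 1
    return vector
```

With a lookup built from a training set, `vectorize(question, lookup)` gives a fixed-length input for any classifier.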

rcharron commented 9 years ago

My tokens are characters. We are talking about math, not words, so the entities are characters, and there is no way to have a lookup table.
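As an illustration of what character-level features can look like, here is a minimal sketch. It is not the module's actual feature extraction: the alphabet and the name `char_features` are assumptions.

```python
import string

# Assumed alphabet: lowercase letters, digits and a few math symbols.
ALPHABET = string.ascii_lowercase + string.digits + "+-*/^=()'."


def char_features(question):
    """Relative frequency of each alphabet character in the question."""
    counts = [0.0] * len(ALPHABET)
    for ch in question.lower():
        position = ALPHABET.find(ch)  # -1 when ch is outside the alphabet
        if position >= 0:
            counts[position] += 1.0
    total = sum(counts)
    # An input with no alphabet character yields the all-zero vector.
    return [c / total for c in counts] if total else counts
```

The vector always has `len(ALPHABET)` entries, whatever the length of the question.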


On 2015-02-18 10:07, Quentin C. wrote:

Yes, but your feature extraction (converting a string to a vector) is not serious (and this is the main part) :p. You should at least tokenize the questions and use a lookup table.

marc-chevalier commented 9 years ago

Let's be clear. The dataset is full of mistakes (cos'(x) is math but sin'(x) is not). Moreover, it considers that a sequence is a math question. But we cannot guess a sequence from its first values, so it is a question for OEIS => database => not CAS! That does not allow us to differentiate what has to be processed by the CAS from what does not => no optimisation.

And last but not least, the dataset seems to be built automatically (https://github.com/ProjetPP/MathRecognizer/blob/wtf/networktrainer.py#L45) with some heuristics. In the best case, the ML learns to mimic these heuristics but, surely, it will not work as well. So... using these heuristics directly would be more efficient, wouldn't it?

rcharron commented 9 years ago

with some heuristics => yes and no. Under some conditions it is decided automatically whether it is math or not; otherwise you have to answer manually.

That does not allow to differentiate what it has to be processed by the CAS and what does not => no optimisation => if it is only a matter of correcting the dataset, it is not very complicated.

The dataset is full of mistakes => probably, but I'm not perfect and didn't want to spend much time on that.
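For readers following the thread, the labelling scheme described above (automatic when some conditions hold, manual otherwise) can be sketched roughly as follows. This is not the code in networktrainer.py: the rules, the 0.5 threshold and the names `auto_label` and `label` are all hypothetical.

```python
import re

# Hypothetical set of "math-looking" characters.
MATH_CHARS = re.compile(r"[0-9+\-*/^=]")


def auto_label(question):
    """Return True (math), False (not math) or None when undecided."""
    text = question.strip()
    if not text:
        return False
    math_count = len(MATH_CHARS.findall(text))
    if math_count / len(text) >= 0.5:  # assumed threshold
        return True
    if math_count == 0 and " " in text:  # plain multi-word sentence
        return False
    return None


def label(question, ask=input):
    """Use the heuristic when it decides, otherwise ask a human."""
    decided = auto_label(question)
    if decided is not None:
        return decided
    answer = ask(f"Is this math? [y/n] {question!r} ")
    return answer.strip().lower().startswith("y")
```

Under these assumed rules, an input such as `sin'(x)` is left undecided and goes to the manual branch, which is where inconsistent labels like the cos'/sin' pair can creep in.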