Open rcharron opened 9 years ago
What does this module do? Going by the name of the repository, I guess you expect it to say whether a request is a math request or not. Am I right?
Yes, you are. See also the description: "A little module to differentiate math from other questions". I suppose its only use is for the core module, to avoid useless calls to other modules. Anyway, it is only a heuristic.
Maybe we could reuse the NLP-ML-standalone module (or the future Java implementation) with your dataset, to avoid reimplementing a classifier?
That's not the question. This classifier is already implemented and trained.
Raphaël Charrondière
ENS de Lyon
On 2015-02-18 09:52, Quentin C. wrote:
Maybe we could reuse the NLP-ML-standalone module (or the future Java implementation) with your dataset, to avoid reimplementing a classifier?
Okay, but maybe you can improve your feature extraction (converting a string to a vector is an important part). I think you should at least try to tokenize the questions and use a lookup table, which is what my module does.
My tokens are characters. We are talking about math, not words, so the entities are characters, and there is no way to have a lookup table.
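For illustration, the tokenize-and-lookup idea suggested here could look roughly like the sketch below. This is not the code of NLP-ML-standalone: the function names `tokenize`, `build_lookup` and `encode`, the token rules, and the sample questions are all hypothetical.

```python
import re

def tokenize(question):
    """Split a question into lowercase words, numbers, and single symbols."""
    return re.findall(r"[a-z]+|\d+|[^\sa-z\d]", question.lower())

def build_lookup(questions):
    """Map each distinct token to an integer index.

    Index 0 is reserved for tokens never seen while building the table.
    """
    lookup = {}
    for question in questions:
        for token in tokenize(question):
            lookup.setdefault(token, len(lookup) + 1)
    return lookup

def encode(question, lookup):
    """Convert a question into a list of token indices (0 for unknown tokens)."""
    return [lookup.get(token, 0) for token in tokenize(question)]

# Hypothetical usage: build the table from known questions, then encode a new one.
table = build_lookup(["What is sin(x)?"])
encoded = encode("Who is x?", table)
```

The resulting index lists would still have to be turned into fixed-size vectors (for example by counting indices) before being fed to a classifier.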
Raphaël Charrondière
ENS de Lyon
On 2015-02-18 10:07, Quentin C. wrote:
Yes, but your feature extraction (converting a string to a vector) is not serious (and this is the main part) :p. You should at least tokenize the questions and use a lookup table.
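To make the character-level position in this reply concrete, here is a minimal sketch of character-based feature extraction. It is an assumption about what "tokens are characters" could mean in practice, not the code of this repository; `ALPHABET` and `char_features` are hypothetical names, and the choice of alphabet is arbitrary.

```python
import string

# Hypothetical fixed alphabet: letters, digits, and a few math symbols.
ALPHABET = string.ascii_lowercase + string.digits + "+-*/^=()'"

def char_features(question):
    """Return the relative frequency of each ALPHABET character in the question.

    Characters outside ALPHABET are ignored, so every question maps to a
    vector of the same length without any word-level lookup table.
    """
    text = question.lower()
    total = max(len(text), 1)  # avoid division by zero on empty input
    return [text.count(ch) / total for ch in ALPHABET]

# Hypothetical usage.
vector = char_features("2+2")
```

With this representation the vector length is fixed by the alphabet, which is why no lookup table built from the data is needed.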
Let's be clear. The dataset is full of mistakes (cos'(x) is labeled as math but sin'(x) is not). Moreover, it considers a sequence to be a math question. But we cannot guess a sequence from its first values, so it is a question for OEIS => database => not CAS! That does not make it possible to differentiate what has to be processed by the CAS from what does not => no optimisation.
And last but not least, the dataset seems to be built automatically (https://github.com/ProjetPP/MathRecognizer/blob/wtf/networktrainer.py#L45) with some heuristics. In the best case, the ML model learns to mimic these heuristics, but it will surely not work as well as they do. So... using these heuristics directly would be more efficient, wouldn't it?
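As a rough sketch of what "using the heuristics directly" could mean, the function below applies two hand-written rules instead of a trained classifier. These rules are hypothetical and are not the ones in networktrainer.py; `MATH_FUNCTIONS` and `looks_like_math` are illustrative names.

```python
import re

# Hypothetical list of function names treated as math markers.
MATH_FUNCTIONS = ("sin", "cos", "tan", "log", "exp", "sqrt")

def looks_like_math(question):
    """Return True if the question matches one of two simple math heuristics."""
    text = question.lower()
    # Rule 1: two numbers joined by an arithmetic operator, e.g. "2 + 2".
    if re.search(r"\d\s*[-+*/^=]\s*\d", text):
        return True
    # Rule 2: a known function name, optionally primed, followed by "(".
    return any(
        re.search(r"\b" + name + r"'?\s*\(", text) for name in MATH_FUNCTIONS
    )

# Hypothetical usage: both derivatives are treated the same way.
results = [looks_like_math(q) for q in ("cos'(x)", "sin'(x)", "Who is Ada Lovelace?")]
```

A rule set like this treats cos'(x) and sin'(x) consistently by construction, which a model trained on mislabeled examples may not.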
with some heuristics => yes and no. Under some conditions the script says whether it is math or not; otherwise you have to answer manually.
That does not make it possible to differentiate what has to be processed by the CAS from what does not => no optimisation => if it is only a matter of correcting the dataset, it is not very complicated.
The dataset is full of mistakes => probably, but I'm not perfect and didn't want to spend much time on that.
Is this module useful for the PPP?