comtravo / ctparse

Parse natural language time expressions in python
https://www.comtravo.com
MIT License

function regenerate_model missing #69

Closed milk-bottle-liyu closed 5 years ago

milk-bottle-liyu commented 5 years ago

Hi, according to your docs, we should call regenerate_model after updating the rules. But I can't find the function anywhere in the repository. So what should I do after updating the rules?

sebastianmika commented 5 years ago

Hi,

thanks for reporting. You are right, the function is missing - we removed it with the last refactors and forgot to update the docs + add it somewhere else. Will try to fix that asap.

Here is what the function used to look like: https://github.com/comtravo/ctparse/blob/f4638af8448c7fc59a87d71c8d33b6bcb0abf8fa/ctparse/ctparse.py#L579

With some minor tweaks it should be possible to run it.

milk-bottle-liyu commented 5 years ago

Thanks, Mika! This commit solves my problem, but I have 2 more questions. They are about the latest version, not the commit you linked.

  1. The code in:

https://github.com/comtravo/ctparse/blob/ed0226ea68280a7317961fac9d88c4fdbae56863/ctparse/model.py#L58

This line outputs Xs full of '\\'. I think the line should be changed to Xs.append([str(p) for p in parse.production[:i]]). Maybe you could look into the problem.

  2. The regenerated model's output differs from the original model's. The highest-scoring productions of the two models are the same, but their scores are not: 18.682 for the original model versus 18.103 for the regenerated one. I'm not very familiar with sklearn's naive Bayes model, so I am not sure whether this is a problem.

gabrielelanaro commented 5 years ago

@public1024, there is a bug in run_corpus (as you noted, it's about id, which is a function). It will soon be fixed as part of this PR: https://github.com/comtravo/ctparse/pull/71

The PR will also include the training code.
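For illustration, the change suggested earlier in this thread, `Xs.append([str(p) for p in parse.production[:i]])`, can be sketched in isolation. This is a minimal sketch and not the actual ctparse code: `prefix_features` and `buggy_prefix_features` are hypothetical helper names, the production is a made-up list of rule identifiers, and the buggy variant is only an assumption about the shape of the bug (stringifying the builtin `id` function instead of each element).

```python
def prefix_features(production):
    """Return one feature list per non-empty prefix of `production`.

    Each element p is converted with str(p), as suggested in the thread.
    """
    Xs = []
    for i in range(1, len(production) + 1):
        Xs.append([str(p) for p in production[:i]])
    return Xs


def buggy_prefix_features(production):
    """Assumed shape of the bug: the builtin `id` is stringified, not p."""
    Xs = []
    for i in range(1, len(production) + 1):
        # str(id) is the text of the builtin function object,
        # identical for every element.
        Xs.append([str(id) for p in production[:i]])
    return Xs


# Hypothetical production: a sequence of rule identifiers.
production = ["rule_a", 12, "rule_b"]

print(prefix_features(production))        # readable tokens per prefix
print(buggy_prefix_features(production))  # the same useless token repeated
```

With `str(p)`, every prefix becomes a list of distinct, readable tokens that a text vectorizer can count. In the assumed buggy variant every token is identical, so the features carry no information about the production.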

Regarding question 2:

We are aware of such discrepancies. Making scores consistent between retrains is something we are actively working on.
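As general background on why the ranking can stay the same while the numbers change: a naive Bayes score is a sum of log-probability ratios estimated from training counts, so any change in the training samples shifts the scores. The sketch below is a toy multinomial naive Bayes in plain Python, not sklearn and not ctparse's model. The names `train`, `samples_a`, `samples_b`, and the rule tokens are hypothetical, and it is not a diagnosis of the discrepancy reported above.

```python
import math
from collections import Counter


def train(samples, alpha=1.0):
    """Fit a toy two-class multinomial naive Bayes with Laplace smoothing.

    `samples` is a list of (tokens, label) pairs with label in {0, 1}.
    Returns a function mapping a token list to its log-odds for class 1.
    """
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter()
    vocab = set()
    for tokens, label in samples:
        counts[label].update(tokens)
        n_docs[label] += 1
        vocab.update(tokens)
    totals = {c: sum(counts[c].values()) for c in (0, 1)}
    v = len(vocab)

    def log_odds(tokens):
        score = math.log(n_docs[1] / n_docs[0])  # class prior ratio
        for t in tokens:
            if t not in vocab:
                continue  # ignore tokens never seen in training
            p1 = (counts[1][t] + alpha) / (totals[1] + alpha * v)
            p0 = (counts[0][t] + alpha) / (totals[0] + alpha * v)
            score += math.log(p1 / p0)
        return score

    return log_odds


# Two training sets that differ by a single extra positive sample.
samples_a = [(["r1", "r2"], 1), (["r1", "r3"], 1),
             (["r4"], 0), (["r3", "r4"], 0)]
samples_b = samples_a + [(["r1", "r2"], 1)]

score_a = train(samples_a)
score_b = train(samples_b)

candidates = [["r1", "r2"], ["r3", "r4"]]
for cand in candidates:
    print(cand, round(score_a(cand), 3), round(score_b(cand), 3))
```

In this toy setup both models prefer the same candidate, but the scores differ because the smoothed counts and the class prior changed with the extra sample.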