Closed PonteIneptique closed 5 years ago
This will not completely fix #29, but is one good step.
I just removed a secondary loop as well. In the case of the Latin corpora, this would re-iterate over 200k tokens :)
I just spotted another place for improvement...
So, Scorer().get_scores() can be called in two places: in train, which is not important because it happens at the end of training, and in Scorer.print_summary.
Scorer.print_summary itself is called in trainer.run_check(): https://github.com/emanjavacas/pie/blob/9fa672c04659b2c2219cb704027f48ba703c28b7/pie/trainer.py#L267
But in the same function, a few lines later, get_scores() is called again: https://github.com/emanjavacas/pie/blob/9fa672c04659b2c2219cb704027f48ba703c28b7/pie/trainer.py#L272
This means get_scores() is actually called twice per task on every Trainer.run_check().
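One way to avoid the double call is to compute the scores once per task and hand them to the summary printer. The sketch below is only an illustration of that call pattern: the Scorer class is a hypothetical stand-in (not pie's actual Scorer), and the scores= parameter on print_summary is an assumption, not an existing signature.

```python
class Scorer:
    """Hypothetical stand-in for pie's Scorer; only the call pattern matters."""

    def __init__(self, name):
        self.name = name
        self.calls = 0  # counts how often the expensive computation runs

    def get_scores(self):
        self.calls += 1
        return {"accuracy": 0.9}  # placeholder value, not a real result

    def print_summary(self, scores=None):
        # Assumed parameter: reuse precomputed scores when the caller has them
        scores = self.get_scores() if scores is None else scores
        print(self.name, scores)


def run_check(scorers):
    """Compute scores once per task, then reuse them for printing and storage."""
    stored_scores = {}
    for task, scorer in scorers.items():
        scores = scorer.get_scores()
        scorer.print_summary(scores=scores)
        stored_scores[task] = scores
    return stored_scores


scorers = {"lemma": Scorer("lemma"), "pos": Scorer("pos")}
stored = run_check(scorers)
```

With this shape, each check costs one get_scores() call per task instead of two.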
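@emanjavacas We could use a multiprocessing.Pool here as well, such as
from operator import methodcaller
# Keep at least one worker, even when there is a single task
pool = multiprocessing.Pool(max(1, len(self.tasks) // 2))
# Dicts are now ordered, so keys and values stay aligned
# methodcaller is used because pool.map cannot pickle a lambda
stored_scores = zip(summary.keys(), pool.map(methodcaller("get_scores"), summary.values()))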
But it can be another discussion.
Is this ready to merge?
It is. Multiprocessing will come later :)
Alright. Thanks!
This is one possible improvement for #29.
Basically, in sklearn.metrics.classification, both precision_score and recall_score are using precision_recall_fscore_support. This means the function was used twice for the same result in https://github.com/emanjavacas/pie/blob/9fa672c04659b2c2219cb704027f48ba703c28b7/pie/models/scorer.py#L37-L38
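To show why one shared computation is enough, here is a minimal standard-library sketch that derives macro-averaged precision and recall from the same counts in a single pass over the tokens. It is not sklearn's implementation and not the code in scorer.py; the function name and the tag values are made up for the example. With sklearn, the equivalent would presumably be a single call to precision_recall_fscore_support instead of separate calls to precision_score and recall_score.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Return macro-averaged (precision, recall) from one pass over the data."""
    true_pos, pred_count, true_count = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        true_count[truth] += 1
        pred_count[pred] += 1
        if truth == pred:
            true_pos[truth] += 1

    labels = sorted(set(true_count) | set(pred_count))
    if not labels:
        return 0.0, 0.0

    # A label that is never predicted (or never occurs) contributes 0
    precisions = [true_pos[l] / pred_count[l] if pred_count[l] else 0.0 for l in labels]
    recalls = [true_pos[l] / true_count[l] if true_count[l] else 0.0 for l in labels]
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Illustrative tags only
y_true = ["NOUN", "VERB", "NOUN", "ADJ"]
y_pred = ["NOUN", "NOUN", "NOUN", "ADJ"]
precision, recall = macro_precision_recall(y_true, y_pred)
```

Both metrics come out of the same counters, so iterating over the corpus a second time adds cost without adding information.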
PS: I added a change to the CLI for evaluation because it was boring me to death to have to deal with this copy-paste every time. Now pie eval loads paths from --settings when no others are provided.