Open GoogleCodeExporter opened 8 years ago
I agree. The random int approach in this method could be improved. I
never had the time to test the few proposals I have in mind, but it would be great
to have this modified for more consistent and accurate results.
Original comment by mawa...@live.com
on 1 Mar 2011 at 7:46
[deleted comment]
ummm... I ran some initial tests and the results don't look very promising. I
admit I need to debug it more to understand what the best solution would be
(not far from your sample, though). It would be good if someone else who can
spend a few hours looking into this could share their thoughts too. I will try to
look at it in the next couple of weeks, along with short text language detection
(from the previous issue).
Original comment by mawa...@live.com
on 1 Mar 2011 at 6:34
This is new sample code for detectBlock (with some improvements) that minimizes the
deviation of the result probability (the previous one was deleted). In some cases it
works worse, in some better.
I also propose a modification to extractNGrams() to add a space at the
beginning and at the end of the text. It could improve results in some cases of short text.
Of course, this isn't a solution for short text detection.
In my opinion, this problem cannot be solved without a dictionary.
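A minimal sketch of the padding idea, for illustration only. The class `PaddedNGrams` and its `extract` method are hypothetical names, not the library's actual extractNGrams() code, and the choice to skip whitespace-only n-grams is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT the library's extractNGrams(): it only illustrates
// padding the text with one space on each side before slicing n-grams.
final class PaddedNGrams {
    static List<String> extract(String text, int maxN) {
        String padded = " " + text + " ";      // spaces mark the text boundaries
        List<String> grams = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= padded.length(); i++) {
                String gram = padded.substring(i, i + n);
                if (!gram.trim().isEmpty()) {  // assumption: drop whitespace-only n-grams
                    grams.add(gram);
                }
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Print the n-grams (up to trigrams) extracted from a two-letter text.
        System.out.println(extract("ok", 3));
    }
}
```

With padding, a two-letter text also yields boundary n-grams such as " o" and "k ", so a very short text contributes more features than its bare letters alone.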
Original comment by markowsk...@gmail.com
on 1 Mar 2011 at 7:58
Attachments:
Thanks for the comments and the experiment. I was going to run the experiment myself, but it's already done
:D
As you say, your proposal works better in some cases and worse in others. I think so too.
> I also propose some modification to extractNGrams() to add space at the
> beginning and at the end of text. It could improve in some cases of short text.
I'll consider this proposal.
Original comment by nakatani.shuyo
on 2 Mar 2011 at 2:26
Original issue reported on code.google.com by
markowsk...@gmail.com
on 28 Feb 2011 at 8:24