RangerWolf / language-detection

Automatically exported from code.google.com/p/language-detection

DetectorFactory.loadProfile(String) is a static method #25

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
DetectorFactory.loadProfile is a static method, and calling it more than once 
causes errors.

This causes problems for applications which are completely componentised - when 
you start up a language detection component it wants to initialise 
DetectorFactory.  When that same component initialises again in the future, it 
throws an error.

A better alternative would be that the factory be an instance which can be 
configured and then thrown away.  The second component would then be using a 
different instance, which would avoid the problem completely.
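
A minimal sketch of the instance-based idea, for illustration only. `ProfileRegistry` and its methods are hypothetical names, not part of langdetect; the point is just that loaded profiles become instance state, so a second instance can load the same profile without colliding with the first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the langdetect API: loaded profiles are
// instance state rather than static state.
final class ProfileRegistry {
    private final List<String> profiles = new ArrayList<>();

    // A duplicate is only an error within the same instance.
    void loadProfile(String languageCode) {
        if (profiles.contains(languageCode)) {
            throw new IllegalStateException("duplicate profile: " + languageCode);
        }
        profiles.add(languageCode);
    }

    int profileCount() {
        return profiles.size();
    }
}
```

Under this assumption, each component would create its own `ProfileRegistry`, and dropping the last reference to it makes its profiles eligible for garbage collection.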

Original issue reported on code.google.com by dan...@nuix.com on 12 Sep 2011 at 2:24

GoogleCodeExporter commented 9 years ago
Hmm, I understand your point, but inconsistent initialization causes trouble, 
so in general the initialization should be executed only once across all 
components.

I was unsure whether DetectorFactory should be static when I designed it.
I assumed that the language profiles' lifecycle is longer than the 
application's, so I settled on the current interface.

Why does your application need to initialize langdetect more than once, and 
how does it do so?

Original comment by nakatani.shuyo on 13 Sep 2011 at 10:16

GoogleCodeExporter commented 9 years ago
The way our application works, you spend some time loading all the data, and 
then after that, you want as much memory as possible back for analysing it.  
During this analysis step, I would like to throw away anything I'm not using.

My idea would be to keep DetectorFactory's static convenience methods around 
and have them work on a shared instance of the factory, but allow people to 
create their own instances if they want, so that they can be garbage collected 
later.
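
A rough sketch of that arrangement, under assumed names. `LangFactory`, `shared()`, `loadSharedProfile`, `load`, and `supports` are all hypothetical and do not exist in langdetect; the sketch only shows static convenience methods delegating to one shared instance while callers remain free to construct private instances.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical factory: static helpers delegate to a shared instance,
// but the constructor stays accessible for callers who want their own.
final class LangFactory {
    private static final LangFactory SHARED = new LangFactory();

    private final Set<String> profiles = new HashSet<>();

    // Static convenience methods operate on the shared instance.
    static LangFactory shared() {
        return SHARED;
    }

    static void loadSharedProfile(String lang) {
        SHARED.load(lang);
    }

    // Instance methods hold all the real state.
    void load(String lang) {
        profiles.add(lang);
    }

    boolean supports(String lang) {
        return profiles.contains(lang);
    }
}
```

Existing callers would keep using the static helpers, while a memory-sensitive component could create `new LangFactory()`, use it, and let it go out of scope.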

Original comment by dan...@nuix.com on 13 Sep 2011 at 11:14

GoogleCodeExporter commented 9 years ago
I'd like to know why you need other instances, because I assumed they were 
unnecessary.

Original comment by nakatani.shuyo on 14 Sep 2011 at 2:39

GoogleCodeExporter commented 9 years ago
It's not that we need two instances at the same time. We just want the first 
instance to be garbage collected, and to create a new one when we need it again.

Original comment by dan...@nuix.com on 14 Sep 2011 at 6:09

GoogleCodeExporter commented 9 years ago
As I mentioned, I assume the language profiles' lifecycle is longer than the 
application process's, so I can't see the need for garbage collecting language 
profiles.
Why do you need it?

Original comment by nakatani.shuyo on 15 Sep 2011 at 4:01

GoogleCodeExporter commented 9 years ago
Because we don't want things in memory which aren't necessary. Most of the time 
we are not doing language detection - are you saying that you think we should 
have it in memory the whole time?  That just seems wasteful to me.

It also encourages bad programming practices on our end.

Suppose two components both use the language identifier.  Which one is 
responsible for initialisation?  Now we have to introduce a third component to 
defend against the problem, which the other two will depend on.  Otherwise both 
components try to initialise the detector and it explodes.
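
For illustration, the kind of third component described above might look like the following sketch. `DetectionBootstrap` and `ensureInitialised` are hypothetical names, and the counter merely stands in for the real call to `DetectorFactory.loadProfile(...)`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical guard component that both detection components depend on.
final class DetectionBootstrap {
    // Counts how many times the fail-on-repeat load actually ran.
    static final AtomicInteger loads = new AtomicInteger();

    private static boolean initialised = false;

    private DetectionBootstrap() {
    }

    // Safe to call from every component; only the first call loads.
    static synchronized void ensureInitialised() {
        if (initialised) {
            return;
        }
        loads.incrementAndGet(); // stand-in for DetectorFactory.loadProfile(...)
        initialised = true;
    }
}
```

Both components would call `DetectionBootstrap.ensureInitialised()` at start-up, which is exactly the extra dependency this comment objects to.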

Original comment by dan...@nuix.com on 15 Sep 2011 at 4:54

GoogleCodeExporter commented 9 years ago
I see.
There is a DetectorFactory#clear() method that clears the loaded profiles; it 
exists for unit tests.
I have made it a public method in the trunk of the repository.

http://code.google.com/p/language-detection/source/browse/#svn%2Ftrunk%2Flib

I think it will be useful for you. Don't you think so?

Original comment by nakatani.shuyo on 20 Sep 2011 at 11:02

GoogleCodeExporter commented 9 years ago
That does help a bit.  When our component shuts down, it would now be able to 
clear() the factory so it should free up pretty much everything it had.
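
A small sketch of that shutdown pattern, using stand-ins so it runs on its own. `StaticProfiles` imitates a static factory with load and clear operations and is not the langdetect class; `DetectionComponent` is likewise a hypothetical name. In real code, `close()` would call `DetectorFactory.clear()`.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a static factory that fails when a profile is loaded twice.
final class StaticProfiles {
    private static final List<String> loaded = new ArrayList<>();

    private StaticProfiles() {
    }

    static void load(String lang) {
        if (loaded.contains(lang)) {
            throw new IllegalStateException("already loaded: " + lang);
        }
        loaded.add(lang);
    }

    static void clear() {
        loaded.clear();
    }

    static int size() {
        return loaded.size();
    }
}

// Hypothetical component that loads on start-up and clears on shutdown.
final class DetectionComponent implements AutoCloseable {
    DetectionComponent() {
        StaticProfiles.load("en");
    }

    @Override
    public void close() {
        StaticProfiles.clear(); // frees the profiles so a later start-up can reload
    }
}
```

Used in a try-with-resources block, the component can be started, shut down, and started again without hitting the duplicate-load error.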

Original comment by dan...@nuix.com on 20 Sep 2011 at 11:16

GoogleCodeExporter commented 9 years ago
I believe that this can be really useful in multi-threaded environments where 
multiple instances of the detector are running. I had to write some fancy 
synchronized blocks to ensure that I do not get any exceptions. It would be 
nice if, as users, we did not have to worry about these issues.
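
One way callers sometimes avoid hand-written synchronized blocks for one-time set-up is the initialization-on-demand holder idiom, sketched below. `SharedProfiles` is a hypothetical wrapper, not part of langdetect, and the counter stands in for the actual profile load.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: the JVM's class initialisation guarantees that
// the one-time load runs exactly once, even with many threads.
final class SharedProfiles {
    // Counts how many times the constructor (the "load") ran.
    static final AtomicInteger loadCount = new AtomicInteger();

    private SharedProfiles() {
        loadCount.incrementAndGet(); // stand-in for the one-time profile load
    }

    // Holder is initialised lazily, on the first call to get().
    private static final class Holder {
        static final SharedProfiles INSTANCE = new SharedProfiles();
    }

    static SharedProfiles get() {
        return Holder.INSTANCE;
    }
}
```

Every thread calls `SharedProfiles.get()` before detecting. This covers one-time initialisation only and does not address clearing or reloading profiles.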

Original comment by stained...@gmail.com on 7 Jan 2013 at 9:31