Closed GoogleCodeExporter closed 9 years ago
I forgot: I'm using DKPro Keyphrases 1.5.0-SNAPSHOT on a 64-bit Windows 7
operating system.
Original comment by Steinert...@googlemail.com
on 8 Jul 2014 at 1:02
Hi Laura, I also have a 64-bit Windows 7 operating system, and I could see that
the tree-tagger process had finished after I ran this code example. Could you
try using the stable version on Maven Central instead of the snapshot version
and see how it goes?
Original comment by pedrobss...@gmail.com
on 9 Jul 2014 at 8:38
Hi. Where can I find a stable release? On
http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-ukp-snapshots-local/de/tudarmstadt/ukp/dkpro/keyphrases/
I can only find the SNAPSHOT versions 1.5.0 and 1.6.0.
I have now switched to 1.6.0-SNAPSHOT, but the problem remains. Please note that
this problem only seems to occur when processing MANY texts sequentially. I have
a file of 22 texts where the problem does not occur, whereas a file with 2400
texts triggers it.
Original comment by Steinert...@googlemail.com
on 9 Jul 2014 at 9:59
Do all 2400 texts have the same language?
Original comment by richard.eckart
on 9 Jul 2014 at 11:22
Yes, they are all in English.
Original comment by Steinert...@googlemail.com
on 9 Jul 2014 at 11:48
The latest stable version is on maven central:
http://search.maven.org/#search%7Cga%7C1%7Ckeyphrases
How big is that file? Could you share it with us so we can test it?
Original comment by pedrobss...@gmail.com
on 9 Jul 2014 at 11:52
Here's the file I'm using; each line is one text.
Original comment by Steinert...@googlemail.com
on 9 Jul 2014 at 12:14
Attachments:
The attached sample code reproduces the problem.
By the way, given the Maven Central URL for the stable version, how do I add
it as a repository in my Maven POM?
Original comment by Steinert...@googlemail.com
on 9 Jul 2014 at 12:30
Attachments:
To add the dependency to your project, you don't need to add a repository to
your POM, because Maven Central is used by default. Just add the following tags:
<dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.keyphrases</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.keyphrases.wrappers-gpl</artifactId>
    <version>1.5.0</version>
</dependency>
Original comment by pedrobss...@gmail.com
on 9 Jul 2014 at 12:45
Okay, I switched to version 1.5.0, but the problem persists. Although I thought
it worked for my smaller dataset of 22 texts, I have now noticed that it happens
there, too. These 22 texts are also all English texts.
Original comment by Steinert...@googlemail.com
on 10 Jul 2014 at 8:50
Here's a file with 17 texts (all English) that induces the same problem.
Original comment by Steinert...@googlemail.com
on 10 Jul 2014 at 9:01
Attachments:
I tested it, and the TreeTagger process had terminated once the pipeline
finished. The attached screenshots show the process while the pipeline is
running and after it has finished. I also made a small change to the code,
because your implementation throws a NullPointerException and does not close
the BufferedReader [1]. But perhaps I misunderstood... are you saying that the
problem is that several TreeTagger processes are created during the execution
of the pipeline? Is that the issue?
[1] http://en.wikipedia.org/wiki/Resource_leak
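Pedro's revised code is not included in this export. As a minimal sketch of the file-reading part only, assuming a UTF-8 input file with one text per line (TextFileReader and readTexts are hypothetical names, and skipping blank lines is an added assumption), try-with-resources closes the BufferedReader on every path, and the loop stops as soon as readLine() returns null at end of file. Using that null as a text is a common cause of NullPointerExceptions in loops like this, though the thread does not say whether it caused the one mentioned here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class TextFileReader {
    /** Reads one text per line, skipping blank lines (hypothetical helper). */
    static List<String> readTexts(Path file) throws IOException {
        List<String> texts = new ArrayList<>();
        // try-with-resources closes the reader on every path, including exceptions.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            // readLine() returns null at end of file; the loop stops before using it.
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    texts.add(line);
                }
            }
        }
        return texts;
    }
}
```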
Original comment by pedrobss...@gmail.com
on 10 Jul 2014 at 12:02
Attachments:
My problem is exactly that multiple TreeTagger processes exist during the
execution of the program. Although there should only ever be one TreeTagger
process at a time, they seem to pile up. It starts with just one process, but
over time more appear (or the older ones don't terminate).
Attached you will find a screenshot that shows multiple TreeTagger processes
during the execution of the test program.
Original comment by Steinert...@googlemail.com
on 11 Jul 2014 at 8:27
Attachments:
I forgot to mention that they don't even disappear after the process has
terminated.
Or at least, it takes some time for them to disappear...
Original comment by Steinert...@googlemail.com
on 11 Jul 2014 at 10:06
The problem is that you are creating a new CooccurrenceGraphExtractor instance
in each iteration of the loop.
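A minimal sketch of the fix, assuming the extractor can be reused across texts (the thread implies this, but the actual CooccurrenceGraphExtractor API is not shown here): create the instance once, before the loop. StubExtractor and SequentialExtraction are hypothetical names; the stub only counts how often it is constructed, and the TreeTagger remark in its constructor reflects what this thread suggests, not documented DKPro behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for CooccurrenceGraphExtractor; NOT the DKPro API. */
class StubExtractor {
    static int instances = 0; // counts constructor calls (single-threaded demo only)

    StubExtractor() {
        // Assumption: in the real pipeline, setting up an extractor may start a TreeTagger process.
        instances++;
    }

    List<String> extract(String text) {
        return Arrays.asList(text.split("\\s+")); // placeholder "keyphrases": the tokens
    }
}

class SequentialExtraction {
    static List<List<String>> run(List<String> texts) {
        StubExtractor extractor = new StubExtractor(); // created once, before the loop...
        List<List<String>> results = new ArrayList<>();
        for (String text : texts) {
            results.add(extractor.extract(text));      // ...and reused for every text
        }
        return results;
    }
}
```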
Original comment by pedrobss...@gmail.com
on 22 Jul 2014 at 8:35
Attachments:
Hi,
Sorry for the late answer; I was on holiday.
Thanks for pointing out my error; that was a really stupid one. ^_^
However, my problem is not completely solved. I'd really like to run the
keyphrase extraction in parallel using a ThreadPoolExecutor. With it, you
specify a minimum and maximum number of threads running in parallel, then
simply submit all the tasks to the ThreadPoolExecutor and let it execute them.
The great thing is that you don't have to coordinate the execution of the tasks
yourself. The bad thing (in this case) is that I don't know which tasks are
executed in which order.
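For reference, java.util.concurrent distinguishes tasks (the Runnable or Callable objects you submit) from the pool's worker threads, which are created once and reused for many tasks. The sketch below is illustrative only: PoolBasics is a hypothetical class, and text.length() stands in for the keyphrase extraction. With an unbounded LinkedBlockingQueue, a ThreadPoolExecutor never creates more than corePoolSize threads (the maximum pool size then has no effect), so the sketch sets both sizes to the same value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolBasics {
    /** Runs one task per text; text.length() stands in for keyphrase extraction. */
    static List<Integer> lengths(List<String> texts, int threads) throws Exception {
        // Unbounded queue: the pool never grows past corePoolSize, so core == max here.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String text : texts) {
                // Each Callable is a task; the pool decides which worker thread runs it.
                Callable<Integer> task = () -> text.length();
                futures.add(executor.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // collected in submission order, whatever the run order
            }
            return results;
        } finally {
            executor.shutdown();      // worker threads exit once the queued tasks are done
        }
    }
}
```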
Suppose I want a maximum of n threads running in parallel. One might think that
I could simply create n CooccurrenceGraphExtractor instances and assign them to
the various tasks. Say that at the beginning n tasks are running, where task i
uses CooccurrenceGraphExtractor i. The tasks can finish in an arbitrary order:
if task 2 ends first, its slot in the ThreadPoolExecutor might be filled by a
task that uses CooccurrenceGraphExtractor 1 instead of 2. Then multiple tasks
might use the same CooccurrenceGraphExtractor instance on different texts at
the same time, which surely would not work.
Do you have any idea how to compute the keyphrases in multiple threads?
Yours,
Laura
Original comment by Steinert...@googlemail.com
on 8 Aug 2014 at 8:05
Hi,
I think you should create one CooccurrenceGraphExtractor instance for each
Thread.
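One way to get exactly one instance per thread even though tasks run in an arbitrary order is to bind the extractor to the pool's worker thread rather than to the task, for example with a ThreadLocal. This is a sketch under assumptions, not necessarily what Pedro had in mind: StubExtractor is a hypothetical, non-thread-safe stand-in for CooccurrenceGraphExtractor, and the CREATED set exists only to show that at most `threads` instances are ever built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for CooccurrenceGraphExtractor; NOT the DKPro API. */
class StubExtractor {
    // Assumed NOT thread-safe, so each instance must stay on one thread.
    List<String> extract(String text) {
        return Arrays.asList(text.split("\\s+"));
    }
}

class PerThreadExtraction {
    // Every extractor ever created, to show there is at most one per worker thread.
    static final Set<StubExtractor> CREATED = ConcurrentHashMap.newKeySet();

    // Bound to the worker thread, not the task: created lazily on a thread's first
    // task and reused by every later task that runs on the same thread.
    private static final ThreadLocal<StubExtractor> EXTRACTOR = ThreadLocal.withInitial(() -> {
        StubExtractor e = new StubExtractor();
        CREATED.add(e);
        return e;
    });

    static List<List<String>> run(List<String> texts, int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<List<String>> task = () -> EXTRACTOR.get().extract(text);
                futures.add(executor.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

Because a fixed pool reuses its threads, the number of extractors stays at the number of worker threads no matter how many texts are submitted. The sketch does not release the per-thread instances; if the real extractor holds an external TreeTagger process, it would need an explicit cleanup step when the pool shuts down.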
Regards,
Pedro
Original comment by pedrobss...@gmail.com
on 8 Aug 2014 at 12:01
Another way is to create just one CooccurrenceGraphExtractor instance and
synchronize access to it, so that all the threads can safely share the same instance.
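A minimal sketch of the shared, synchronized variant, with the same caveats: StubExtractor and SharedExtractor are hypothetical names, and the stub records the peak number of threads inside extract() at once. Because the wrapper method is synchronized, that peak is 1, which also means the extraction itself never runs in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for CooccurrenceGraphExtractor; NOT the DKPro API. */
class StubExtractor {
    private final AtomicInteger active = new AtomicInteger();
    final AtomicInteger maxActive = new AtomicInteger(); // peak number of concurrent callers

    List<String> extract(String text) throws InterruptedException {
        maxActive.accumulateAndGet(active.incrementAndGet(), Math::max);
        try {
            Thread.sleep(5); // simulate work
            return Arrays.asList(text.split("\\s+"));
        } finally {
            active.decrementAndGet();
        }
    }
}

/** Hypothetical wrapper: one shared instance, guarded by this object's monitor. */
class SharedExtractor {
    final StubExtractor delegate = new StubExtractor();

    // synchronized: only one thread at a time runs the (assumed non-thread-safe) extractor.
    synchronized List<String> extract(String text) throws InterruptedException {
        return delegate.extract(text);
    }

    static SharedExtractor runAll(List<String> texts, int threads) throws Exception {
        SharedExtractor shared = new SharedExtractor();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<List<String>> task = () -> shared.extract(text);
                futures.add(executor.submit(task));
            }
            for (Future<List<String>> f : futures) {
                f.get(); // propagates any exception thrown by a task
            }
        } finally {
            executor.shutdown();
        }
        return shared;
    }
}
```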
Original comment by pedrobss...@gmail.com
on 8 Aug 2014 at 12:10
Hi,
creating one instance per thread is what I originally did, and that is what
led to this discussion. :(
What do you mean by synchronizing? Making the keyphrase extraction a
mutex/monitor? Hmm... my threads essentially do nothing but keyphrase
extraction. So wouldn't that be no faster than a sequential computation, plus
the overhead of the mutex?
Original comment by Steinert...@googlemail.com
on 11 Aug 2014 at 10:52
Well, implement a pool from which each thread checks out a
CooccurrenceGraphExtractor instance and to which it returns the instance before
it ends. That way, two threads should never share the same
CooccurrenceGraphExtractor.
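One way to build the pool Richard describes is a BlockingQueue holding the idle instances: take() checks one out (blocking while all are busy), and a finally block returns it. This is a sketch, not DKPro code: StubExtractor and ExtractorPool are hypothetical names, and the stub's `shared` flag is set only if two threads ever use the same instance at once, which the pool is meant to prevent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for CooccurrenceGraphExtractor; NOT the DKPro API. */
class StubExtractor {
    private final AtomicInteger users = new AtomicInteger(); // threads inside extract() now
    volatile boolean shared = false;                          // true if two threads ever overlapped

    List<String> extract(String text) throws InterruptedException {
        if (users.incrementAndGet() > 1) {
            shared = true;
        }
        try {
            Thread.sleep(2); // simulate work
            return Arrays.asList(text.split("\\s+"));
        } finally {
            users.decrementAndGet();
        }
    }
}

/** Hypothetical pool: each instance is used by at most one thread at a time. */
class ExtractorPool {
    private final BlockingQueue<StubExtractor> idle;
    final List<StubExtractor> all = new ArrayList<>();

    ExtractorPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            StubExtractor e = new StubExtractor();
            all.add(e);
            idle.add(e);
        }
    }

    List<String> extract(String text) throws InterruptedException {
        StubExtractor e = idle.take(); // check out; blocks while every instance is busy
        try {
            return e.extract(text);
        } finally {
            idle.add(e);               // check back in, even if extract() throws
        }
    }

    static ExtractorPool runAll(List<String> texts, int size) throws Exception {
        ExtractorPool pool = new ExtractorPool(size);
        ExecutorService executor = Executors.newFixedThreadPool(size);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<List<String>> task = () -> pool.extract(text);
                futures.add(executor.submit(task));
            }
            for (Future<List<String>> f : futures) {
                f.get(); // propagates any exception thrown by a task
            }
        } finally {
            executor.shutdown();
        }
        return pool;
    }
}
```

With the pool size equal to the number of threads, no task waits long for an instance, and all instances are reachable through one object, which makes it straightforward to shut them down together after the executor finishes.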
Original comment by richard.eckart
on 11 Aug 2014 at 2:19
Original issue reported on code.google.com by
Steinert...@googlemail.com
on 8 Jul 2014 at 1:01