Closed GoogleCodeExporter closed 9 years ago
Should we try it with a non-binary CAS format? I still suspect the problem is
related to that.
Original comment by oliver.ferschke
on 7 Dec 2013 at 6:49
I'm currently trying that.
Original comment by daxenber...@gmail.com
on 7 Dec 2013 at 6:59
SerializedCasWriter/Reader runs fine on the same data. No memory issues.
Original comment by daxenber...@gmail.com
on 9 Dec 2013 at 9:18
Ah, that's "good".
Of course it would be much better if there were no bincas problems, but at
least the problem can be worked around rather easily now.
At least I remembered correctly that the issue first appeared when we switched
to bincas.
I propose we switch to serialized CASes directly and file an issue with DKPro
Core.
Once the issue is solved, we can switch back.
Original comment by oliver.ferschke
on 9 Dec 2013 at 9:34
Sounds like a reasonable way to go. I'll run a couple of further tests before I
switch back to serialized CASes.
Original comment by daxenber...@gmail.com
on 9 Dec 2013 at 9:41
If you really found a memory leak in the BinaryCasSerDes6, that would be
something to report to the UIMA issue tracker.
Original comment by richard.eckart
on 9 Dec 2013 at 11:30
Btw. it looks like you are using the BinaryCasReader/Writer, not the
SerializedCasReader/Writer. That's OK, because the BinaryCasReader/Writer should
be faster and produce smaller data.
Original comment by richard.eckart
on 9 Dec 2013 at 11:31
Yes, we are currently using BinaryCasReader/Writer, and it's causing the memory
issues. SerializedCasReader/Writer runs fine on the same data (without
compression).
Original comment by daxenber...@gmail.com
on 9 Dec 2013 at 11:36
This issue was closed by revision r473.
Original comment by daxenber...@gmail.com
on 13 Dec 2013 at 1:10
The problem was solved after switching to format "0" in BinaryCasWriter. It
does indeed look like a memory leak in BinaryCasSerDes6.
Original comment by daxenber...@gmail.com
on 13 Dec 2013 at 1:13
Re-opening that task.
Starting with Core 1.7.0 and the new UIMA version, format "0" will not work.
Here is Richard's analysis:
OK, here is an (incomplete) explanation:
The data is written in the PreprocessTask using BinaryCasWriter in format 0
(does not include type system information). This requires that, when the data
is read again, the CAS has been initialized with exactly the same type system
as at the time of writing.
The data is read again in the MetaInfoTask / ExtractFeaturesTask. These tasks
use different components in their pipelines than the PreprocessTask. Yet, for
some reason, with DKPro Core 1.6.0, the type system induced by these components
is the same as in the PreprocessTask, but not when using DKPro Core
1.7.0-SNAPSHOT. Thus, with 1.7.0-SNAPSHOT, data written by the PreprocessTask
cannot be read by the other tasks and causes this exception.
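To illustrate the explanation above, here is a toy sketch of why a payload that carries no type system information can only be decoded against the same type system that wrote it. This is not UIMA's actual binary CAS format: the class name, the type names, and the index-based encoding are all assumptions made up for illustration. In this toy the mismatch silently yields wrong types, whereas the real reader fails with an exception, as described above.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model only; NOT the UIMA binary CAS format. */
public class TypeSystemMismatchSketch {

    // Schema-free encoding: each annotation is stored only as the index of
    // its type in the writer's type system. No type names are written.
    public static int[] encode(List<String> writerTypes, List<String> annotations) {
        int[] codes = new int[annotations.size()];
        for (int i = 0; i < codes.length; i++) {
            int code = writerTypes.indexOf(annotations.get(i));
            if (code < 0) {
                throw new IllegalArgumentException("Unknown type: " + annotations.get(i));
            }
            codes[i] = code;
        }
        return codes;
    }

    // Decoding resolves each index against whatever type system the reader
    // happens to have, which may differ from the writer's.
    public static String[] decode(List<String> readerTypes, int[] codes) {
        String[] names = new String[codes.length];
        for (int i = 0; i < codes.length; i++) {
            if (codes[i] < 0 || codes[i] >= readerTypes.size()) {
                throw new IllegalStateException("No type for code " + codes[i]);
            }
            names[i] = readerTypes.get(codes[i]);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical type systems induced by two different pipelines.
        List<String> preprocessTypes = Arrays.asList("Token", "Sentence", "POS");
        List<String> extractTypes = Arrays.asList("Token", "Lemma", "Sentence", "POS");

        int[] payload = encode(preprocessTypes, Arrays.asList("Sentence", "POS"));

        // Same type system on both sides: the data round-trips.
        System.out.println(Arrays.toString(decode(preprocessTypes, payload)));
        // Different type system on the reading side: the indices no longer
        // point at the types the writer meant.
        System.out.println(Arrays.toString(decode(extractTypes, payload)));
    }
}
```

A format that embeds the writer's type system in the payload avoids this, because the reader no longer depends on its own pipeline producing an identical type system.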
A workaround is to use format 6+ (includes type system) instead of format 0 in
the PreprocessTask. I tried it and it worked. I remember that memory issues had
been reported with format 6+, but it may be worth trying to track these down
instead of sticking to the fragile setup that uses format 0.
Another workaround could be to write the data using the SerializedCasWriter in
PreprocessTask - it also preserves the type system but produces larger files. I
tried this, but I ended up with 0 values in the folds, probably because
SerializedCasWriter uses different file naming conventions from
BinaryCasWriter.
Original comment by torsten....@gmail.com
on 15 Apr 2014 at 7:21
Torsten and I did a heap dump analysis. I believe I have located the problem
and opened an issue for it in the Apache Jira:
https://issues.apache.org/jira/browse/UIMA-3747
Original comment by richard.eckart
on 15 Apr 2014 at 8:49
This issue has been fixed in the recent snapshot of UIMA 2.6.0 and will be
incorporated in the next UIMA release.
Original comment by torsten....@gmail.com
on 25 Apr 2014 at 9:29
For those interested: the UIMA 2.6.0 release process has already started. There
is an issue with the first release candidate which hopefully can be resolved
before the release (it breaks at least the uimaFIT CpeBuilder in some cases and
may break much more for us).
Original comment by richard.eckart
on 25 Apr 2014 at 9:32
Original issue reported on code.google.com by
daxenber...@gmail.com
on 7 Dec 2013 at 6:46