google-code-export / dkpro-tc

Automatically exported from code.google.com/p/dkpro-tc

Writing CASes to a zip archive #135

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
DKPro-Core 1.6.1 will support writing to ZIP archives, e.g. using 
BinaryCasWriter. We should make use of this feature:

[PreprocessingTask]

AnalysisEngineDescription writer = createEngineDescription(
        BinaryCasWriter.class,
        BinaryCasWriter.PARAM_TARGET_LOCATION, "jar:file:" + root + "/archive.zip",
        BinaryCasWriter.PARAM_TYPE_SYSTEM_LOCATION, root + "/typesystem.bin",
        BinaryCasWriter.PARAM_FORMAT, "6");

and likewise for the Meta- and FeatureExtractionTasks.

One problem remains: I am not sure whether this makes sense for the 
BatchTaskCrossValidation, where we currently need to split the overall set of 
files into folds (file sets) that are then retrieved individually for each 
fold.

Original issue reported on code.google.com by daxenber...@gmail.com on 28 May 2014 at 12:41

GoogleCodeExporter commented 9 years ago
"root" points to the path on the file system. Unless you have a strong reason 
to store the type system outside the ZIP, I suggest you remove the "root" from 
PARAM_TYPE_SYSTEM_LOCATION and just set it to "typesystem.bin" (no slash). 
Relative type system locations are placed inside the ZIP; absolute locations 
are placed directly on the file system.

Original comment by richard.eckart on 28 May 2014 at 12:42

GoogleCodeExporter commented 9 years ago
Thanks for the hint. I don't see a reason to store the type system outside the 
ZIP, so the location should be relative.

Original comment by daxenber...@gmail.com on 28 May 2014 at 12:47

GoogleCodeExporter commented 9 years ago

Original comment by daxenber...@gmail.com on 4 Jun 2014 at 4:09

GoogleCodeExporter commented 9 years ago
I wonder, didn't we plan to do this in 0.6.0? 

Original comment by richard.eckart on 25 Jun 2014 at 3:04

GoogleCodeExporter commented 9 years ago
Because of the problem mentioned in the first post: I'm not sure how to 
integrate this with the current cross-validation batch task 
(BatchTaskCrossValidation).

Original comment by daxenber...@gmail.com on 25 Jun 2014 at 3:09

GoogleCodeExporter commented 9 years ago
Ah, I see. It shouldn't be a big problem but it is probably too much for the 
0.6.0 release. 

The basic principle should remain the same. We'd just need some extra code to 
extract the file names for the folds from the ZIP instead of scanning them from 
the file system.
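
A minimal sketch of that extra code, using only java.util.zip. It is not taken from DKPro TC: the class ZipFoldSplitter, its two methods, and the round-robin fold assignment are hypothetical names and choices for illustration. It assumes that every entry in the archive except the type system entry (e.g. "typesystem.bin") is a serialized CAS.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: derives cross-validation folds from the entries of a ZIP archive. */
final class ZipFoldSplitter {

    private ZipFoldSplitter() {
        // static helpers only
    }

    /**
     * Returns the names of all file entries in the archive except the type system entry.
     * The names are sorted so that the fold assignment is reproducible.
     */
    public static List<String> listCasEntries(Path zip, String typeSystemEntry)
            throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Assumption: everything that is neither a directory nor the
                // type system is a serialized CAS.
                if (!entry.isDirectory() && !entry.getName().equals(typeSystemEntry)) {
                    names.add(entry.getName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Distributes the entry names round-robin over numFolds file sets. */
    public static List<List<String>> splitIntoFolds(List<String> names, int numFolds) {
        if (numFolds < 1) {
            throw new IllegalArgumentException("numFolds must be at least 1");
        }
        List<List<String>> folds = new ArrayList<>();
        for (int i = 0; i < numFolds; i++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < names.size(); i++) {
            folds.get(i % numFolds).add(names.get(i));
        }
        return folds;
    }
}
```

Each fold would then hold entry names rather than file system paths. How a reader is pointed at a subset of entries inside the archive is left open here.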

Original comment by richard.eckart on 25 Jun 2014 at 3:11

GoogleCodeExporter commented 9 years ago

Original comment by daxenber...@gmail.com on 6 Jan 2015 at 11:40