google-code-export / dkpro-tc

Automatically exported from code.google.com/p/dkpro-tc

SparseFeatureStore with LuceneNgram* Features take too much RAM #230

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

the SparseFeatureStore is extremely memory-hungry when one of the LuceneNgram* 
features is used.
My experiment uses, for instance, the LuceneCharacterNGramUFE feature.

If I use the DenseFeatureStore with this feature, my maximum RAM usage reaches 
~200 MB. If I use the SparseFeatureStore, it reaches 1.6 GB.
If I increase the amount of data in my experiment, this easily exceeds the 
available machine memory.

I tested a bit, and the growth in memory demand seems to be caused by the 
LuceneFeatures.

Any suggestions as to what is causing this?
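As a side note for anyone investigating: a "sparse" store can genuinely cost more RAM than a dense one when the feature vectors are not actually sparse, because each map entry carries object overhead (entry node plus boxed key and value) that a primitive array avoids. A rough back-of-the-envelope sketch, illustrative only and not DKPro TC's actual data structures:

```java
// Illustration only: rough per-instance memory estimate for a dense
// double[] vector versus a boxed HashMap-style sparse vector on a
// 64-bit JVM. The per-object sizes are approximations.
public class StoreCostSketch {

    // Dense: one primitive double per feature (8 bytes) plus array header.
    static long denseBytes(int numFeatures) {
        return 16 + 8L * numFeatures;
    }

    // Sparse (HashMap<Integer, Double>): each non-zero entry needs an
    // entry node plus a boxed Integer key and Double value -- roughly
    // 64 bytes, i.e. ~8x the cost of one primitive double.
    static long sparseBytes(int nonZeroFeatures) {
        long perEntry = 32 /* entry node */ + 16 /* Integer */ + 16 /* Double */;
        return 48 /* map object + table overhead */ + perEntry * nonZeroFeatures;
    }

    public static void main(String[] args) {
        int numFeatures = 1000; // e.g. top-1000 character n-grams
        int nonZero = 400;      // short texts still hit many of them

        System.out.println("dense : " + denseBytes(numFeatures) + " bytes");
        System.out.println("sparse: " + sparseBytes(nonZero) + " bytes");
        // Already at ~40% fill, the sparse map exceeds the dense array.
    }
}
```

Character n-grams with a small n and a top-k cutoff tend to produce rather dense vectors, so this overhead could plausibly dominate.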

Original issue reported on code.google.com by Tobias.H...@gmail.com on 23 Dec 2014 at 4:12

GoogleCodeExporter commented 9 years ago
That sounds really weird. Is this behavior reproducible with small amounts of data?

Original comment by daxenber...@gmail.com on 23 Dec 2014 at 6:13

GoogleCodeExporter commented 9 years ago
You can use the BrownPosDemoCRFSuite in the example package.
If you add the LuceneCharacterNGramUFE feature with these parameters:

    @SuppressWarnings("unchecked")
    Dimension<List<Object>> dimPipelineParameters = Dimension
            .create(DIM_PIPELINE_PARAMS,
                    Arrays.asList(new Object[] {
                            LuceneCharacterNGramUFE.PARAM_CHAR_NGRAM_MIN_N, 2,
                            LuceneCharacterNGramUFE.PARAM_CHAR_NGRAM_MAX_N, 4,
                            LuceneCharacterNGramUFE.PARAM_CHAR_NGRAM_USE_TOP_K, 1000 }));

and run the experiment with and without the SparseFeatureStore, you should see 
during execution that the RAM requested in the sparse-feature-store case 
is considerably larger.
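To get comparable numbers at a fixed checkpoint (e.g. right after feature extraction), a small helper can sample the used heap via the standard Runtime API. This is a hypothetical snippet, not part of DKPro TC:

```java
// Hypothetical helper: sample used heap memory at a checkpoint.
public class MemSample {

    public static long usedMb() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // request a GC so the reading better reflects live objects
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("used heap: " + usedMb() + " MB");
    }
}
```

Note that `gc()` is only a hint to the JVM, so readings fluctuate; a profiler gives more reliable numbers.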

I get 110 MB of used memory for the dense store and 180 MB for the sparse 
store, so the sparse store is about 60% larger. Unfortunately, this size 
difference scales with the amount of data, and you easily run into memory 
problems.

Can you reproduce the magnitude of these numbers?

Original comment by Tobias.H...@gmail.com on 23 Dec 2014 at 6:36

GoogleCodeExporter commented 9 years ago
Using BrownPosDemoCRFSuite with the default test data, I couldn't reproduce 
this behavior. The process used about 160-180 MB for both FeatureStores. 
Unless this is somehow related to your own setup, we might need to investigate 
a bit deeper here. 

Original comment by daxenber...@gmail.com on 23 Dec 2014 at 8:45

GoogleCodeExporter commented 9 years ago
I just profiled the two versions.
In both versions, there is a spike of equal size in memory consumption in the 
first meta collection (two folds, so meta collection is run twice).
For the dense feature store, the second meta collection consumes the same 
amount of memory.
For the sparse feature store, I consistently see double to triple the memory 
consumption of the first meta collection.

Not sure why this happens. Needs to be investigated further.

Original comment by torsten....@gmail.com on 24 Dec 2014 at 11:33