blackears / svgSalamander


memory footprint #36

Closed GerdP closed 4 years ago

GerdP commented 5 years ago

I think it would be good to use String.intern() in the places where strings are read from *.svg files, e.g. in XMLParseUtil.parseStyle(). Sorry, I am not familiar with git, so I don't know how to create a pull request. In the JOSM project this saves quite a lot of memory, because JOSM keeps several instances of SVGDiagram. I've attached a small patch there: https://josm.openstreetmap.de/attachment/ticket/17040/kitfox.patch
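To make the idea concrete, here is a minimal sketch of interning at parse time. The class StyleInternSketch and its parse method are hypothetical; this is not the actual XMLParseUtil.parseStyle() code and not the attached patch, just an illustration of interning the keys and values of a "key:value;key:value" style string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of svgSalamander.
class StyleInternSketch {
    // Splits "key:value;key:value" and interns both parts, so equal
    // strings parsed from different files share one instance.
    static Map<String, String> parse(String style) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) continue; // skip empty or malformed declarations
            String key = decl.substring(0, colon).trim().intern();
            String value = decl.substring(colon + 1).trim().intern();
            out.put(key, value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> a = parse("stroke-width:1;fill:none");
        Map<String, String> b = parse(new String("fill:none;stroke-width:1"));
        // Both maps hold the same canonical "none" instance.
        System.out.println(a.get("fill") == b.get("fill")); // prints true
    }
}
```

Without the intern() calls, each substring() result would be a separate String object even when the contents are equal.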

matthiasblaesing commented 5 years ago

Before you go down the path of String#intern, you should really read:

https://shipilev.net/jvm-anatomy-park/10-string-intern/
https://shipilev.net/talks/joker-Oct2014-string-catechism.pdf (slides 48-58)

Long story short: Most probably using String#intern is not such a good idea.

I'm by no means an expert on JVM internals, but you should be careful with memory savings. It could be that you are just seeing memory move from the heap to native memory. If so, you have gained nothing: the memory no longer shows up in the GC statistics, but it is still in use.

GerdP commented 5 years ago

I think the benchmark in the first link is nonsense. Nobody would use String.intern() when there is not a single duplicate String in the data. In the case of svg we have lots of very short strings like "0", "1", "10", "refX", "width" etc. These recur constantly, and any deduplication method will help to avoid the duplicates. In the JOSM project String.intern() is used for other data as well, especially the key / value pairs from tags. I don't know enough about svg, but maybe two very simple lookup HashSets filled with the common keys and values would do a better job than String.intern() here.
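A rough sketch of that lookup idea, under some assumptions: the class name StringDedupSketch and the sample strings are made up for illustration, and a HashMap that maps each string to itself is used instead of a HashSet, because a HashSet cannot hand back the stored instance. One such table could be kept for keys and one for values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of svgSalamander or JOSM.
class StringDedupSketch {
    // Maps each common string to its single shared instance.
    private final Map<String, String> canonical = new HashMap<>();

    StringDedupSketch(String... common) {
        for (String s : common) {
            canonical.put(s, s);
        }
    }

    // Returns the shared instance for a known string,
    // or the argument itself if the string is not in the table.
    String dedup(String s) {
        String shared = canonical.get(s);
        return shared != null ? shared : s;
    }

    public static void main(String[] args) {
        StringDedupSketch keys = new StringDedupSketch("width", "height", "refX");
        String a = keys.dedup(new String("width"));
        String b = keys.dedup(new String("width"));
        System.out.println(a == b); // prints true
    }
}
```

Unlike String.intern(), the table is fixed after construction, so rare strings are never retained by it. The sketch is not synchronized; it only reads after construction, so sharing one instance between threads would require safe publication.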

simon04 commented 4 years ago

Here is a more recent article about String.intern for deduplication: https://dzone.com/articles/duplicate-strings-how-to-get-rid-of-them-and-save

With #52 applied in JOSM, the following statistics are printed on termination when running with -XX:+PrintStringTableStatistics:

Java version: 1.8.0_242-b08, Oracle Corporation, OpenJDK 64-Bit Server VM
StringTable statistics:
Number of buckets       :     60013 =    480104 bytes, avg   8.000
Number of entries       :     67626 =   1623024 bytes, avg  24.000
Number of literals      :     67626 =   9584536 bytes, avg 141.729
Total footprint         :           =  11687664 bytes
Average bucket size     :     1.127
Variance of bucket size :     1.130
Std. dev. of bucket size:     1.063
Maximum bucket size     :         7