cwrc / HuViz

LOD visualization tool for humanities datasets

refactor SortedSet to be a proper subclass of Array #259

Open smurp opened 5 years ago

smurp commented 5 years ago

Modify the SortedSet implementation so we use instances of SortedSet and it becomes a subclass of Array.

Current Behaviour

Currently, each SortedSet is a deep clone of the SortedSet object itself rather than an instance of a SortedSet class.

Possible Solution

Apparently ES6 has facilities for subclassing built-in classes such as Array, which were not available when SortedSet was first written. The challenge was that D3 wanted Array instances but older JavaScript prohibited direct subclassing of Array.

smurp commented 5 years ago

I have added how_heavy(n) to class Huviz to discover how much memory pressure SortedSet is inflicting. Copies of this strange object appear to weigh in at just under 3KB each. The significance of this cannot be judged without knowing how many of them there are.
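The how_heavy(n) implementation is not shown in this thread; for reference, a rough per-object weight can be estimated by serializing the object. The following is only a hypothetical sketch (the name roughByteSize is mine, not HuViz code):

```javascript
// Hypothetical sketch: approximate an object's weight by the byte
// length of its JSON serialization. This is NOT the actual
// how_heavy(n) implementation, just one crude way to get a figure.
function roughByteSize(obj) {
  try {
    // JS strings are UTF-16, so double the character count for a
    // rough lower bound on the bytes the object's own data occupies.
    return JSON.stringify(obj).length * 2;
  } catch (e) {
    return NaN; // circular structures cannot be serialized
  }
}

console.log(roughByteSize({a: 1})); // 14
```

This undercounts real heap cost (object headers, hidden classes, functions are ignored) but is enough to compare relative sizes.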

smurp commented 5 years ago

After adding a NUM_SORTEDSET tracker it was possible to count copies of SortedSet. After exercising most of the features of HuViz on this URL http://localhost:9997/#load+https://raw.githubusercontent.com/cwrc/testData/master/sparqlOutputs/jewish_novelist_biographies.ttl+with+http://sparql.cwrc.ca/ontology/cwrc.ttl it was possible to work the count up to 452, against 254 nodes in the dataset. So roughly double the number of nodes is about how many SortedSets one might get; the exact count also depends on how many edges there are and on how many different states the Taxon and Predicate pickers pass through.
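The NUM_SORTEDSET tracker itself is not shown in this thread; a counter of that kind can be as simple as bumping a module-level variable wherever a SortedSet copy is produced. A purely illustrative sketch, not the actual HuViz code:

```javascript
// Illustrative sketch of an instance counter like NUM_SORTEDSET.
// Names and structure are made up for this example.
let NUM_SORTEDSET = 0;

function makeSortedSet() {
  NUM_SORTEDSET += 1; // count every copy ever created
  return [];          // stand-in for the real deep-cloned SortedSet
}

makeSortedSet();
makeSortedSet();
console.log(NUM_SORTEDSET); // 2
```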

So the number of SortedSets is roughly 2x the number of nodes. We are currently seeing datasets of around 2000 nodes at the maximum, so 4000 x 3KB = 12MB or so of memory consumed by SortedSets in such a situation. It would take a test with a graph that large to find out how much total memory the page would consume. Here is how to check the memory consumption for the whole page:

window.performance.memory.totalJSHeapSize

I have noticed that this value goes up and down with the state of the graph: how many nodes are activated, etc.

In the current case, the Jewish Novelist dataset with 254 nodes would have been using about 450 x 3KB = 1.35MB, in a situation where the total memory consumption for HuViz on that page was 20MB. So about 7 percent.

Conclusion: this does not look like a huge issue right now. It may be worth a closer look as the system gets more optimized, since an improvement on the order of 7 percent is tempting.

smurp commented 5 years ago

https://davidtang.io/2017/09/21/subclassing-arrays-in-es2015.html

That article outlines how to do this refactoring with modern JavaScript; it was not possible with portable JS when SortedSet was first implemented.