Swirrl / datahost-prototypes

Eclipse Public License 1.0

Figure out scaling limits of current prototype #310

Open RickMoynihan opened 9 months ago

RickMoynihan commented 9 months ago

Create a synthetic dataset of a fixed schema and width W, with a large number of rows R.

Suggest initial sizes of

The dataset should still conform to being a cube, i.e. every combination of dimension values (dimvals) should be unique, with exactly one measure value for each.
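A minimal sketch of how such a cube could be generated, assuming a CSV output and a made-up schema. The dimension names, their values and the `value` measure column are all hypothetical, and the sizes are deliberately tiny. Taking the Cartesian product of the dimension values guarantees that each combination appears exactly once, so R is the product of the dimension cardinalities and W is the number of dimensions plus one.

```python
import csv
import io
import itertools
import math

# Hypothetical dimensions; the issue does not specify a schema.
DIMENSIONS = {
    "area": [f"area-{i}" for i in range(10)],
    "period": [str(2000 + i) for i in range(5)],
    "sex": ["female", "male"],
}
MEASURE = "value"  # hypothetical measure column name


def cube_rows(dimensions):
    """Lazily yield one row per combination of dimension values."""
    names = list(dimensions)
    for i, combo in enumerate(itertools.product(*dimensions.values())):
        row = dict(zip(names, combo))
        row[MEASURE] = i  # deterministic placeholder measure
        yield row


def row_count(dimensions):
    """R is the product of the dimension cardinalities."""
    return math.prod(len(values) for values in dimensions.values())


def write_csv(dimensions, out):
    """Stream the cube to a file-like object without holding all rows."""
    writer = csv.DictWriter(out, fieldnames=[*dimensions, MEASURE])
    writer.writeheader()
    writer.writerows(cube_rows(dimensions))


buf = io.StringIO()
write_csv(DIMENSIONS, buf)
```

Because the rows are produced by a generator, the same code should scale to larger R by increasing the cardinalities and writing to a real file instead of `io.StringIO`.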

We can then use this as a basis for a number of tests:

  1. Can we load that dataset in a single commit, and get it back out without error?
  2. Can we make that dataset an order of magnitude bigger (i.e. 100m) and still load it?
  3. If so, can we take that dataset of 100m rows, split it into 10 append commits of 10m rows each, and load it?
  4. Assuming the largest size that works (e.g. 10m), can we make it fall over by adding 10m rows, deleting every row, and adding them all back a number of times?
  5. How many commits (just appends) does it take to fall over? e.g. do we fall over at 100k append commits with 10 rows in each commit?
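The commit sequences for tests 3 to 5 could be planned along these lines. This is only a sketch: the `"append"` and `"delete"` labels and the function names are hypothetical, and actually submitting each commit to the service is left out.

```python
from itertools import islice


def batches(rows, batch_size):
    """Split an iterable of rows into lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def append_plan(rows, batch_size):
    """Tests 3 and 5: one append commit per batch of rows."""
    return [("append", batch) for batch in batches(rows, batch_size)]


def churn_plan(rows, cycles):
    """Test 4: append everything, then delete and re-append it `cycles` times."""
    rows = list(rows)
    plan = [("append", rows)]
    for _ in range(cycles):
        plan.append(("delete", rows))
        plan.append(("append", rows))
    return plan
```

For example, `append_plan(rows, 10_000_000)` over 100m rows would give the 10 commits of test 3, while a batch size of 10 gives the many-small-commits shape of test 5.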

We can look at the above and profile it in tools like jvisualvm, to see if there are bugs that are causing problems, or if they are just limitations of our in-memory approach.

This is a precursor task to choosing a database for the table store.
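When reading the profiler output, a rough lower bound on the memory needed just to hold the raw values can help tell a bug from an inherent limit. The sketch below is back-of-envelope arithmetic only: the 8 bytes per value and the width of 4 columns are assumptions, and real heap use will be higher because of object headers, strings, indexes and temporary copies.

```python
def min_bytes(rows, columns, bytes_per_value=8):
    """Lower bound: every cell stored once as a fixed-size primitive (assumed)."""
    return rows * columns * bytes_per_value


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


for rows in (10_000_000, 100_000_000):
    estimate = min_bytes(rows, columns=4)  # width of 4 is hypothetical
    print(f"{rows:>11,} rows: at least {to_gib(estimate):.2f} GiB")
```

If observed heap use is many multiples of this bound, that points towards avoidable copying or retention rather than a hard limit of holding the data in memory.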

xdrcft8000 commented 9 months ago

A report on these cases can be found here: https://github.com/Swirrl/datahost-prototypes/blob/tr/datageneration/datahost-ld-openapi/datagen/tests_and_reports.clj

xdrcft8000 commented 9 months ago

Error 1: this is the stack trace for the 500 error you get when posting too large a dataset:

    java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3585)
    at ham_fisted.ArrayLists$IntArrayList.ensureCapacity(ArrayLists.java:966)
    at ham_fisted.ArrayLists$IntArrayList.addLong(ArrayLists.java:973)
    at ham_fisted.LongMutList$2.invokePrim(LongMutList.java:72)
    at ham_fisted.IFnDef$OLO.invoke(IFnDef.java:565)
    at clojure.core.protocols$naive_seq_reduce.invokeStatic(protocols.clj:62)
    at clojure.core.protocols$interface_or_naive_reduce.invokeStatic(protocols.clj:72)
    at clojure.core.protocols$fn8249.invokeStatic(protocols.clj:169)
    at clojure.core.protocols$fn8249.invoke(protocols.clj:124)
    at clojure.core.protocols$fn8204$G81998213.invoke(protocols.clj:19)
    at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31)
    at clojure.core.protocols$fn8236.invokeStatic(protocols.clj:75)
    at clojure.core.protocols$fn8236.invoke(protocols.clj:75)
    at clojure.core.protocols$fn8178$G8173__8191.invoke(protocols.clj:13)
    at ham_fisted.Reductions.serialReduction(Reductions.java:84)
    at ham_fisted.LongMutList.addAllReducible(LongMutList.java:70)
    at ham_fisted.ArrayLists$IntArrayList.addAllReducible(ArrayLists.java:999)
    at tech.v3.datatype.array_buffer$array_sub_list.invokeStatic(array_buffer.clj:652)
    at tech.v3.datatype.array_buffer$array_sub_list.invoke(array_buffer.clj:628)
    at tech.v3.datatype.copy_make_container$eval45276$fn45277.invoke(copy_make_container.clj:38)
    at clojure.lang.MultiFn.invoke(MultiFn.java:244)
    at tech.v3.datatype.copy_make_container$make_container.invokeStatic(copy_make_container.clj:105)
    at tech.v3.datatype.copy_make_container$make_container.invoke(copy_make_container.clj:96)
    at tech.v3.datatype.copy_make_container$GT_array.invokeStatic(copy_make_container.clj:176)
    at tech.v3.datatype.copy_make_container$__GT_array.invoke(copy_make_container.clj:158)
    at tech.v3.datatype.copy_make_container$GT_int_array.invokeStatic(copy_make_container.clj:212)
    at tech.v3.datatype.copy_make_container$__GT_int_array.invoke(copy_make_contai