BuaBook / kdb-common

kdb+ Core Libraries and Utilities
Apache License 2.0

file.kdb: Re-write getLength to add support for "new-format" lists #65

Closed by jasraj 2 years ago

jasraj commented 2 years ago

Related to [#61]. Performance comparison on lists of 10 million elements each; every cell shows time (ms) and memory (bytes), as reported by \ts:

.file.kdb.getType

Method                        Uncompressed     Compressed (lz4hc)
-----------------------------------------------------------------
.file.kdb.getType each x      1 9232           2 675376
type each get each x          398 138412688    450 143132240

.file.kdb.getLength

Method                        Uncompressed     Compressed (lz4hc)
-----------------------------------------------------------------
.file.kdb.getLength each x    1 9264           2 675376
count each get each x         393 138412688    456 143132240

Code

q) { set[`$":/tmp/kdb-types/",x; 10000000?y] }./: flip (key;value)@\:enlist[" "] _ .rand.charTypes;
q) set[`$":/tmp/kdb-types/str-anymap"; 10000000?.Q.a cross .Q.a];
q) set[`$":/tmp/kdb-types/num-anymap"; 10000000?(1 2 3; 1 2 3f; 1 2 3h)];

q) { set[(`$":/tmp/kdb-types-comp/",x),.compress.defaults`lz4hc; 10000000?y] }./: flip (key;value)@\:enlist[" "] _ .rand.charTypes;
q) set[(`$":/tmp/kdb-types-comp/str-anymap"),.compress.defaults`lz4hc; 10000000?.Q.a cross .Q.a];
q) set[(`$":/tmp/kdb-types-comp/num-anymap"),.compress.defaults`lz4hc; 10000000?(1 2 3; 1 2 3f; 1 2 3h)];

q) srcs:.file.listFolderPaths `$":/tmp/kdb-types";
q) srcs@:where not srcs like "*#";
q) compSrcs:.file.listFolderPaths `$":/tmp/kdb-types-comp";
q) compSrcs@:where not compSrcs like "*#";

q) (.file.kdb.getType each srcs) = type each get each srcs
11111111111111111111b
q) (.file.kdb.getType each compSrcs) = type each get each compSrcs
11111111111111111111b
q) (.file.kdb.getLength each srcs) = count each get each srcs
11111111111111111111b
q) (.file.kdb.getLength each compSrcs) = count each get each compSrcs
11111111111111111111b
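
The figures in the tables above have the shape of \ts output. A minimal sketch of how they might be reproduced, assuming kdb-common is loaded, srcs and compSrcs are defined as above, and the "x" in the table rows stands for one of those two lists (an assumption, the original does not say):

```q
/ \ts prints two numbers: elapsed time in milliseconds and memory used in bytes
/ Header-only reads via the library functions
\ts .file.kdb.getType each srcs
\ts .file.kdb.getLength each compSrcs

/ Baseline: load each list with get, then inspect it
\ts type each get each srcs
\ts count each get each compSrcs
```

Results will depend on hardware and on the state of the OS file cache, so a fresh run should not be expected to match the numbers above exactly.
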

jasraj commented 2 years ago

For splayed tables, using .Q.V performs effectively the same as the .file.kdb.get* functions (1 ms in each case, with memory use within 512 bytes):

q) get `:/tmp/hdb/2021.11.08/trade
time                          sym  price    vol    prices
--------------------------------------------------------------
2015.01.04D04:55:29.310114176 22   34.61037 744631 100 101 102
...

q) \ts type each .Q.V `:/tmp/hdb/2021.11.08/trade
1 4195152
q) \ts .file.kdb.getType each ` sv/: `:/tmp/hdb/2021.11.08/trade,/: cols `:/tmp/hdb/2021.11.08/trade
1 4195664

q) \ts count each .Q.V `:/tmp/hdb/2021.11.08/trade
1 4195152
q) \ts .file.kdb.getLength each ` sv/: `:/tmp/hdb/2021.11.08/trade,/: cols `:/tmp/hdb/2021.11.08/trade
1 4195664
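
The ` sv/: expression in the timings above builds one file handle per column of the splayed table. A small sketch of that step on its own, reusing the same /tmp/hdb path (which is assumed to exist on disk for the last line to work):

```q
/ ` sv joins a file handle and a child name into a single file handle
` sv `:/tmp/hdb/2021.11.08/trade`price
/ -> `:/tmp/hdb/2021.11.08/trade/price

/ Pair the table handle with every column name, then join each pair
` sv/: `:/tmp/hdb/2021.11.08/trade,/: cols `:/tmp/hdb/2021.11.08/trade
```

Each resulting handle points at a single column file, which is what .file.kdb.getType and .file.kdb.getLength are applied to in the comparison.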