ajschumacher / mergic

workflow support for reproducible deduplication and merging

make it fast(er) #40

Open ajschumacher opened 9 years ago

ajschumacher commented 9 years ago

It's so slow! For goodness' sake!

ajschumacher commented 9 years ago

Currently:

time ../mergic/mergic.py calc RLdata500.csv (no cache)

real    0m36.539s
user    0m35.593s
sys     0m0.379s

time ../mergic/mergic.py calc RLdata500.csv (with cache)

real    0m4.127s
user    0m3.872s
sys     0m0.123s

The cache file is 1.8M (1,835,153 bytes).
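The cache cuts `calc` from about 36.5s to about 4.1s, which suggests most of the uncached time goes into computing distances between items. Below is a minimal sketch of a persistent pairwise-distance cache of that kind, assuming the expensive step is one distance per unordered pair. The names `cached_distances` and `distance`, the pickle file format, and the `difflib`-based distance are all hypothetical stand-ins, not mergic's actual implementation.

```python
import os
import pickle
import tempfile
from difflib import SequenceMatcher
from itertools import combinations


def distance(a, b):
    # Hypothetical stand-in distance: 1 minus difflib's similarity ratio.
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def cached_distances(items, cache_path):
    """Return ({(a, b): distance}, number_newly_computed), reusing a pickle cache."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)
    computed = 0
    # Sorting makes each unordered pair's key canonical: (a, b) with a < b.
    for a, b in combinations(sorted(set(items)), 2):
        if (a, b) not in cache:
            cache[(a, b)] = distance(a, b)
            computed += 1
    if computed:
        # Only rewrite the cache file when something new was calculated.
        with open(cache_path, "wb") as f:
            pickle.dump(cache, f, protocol=pickle.HIGHEST_PROTOCOL)
    return cache, computed


cache_file = os.path.join(tempfile.mkdtemp(), "distances.pickle")
names = ["Roger Federer", "R. Federer", "Rafael Nadal"]
cache, first = cached_distances(names, cache_file)   # computes all 3 pairs
cache, second = cached_distances(names, cache_file)  # loads them; computes none
```

The second call only pays for unpickling, which matches the pattern above where the cached run is much faster but still not instant.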

ajschumacher commented 9 years ago

Hmm. Why is the cache so much bigger for the tennis players? It's 26M (27,700,796 bytes). Hmm.

There are 669 tennis player names vs. 500 RLdata patients. But the RLdata groupings seem to be more complex; at least, there are more of them.
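A quick check on whether the item count alone could explain the size gap, assuming (not confirmed here) that the cache holds roughly one entry per unordered pair of items. The number of pairs among n items is n(n - 1)/2:

```python
from math import comb


def pair_count(n):
    # Unordered pairs among n items: n * (n - 1) / 2.
    return comb(n, 2)


rl_pairs = pair_count(500)      # 124750
tennis_pairs = pair_count(669)  # 223446
pair_ratio = tennis_pairs / rl_pairs
size_ratio = 27700796 / 1835153  # cache file sizes in bytes, from above
print(f"pair ratio {pair_ratio:.2f}, cache size ratio {size_ratio:.2f}")
```

Under that assumption, 669 names give only about 1.8 times as many pairs as 500, while the cache file is about 15 times larger, so the per-entry size or what gets stored must differ between the two datasets.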

For comparison, here's the calc time using the tennis cache:

time mergic calc

real    0m55.827s
user    0m52.343s
sys 0m1.941s

It really seems to sit there for a long time at the end, even after all the output is done. Why is that? Is it still calculating?
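One way to find out whether the time is spent inside the work or after it returns is to time the work itself and compare that against the `real` figure from `time`. A minimal sketch follows; `timed` is a hypothetical helper and `build_big_dict` is a hypothetical stand-in for mergic's work (a large dict, like a loaded cache).

```python
import sys
import time


def timed(fn, *args, **kwargs):
    """Run fn, report its wall-clock time on stderr, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    sys.stdout.flush()  # make sure all output is written before stopping the clock
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__} finished after {elapsed:.2f}s", file=sys.stderr)
    return result, elapsed


def build_big_dict(n):
    # Hypothetical stand-in for the real work: a large dict of tuple keys.
    return {(i, i + 1): float(i) for i in range(n)}


data, seconds = timed(build_big_dict, 100_000)
```

If `real` is much larger than the reported seconds, the gap comes after the work has finished. One possible cause (a hypothesis, not verified here) is Python freeing a very large in-memory structure at interpreter shutdown. Calling `os._exit(0)` right after flushing output skips that cleanup and would confirm or rule this out, though it also skips `atexit` handlers and any unflushed file buffers.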