This is partly a known issue: we need to communicate the training percentage to the memory estimation process, since it significantly affects the actual memory usage.
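To illustrate why this matters (a minimal sketch, not the actual estimator's logic): any memory cost that scales with the training data should be driven by the number of training rows, which the training percentage determines, rather than by the total row count:

```cpp
#include <cstdint>

// Minimal sketch, not the real estimator: if per-row costs dominate,
// an estimate computed from the total row count overstates memory by
// a factor of 100 / trainingPercent (20x at the 5% discussed below).
std::uint64_t estimateMemoryBytes(std::uint64_t totalRows,
                                  double trainingPercent,
                                  std::uint64_t bytesPerTrainingRow,
                                  std::uint64_t fixedOverheadBytes) {
    auto trainingRows = static_cast<std::uint64_t>(
        static_cast<double>(totalRows) * trainingPercent / 100.0);
    return fixedOverheadBytes + trainingRows * bytesPerTrainingRow;
}
```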
After the fix of #1111, the estimate for a training percent of 5% on the Iowa liquor sales data dropped from 306147kb to 74672kb, a great improvement.
With a training percent of 80%, the estimate is currently 273319kb and the actual is 13109339 bytes (≈ 12802kb), so the estimate is still roughly 21 times too high.
We've discussed this and we're going to work on calibrating the current worst-case memory estimates based on a variety of different classification and regression runs.
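For instance (an illustrative sketch only, not necessarily the approach #1298 took), one could gather (estimated, actual) pairs from representative benchmark runs and derive a multiplicative correction that still upper-bounds everything observed while removing most of the slack:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One benchmark observation: what the estimator predicted and the
// peak memory the analysis actually used.
struct SRun {
    std::uint64_t s_EstimatedBytes = 0;
    std::uint64_t s_ActualBytes = 0;
};

// Illustrative calibration: the smallest factor which, applied to the
// raw estimates, still upper-bounds the actual usage of every run.
double calibrationFactor(const std::vector<SRun>& runs) {
    double factor = 0.0;
    for (const auto& run : runs) {
        factor = std::max(factor, static_cast<double>(run.s_ActualBytes) /
                                      static_cast<double>(run.s_EstimatedBytes));
    }
    return factor;
}
```

Applied per analysis type, with a safety margin on top, a factor like this keeps the estimate a worst-case bound while shrinking a 25x over-estimate toward the observed worst case.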
This was fixed in #1298.
Steps to reproduce:

Run a regression analysis on the Iowa liquor sales data with a training percent of 5%, choosing Sale (Dollars) as the dependent variable and excluding every variable except Bottle Volume (ml) and Store Number from the analysis. (So effectively we're predicting one number from two others on 5% of the 380000 rows, i.e. 19000 rows.)

Part of the problem here is https://github.com/elastic/kibana/issues/60496, because the memory estimate didn't get updated when I added the exclude fields. However, a considerable part of the problem is in the C++ estimation code. If I run the estimate in the dev console using the final config it's still 25 times bigger than it needs to be:
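A sketch of what that final config looks like as an `_explain` request (the index name is an assumption; the dependent variable and analyzed fields are the ones named above):

```
POST _ml/data_frame/analytics/_explain
{
  "source": { "index": "iowa-liquor-sales" },
  "analysis": {
    "regression": {
      "dependent_variable": "Sale (Dollars)",
      "training_percent": 5
    }
  },
  "analyzed_fields": {
    "includes": ["Sale (Dollars)", "Bottle Volume (ml)", "Store Number"]
  }
}
```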
returns:
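(Response sketched from the shape of the `_explain` API, with the `field_selection` contents omitted; the estimate shown is the 306147kb figure quoted above, roughly 25 times the ~12034kb actual.)

```
{
  "field_selection": [ ... ],
  "memory_estimation": {
    "expected_memory_without_disk": "306147kb"
  }
}
```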
And from the second screenshot you can see the actual usage was 12322863 bytes ≈ 12034kb.
This is a big problem for Cloud trials, where users don't have much memory to play with, and we refuse to run an analysis if its memory estimate won't fit on the available machine.