100x slower for long arrays than recalculating every time

valerii15298 commented 1 year ago

I am trying to build excel like tool in frontend with formulas. As may guess the main point is to be able to change cell if its dependencies are changed. From the first look I thought this library is ideal for me, but... After starting using it I saw performance dropped by more than 1000x and app basically became unresponsive...

When I made isolated repro I see it is really slower(like 100x slower than recalculating every time with), and that is my main problem:

import { memoize } from "proxy-memoize";
const MAX_VALUE = 100;
function getRandomFloat() {
  return Math.random() * MAX_VALUE;
}

let nextId = 1;
function getId() {
  return nextId++;
}

type El = {
  col0: number;
  col1: number;
  col2: number;
  col3: number;
  col4: number;
  id: number;
};
const longArr = new Array(1000).fill(0).map((_, i) => ({
  col0: getRandomFloat(),
  col1: getRandomFloat(),
  col2: getRandomFloat(),
  col3: getRandomFloat(),
  col4: getRandomFloat(),
  id: getId(),
}));

function getCol0Value(arr: El[], rowId: number) {
  const currRow = arr.find((el) => el.id === rowId)!;
  return currRow.col1 * currRow.col2;
}

const mapFuncs = new Map<string, (arr: El[]) => number>();
function getCol0ValueMemoized(arr: El[], rowId: number) {
  const id = `${rowId} - col0`;
  const formula = mapFuncs.get(id);
  if (formula) {
    return formula(arr);
  }
  const memoizedFormula = memoize((arr: El[]) => getCol0Value(arr, rowId));
  mapFuncs.set(id, memoizedFormula);
  return memoizedFormula(arr);
}

function getRows(arr: El[]) {
  return arr.map((el) => {
    // change below `getCol0ValueMemoized` to getCol0Value and see performance boost
    return { ...el, col0: getCol0ValueMemoized(arr, el.id) };
  });
}
const initialGetRowsLabel = "initial get rows";
console.time(initialGetRowsLabel);
const _rows1 = getRows(longArr);
console.timeEnd(initialGetRowsLabel); // 500ms vs 4ms

// change dependency and recalculate
const newArr = longArr.map((el, i) =>
  i === 0 ? { ...el, col3: getRandomFloat() } : el,
);

const afterChangeGetRowsLabel = "after change";
console.time(afterChangeGetRowsLabel);
const _rows2 = getRows(newArr);
console.timeEnd(afterChangeGetRowsLabel); // 100ms vs 5ms

I am using zustand library for it. As you main see main idea is that cell may depend on any other cells from the array(or excel page).

Maybe proxy-memoize just is not designed for such use case? Or maybe you can recommend some other library, or solution to not recalculate formulas for cell if its dependencies are not changed? For now I just cannot use this library unfortunately...

valerii15298 commented 1 year ago

I tried my own approach, since in my case formulas only return primitive value, here is what I had come to:

const valueSymbol = Symbol();

// compare deps
function equalDeps(obj, keys) {
  for (const key in keys) {
    if (valueSymbol in keys[key]) {
      if (obj[key] !== keys[key][valueSymbol]) {
        return false;
      }
      continue;
    }
    if (typeof obj[key] !== "object") {
      return false;
    }
    if (!equalDeps(obj[key], keys[key])) {
      return false;
    }
  }
  return true;
}

function createProxy(obj, keys) {
  return new Proxy(obj, {
    get(target, key) {
      keys[key] = {};
      const originalValue = target[key];
      if (typeof originalValue !== "object") {
        keys[key][valueSymbol] = originalValue;
        return originalValue;
      }
      return createProxy(originalValue, keys[key]);
    },
  });
}

function memoize(fn) {
  return function memoizedFunc(obj) {
    if (memoizedFunc.keys && equalDeps(obj, memoizedFunc.keys)) {
      return memoizedFunc.prevValue;
    }

    memoizedFunc.keys = {};

    const proxy = createProxy(obj, memoizedFunc.keys);
    const value = fn(proxy);
    memoizedFunc.prevValue = value;
    return value;
  };
}

It is like 2x faster than using your library but still 50x slower then recalculating every time... So it seems for my case using Proxies is not the way, and I will need to use selector based dependency comparison(kinda like reselect library).

If you have any ideas how this use case with excel formulas can be solved to not recompute every time I will be very thankful

dai-shi commented 1 year ago

I think you are right. I tried changing from el.id to index. While it's much faster but still 100x slower than pure function.

As we discussed in #80, the slowness comes from creating Proxies. Even if you have a big object, if you only touches a few properties, it's fine. But, if you walk the entire object and touch everything, it's costly.

It's very unfortunate. If your simplified version only gets 2x faster, we are probably out of luck.

dai-shi commented 3 months ago

This should be a design constraint. That doesn't mean to stop coming up with new ideas though.

valerii15298 commented 3 months ago

I thought about making it via functions. Basically the same approach as with proxy but instead first traverse over every value and convert it to function in which remember path of accessed value(I would call it manual proxification). But then I undersatood that every time for new value we need to convert it to functions which is not efficient at all.

Even if you do it level by level(first convert only keys of root object to getters and then based on what getter will be called convert next value to getter), but even this does not seem efficient enough becasue there will be a lot of loops for big objects/arrays...

dai-shi / proxy-memoize

100x slower for long arrays than recalculating every time #81