cubewise-code / optimus-py

Find the best cube dimension order with TM1py
MIT License
17 stars 15 forks source link

Potentially incorrect statistics returned during Permutation Evaluation #48

Closed gtonkin closed 10 months ago

gtonkin commented 11 months ago

OptimusPy reads from the }StatsByCube cube to report on memory usage as it evaluates each permutation.

Where a permutation is tested and completes in less than a minute, there is a risk that the statistics are incorrect as "LATEST" is only updated each minute. Any tests completed before "LATEST" is updated will show the same memory usage.

To mitigate this, the run time of each test should be calculated and a if it took less than a minute wait until at least a minute has passed before reading the statistics from "LATEST".

There is a TM1s.cfg parameter called StatisticalSamplingPeriodMS but this does not seem to be effective when setting to something like 5000 with the expectation that "LATEST" may be updated every 5 seconds rather than every 60.

tomasfelcman commented 11 months ago

Hi gtonkin, From where have you got that Optimus reads from }StatsByCube for every permutation? I was playing around with it without any troubles...

I had a look at the code, at it takes original order and its }StatsByCube LATEST value (waiting maximum of 4 * 15 seconds to make sure there is a value) Then for every permutation it uses REST API function Cube.ReorderDimensions, which returns % change to the current (e.g. original order). The permutation evaluation is thus not based on }StatsByCube entry being made...

Here is a snippet of CubeService.py of TM1Py:

def update_storage_dimension_order(self, cube_name: str, dimension_names: Iterable[str]) -> float:
        """ Update the storage dimension order of a cube

        :param cube_name:
        :param dimension_names:
        :return:  Float: -23.076489699337078 (percent change in memory usage)
        """

Also I have not found any mention of StatisticalSamplingPeriodMS - where have you found that? I am personally interested in other hidden parameters... Thank you, T.

gtonkin commented 11 months ago

Hi Tomas,

With a smaller cube where the dimension reorder result is returned in less than I minute, I saw the same usage being logged.

The function __retrieve_ramusage seems to be doing a read from the }StatsByCube.

The percentage change does not seem to be used in calculating what the new memory would be by multiplying out the original usage by the result.

The other thing that looks to be happening is that after each evaluation, the cube has the recently tested order. If my cube started at 1GB and after the first test I have a reduction of 50% then the new order has the cube consuming 500MB. On the next evaluation, if the result says 5% reduction then the cube now consumes 475MB. Each subsequent percentage is then based on the cube structured by the previous evaluation.

Maybe I am missing something?

tomasfelcman commented 11 months ago

Hi George, in results.py there is class ParmutationResult that has attribute ram_usage defined as follows:

# from original dimension order
        if ram_usage:
            self.ram_usage = ram_usage

        # from all other dimension orders
        elif ram_percentage_change is not None:
            self.ram_usage = PermutationResult.current_ram + (
                    PermutationResult.current_ram * ram_percentage_change / 100)

        else:
            raise RuntimeError("Either 'ram_usage' or 'ram_percentage_change' must be provided")

There is stated the use of "ram_percentage_change" for other than original order of the dimensions. Thus Optimus gets current (original) value of the RAM from }StatsbyCube, and then for any other permutation it does not read the value from }StatsbyCube, but rather uses "ram_percentage_change".

This "ram_percentage_change" is evaluated as return from the "cubes.update_storage_dimension_order" function that is called in the executors.py (that is calling PermutationResults for each permutation):

def _evaluate_permutation(self, permutation: List[str], retrieve_ram: bool = False,
                              reset_counter: bool = False) -> PermutationResult:
        ram_percentage_change = self.tm1.cubes.update_storage_dimension_order(self.cube_name, permutation)

Thus I do not see an issue here... But I might be misinterpreting the logic as I am not the author of the code... Hope this helps, Tomas

MariusWirtz commented 10 months ago

Sorry for the late reply.

I can confirm that only the RAM for the original order is retrieved from the stats cube. All other RAM values are then derived based on the percentage changes.

gtonkin commented 10 months ago

Thanks for confirming Marius - no issues then.