Memory spikes after impersonation

MariusWirtz commented 1 year ago

I am seeing memory spikes. I was running the TM1py script that loops through users and impersonates them to get their views. It was running for 90 minutes until I canceled it due to the memory threshold alert I received. When I looked at StatsForServer it was at 62GB for Total Memory Used, but the StatsByCube was only totaling 12GB so I'm wondering what was consuming the other 50GB. Any thoughts? Thank you!

Originally posted by @jnaff-coursera in https://github.com/cubewise-code/tm1py/issues/891#issuecomment-1502646759

MariusWirtz commented 1 year ago

Do you terminate the TM1 session after you are done? If you don't, that could explain why the memory isn't released.

When you initiate the TM1Service, you can do it using the with statement. This way it is guaranteed that the session is closed after the completion of the block.

from TM1py import TM1Service

params = {
    "address": "",
    "port": 8010,
    "user": "admin",
    "password": "apple",
    "ssl": True
}

with TM1Service(**params) as tm1:
    tm1.server.get_server_name()

If you don't use the with statement, you need to call the logout function explicitly.

from TM1py import TM1Service

params = {
    "address": "",
    "port": 8010,
    "user": "admin",
    "password": "apple",
    "ssl": True
}

tm1 = TM1Service(**params)
tm1.server.get_server_name()
tm1.logout()
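
To illustrate why the with statement matters when a script is interrupted, here is a minimal sketch using a stand-in class. `FakeSession` is hypothetical and only mimics the pattern of logging out in `__exit__`; it is not TM1py code. It suggests that an interrupt raised inside the block (such as a KeyboardInterrupt) still triggers the cleanup. A process that is killed outright, which may be what happens when an IDE stops a run, gets no chance to execute it.

```python
class FakeSession:
    """Hypothetical stand-in for a service object that logs out on exit."""

    def __init__(self):
        self.logged_out = False

    def logout(self):
        self.logged_out = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal completion and when an exception propagates
        self.logout()
        return False  # do not swallow the exception


def run_interrupted(session):
    """Simulate a user interrupt in the middle of the with block."""
    try:
        with session:
            raise KeyboardInterrupt
    except KeyboardInterrupt:
        pass
    return session.logged_out


print(run_interrupted(FakeSession()))  # True
```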

rclapp commented 1 year ago

It's unclear which memory is spiking: the TM1 server or TM1py?

jnaff-coursera commented 1 year ago

@MariusWirtz I use the with statement, but I stopped the Python script in the PyCharm IDE. @rclapp the TM1 server was spiking.

rclapp commented 1 year ago

So this could be a number of things, but it would be hard to say without seeing your code.

jnaff-coursera commented 1 year ago

This script runs on my laptop. The "publicdimsubcubeview" list stores the public views I have already found, so I don't list them again for each user.

import configparser
import os

from TM1py.Services import TM1Service

config = configparser.ConfigParser()
config.read('C:/Cubewise/TM1py/tm1py-samples-master/config.ini')

devPath = "C:/Python/"
f = open(os.path.join(devPath, "subsets used in view.csv"), "w")

# Public dimension/subset/cube/view combinations already written to the file
publicdimsubcubeview = []
source = 'tm1prod'

# Connect to TM1
with TM1Service(**config[source]) as tm1dev:
    admin_users = [user.name for user in tm1dev.security.get_users_from_group("ADMIN")]
    fpaTeamList = tm1dev.security.get_users_from_group("FPA Team")
    corpFinanceList = tm1dev.security.get_users_from_group("Corp Finance")
    users = list(set(fpaTeamList + corpFinanceList))
    for user in users:
        # Skip admins; only regular users are impersonated
        if user.name in admin_users:
            continue
        with TM1Service(**config[source], impersonate=user.name) as impersonatedtm1:
            print("checking " + user.name)
            dims = impersonatedtm1.dimensions.get_all_names()
            for dim in dims:
                subsets = impersonatedtm1.dimensions.subsets.get_all_names(dim)
                for subset in subsets:
                    # One REST call per dimension/subset pair, repeated for every user
                    viewsPrivate, viewsPublic = impersonatedtm1.views.search_subset_in_native_views(dimension_name=dim, subset_name=subset)
                    for viewPrivate in viewsPrivate:
                        print("private view found for " + user.name + ":" + dim + ":" + subset + ":" + viewPrivate.cube + ":" + viewPrivate.name)
                        f.write("Private view for user " + user.name + ", Dimension:" + dim + ", Subset: " + subset + ", Cube: " + viewPrivate.cube + ", View: " + viewPrivate.name + "\n")
                    for viewPublic in viewsPublic:
                        if dim+subset+viewPublic.cube+viewPublic.name not in publicdimsubcubeview:
                            publicdimsubcubeview.append(dim+subset+viewPublic.cube+viewPublic.name)
                            print("public view found " + dim + ":" + subset + ":" + viewPublic.cube + ":" + viewPublic.name)
                            f.write("Public view found, Dimension:" + dim + ", Subset: " + subset + ", Cube: " + viewPublic.cube + ", View: " + viewPublic.name + "\n")

# Close the output file so buffered lines are flushed to disk
f.close()
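
The publicdimsubcubeview list in the script above deduplicates public views by concatenating four names into one string. A possible alternative, sketched here with made-up names, is a set of tuples: membership tests stay cheap as the collection grows, and two different combinations cannot collapse into the same key. `record_public_view` is a hypothetical helper, not part of TM1py.

```python
seen_public_views = set()


def record_public_view(dim, subset, cube, view):
    """Return True the first time a (dim, subset, cube, view) combination is seen."""
    key = (dim, subset, cube, view)
    if key in seen_public_views:
        return False
    seen_public_views.add(key)
    return True


# Concatenated strings can collide: both sides become "ProductAllSalesDefault"
assert "Product" + "All" + "Sales" + "Default" == "ProductAll" + "" + "Sales" + "Default"

print(record_public_view("Product", "All", "Sales", "Default"))  # True
print(record_public_view("Product", "All", "Sales", "Default"))  # False
print(record_public_view("ProductAll", "", "Sales", "Default"))  # True
```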

jnaff-coursera commented 1 year ago

I'm also working with IBM support. They are giving me instructions on how to capture information. I will keep you updated.

MariusWirtz commented 1 year ago

The search_subset_in_native_views is a pretty complex REST call. I'm not surprised that TM1 takes some time and memory to compute it. Possibly you stumbled over a kind of memory leak or some inefficiency within the TM1 server.

I think IBM Support is the right place for this problem. If you want to provide more information to the support team, you can instantiate the TM1Service with logging=True. This prints out all the REST calls that TM1py makes.

@adscheevel have you seen TM1 respond like this to the search_subset_in_native_views function before?

jnaff-coursera commented 1 year ago

I managed to reproduce this issue in a Dev instance: the memory footprint doubled from 10 GB to 20 GB, while the total memory for cubes stayed flat at 10 GB. I will keep you posted on IBM's response.

adscheevel commented 1 year ago

@MariusWirtz no, I never ran into a memory spike while testing this. Looking at the REST call, it's possible that even though the $top=0 filter is being applied to the elements expand when include_elements=False, the server is generating the element list behind the scenes. I don't believe that's happening due to the significant performance difference between include_elements=False and include_elements=True, but it's possible; IBM would need to research/confirm. If the server is indeed expanding elements when we've told it not to, I would expect the memory leak would be the result of MDX subsets. I could push a modification to the search function to a different test branch that has no expand on elements for others to test.

@jnaff-coursera I don't see anything obviously wrong with your code, but I question your objective and method. Are you simply trying to generate a list of all public and private views that use a given public subset? Do you mean to include control dimensions, or could those be skipped? You seem to loop through the public views for every user without any user identifier in the output, so you're going to end up with an enormous amount of repetitive data on the public view details. I would suggest not using the search_subset_in_native_views function the way you currently are. Instead, gather all view details, extract the cube, dimension, and subset name from each, and compile them in a dictionary that you then convert into a tidy dataframe and export to CSV. I'm curious whether you still see the memory leak when running something like the code below, and whether it gets you the same detail you're looking for or I have completely misinterpreted your objective.

from TM1py import TM1Service
import pandas as pd

params = {
    "address": "",
    "port": 8010,
    "user": "admin",
    "password": "apple",
    "ssl": True
}

with TM1Service(**params) as tm1:

    ## dictionary to hold all details to be converted to pandas df later
    view_subset_matches = {'view_type':[], 'user':[], 'cube':[], 'view':[], 'dimension':[], 'subset':[]}

    ## evaluate all public views
    for cub in tm1.cubes.get_all_names(skip_control_cubes=True):
        views = tm1.views.get_all(cub, include_elements=False)
        for view in views[1]:
            dims = view.rows + view.columns + view.titles
            for dim in dims:
                if dim.subset.name != '':
                    view_subset_matches['view_type'].append('public')
                    view_subset_matches['user'].append('all')
                    view_subset_matches['cube'].append(cub)
                    view_subset_matches['view'].append(view.name)
                    view_subset_matches['dimension'].append(dim.dimension_name)
                    view_subset_matches['subset'].append(dim.subset.name)

    ## evaluate all private views by user
    admin_users = [user.name for user in tm1.security.get_users_from_group("ADMIN")]
    users = tm1.security.get_all_users()
    for user in users:
        if user.name in admin_users:
            continue
        with TM1Service(**params, impersonate=user.name) as tm2:
            for cub in tm2.cubes.get_all_names(skip_control_cubes=True):
                views = tm2.views.get_all(cub, include_elements=False)
                for view in views[0]:
                    dims = view.rows + view.columns + view.titles
                    for dim in dims:
                        if dim.subset.name != '':
                            view_subset_matches['view_type'].append('private')
                            view_subset_matches['user'].append(user.name)
                            view_subset_matches['cube'].append(cub)
                            view_subset_matches['view'].append(view.name)
                            view_subset_matches['dimension'].append(dim.dimension_name)
                            view_subset_matches['subset'].append(dim.subset.name)

## build dataframe from dictionary
df = pd.DataFrame.from_dict(view_subset_matches)
df.head()
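
The snippet above stops at df.head(). If the goal is a CSV file, one way to write the same column-oriented dictionary without pandas is the standard csv module. This is only a sketch with invented sample rows; `write_matches_csv` is a hypothetical helper and assumes every column list has the same length.

```python
import csv
import io


def write_matches_csv(matches, handle):
    """Write a dict of equal-length column lists as CSV rows."""
    columns = list(matches.keys())
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(columns)
    # zip(*...) turns the column lists into one tuple per row
    writer.writerows(zip(*(matches[col] for col in columns)))


# Invented sample data in the same shape as view_subset_matches
sample = {
    'view_type': ['public', 'private'],
    'user': ['all', 'alice'],
    'cube': ['Sales', 'Sales'],
    'view': ['Default', 'My View'],
    'dimension': ['Product', 'Region'],
    'subset': ['All Products', 'EU'],
}

buffer = io.StringIO()
write_matches_csv(sample, buffer)
print(buffer.getvalue())
```

Passing an open file handle instead of the StringIO buffer would write the rows to disk.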

jnaff-coursera commented 1 year ago

@adscheevel, my objective and method were deliberate. This was just a quick utility I was putting together, nothing formal, so I was not worried about it being very efficient. To explain: I was not able to delete some subsets because they were attached to views, and my objective was to find those views. Someone may have had a private view that used one of the undeletable subsets, so I needed to loop through each user with search_subset_in_native_views, which returns both private and public views.
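
Given that objective, records collected by a script like the one suggested earlier in the thread could be filtered down to the views that reference a subset that refuses to delete. A minimal sketch, assuming each record is a dict with the keys used in that script; `views_blocking`, `normalize`, and the sample data are hypothetical. Names are compared ignoring case and spaces, on the assumption that TM1 treats object names that way.

```python
def normalize(name):
    """Lowercase and strip spaces so name comparisons ignore case and spacing."""
    return name.replace(" ", "").lower()


def views_blocking(records, dimension, subset):
    """Return the records whose view references the given dimension subset."""
    return [
        r for r in records
        if normalize(r["dimension"]) == normalize(dimension)
        and normalize(r["subset"]) == normalize(subset)
    ]


# Invented sample records
records = [
    {"view_type": "public", "user": "all", "cube": "Sales",
     "view": "Default", "dimension": "Product", "subset": "All Products"},
    {"view_type": "private", "user": "alice", "cube": "Sales",
     "view": "My View", "dimension": "Product", "subset": "all products"},
    {"view_type": "private", "user": "bob", "cube": "Sales",
     "view": "Scratch", "dimension": "Region", "subset": "EU"},
]

for match in views_blocking(records, "Product", "All Products"):
    print(match["view_type"], match["user"], match["cube"], match["view"])
```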

cubewise-code / tm1py

Memory spikes after impersonation #896