Getting Cell Property other than "Value" from dataframe functions in CellService

cubewise-code / tm1py

TM1py is a Python package that wraps the TM1 REST API in a simple to use library.

http://tm1py.readthedocs.io/en/latest/

MIT License

184 stars 107 forks source link

Getting Cell Property other than "Value" from dataframe functions in CellService #1115

Open bennybooo22 opened 2 months ago

bennybooo22 commented 2 months ago

Describe the bug The all the execute dataframe functions only ever return "Value", no matter what cell_properties is passed

To Reproduce Run the following code with a valid MDX: tm1.cells.execute_mdx_dataframe_pivot(mdx, cell_properties=['Updateable', 'Value']) (After fixing the previously posted bug or doing a execute_view_dataframe_pivot instead)

Expected behavior Return of BOTH the "Value" and "Updateable" property.

Version TM1py: 2.0.2 TM1 Server Version: 11.8

Additional context Possible steps to fix:

Add "cell_properties: list = ['Value']" to argument of function "build_pandas_dataframe_from_cellset" in Utils.py
Replace "cellset_clean[element_names] = cell['Value'] if cell else None" with "cellset_clean[element_names] = [cell[cell_property] for cell_property in cell_properties] if cell else None" in function "build_pandas_dataframe_from_cellset" in Utils.py
Replace "df = pd.DataFrame(values, index=index, columns=["Values"])" with "df = pd.DataFrame(values, index=index, columns=cell_properties)" in function "build_pandas_dataframe_from_cellset" in Utils.py
Add "cell_properties = kwargs.get('cell_properties', ['Value'])" to beginning of function "extract_cellset_dataframe_pivot" in CellService.py
Replace "df = build_pandas_dataframe_from_cellset(data, multiindex=False)" with "df = build_pandas_dataframe_from_cellset(data, multiindex=False, cell_properties=cell_properties)" in function "extract_cellset_dataframe_pivot" in CellService.py
Replace "values=["Values"]" with "values=cell_properties" in the return line of the function "extract_cellset_dataframe_pivot" in CellService.py

Note: This will make the dataframe return 'Value' as Value title instead of what was previously 'Values'. I imagine many python scripts have df = df['Values'] hard coded into them, so there is an issue of backwards compatability

rclapp commented 2 months ago

Can you provide an example of what you would want the DF to look like? Would it have more columns, more rows?

bennybooo22 commented 2 months ago

Can you provide an example of what you would want the DF to look like? Would it have more columns, more rows?

Absolutely!

Here are the lines of code I added to print the result:

And here is the result if you make the changes stated above to TM1py:

The result is a dataframe with a multiindex for the columns

MariusWirtz commented 1 month ago

I wonder how we can support cell properties in a convenient way in the execute_mdx_dataframe function.

Perhaps if we introduced two additional args:

cell_properties (list of cell properties)
cell_properties_axis (rows or columns)

Would that help in your case?

Re the execute_mdx_dataframe_pivot function. This one is less useful than its sister function, execute_mdx_dataframe. Is there any particular reason you chose this one over the "normal" function? TBH, I hope we can deprecate the execute_mdx_dataframe_pivot function soon. It doesn't have a Polars equivalent. It's also painful for us to maintain and roll out all continuous improvements from execute_mdx_dataframe to execute_mdx_dataframe_pivot. I personally don't find it very useful either.

bennybooo22 commented 1 month ago

Yeah, I think it's not an easy thing to add in a clean way. The cell_properties and cell_properties_axis seems like a good way to make an easy implementation. I think that would help in my case.

Personally, my default is always to use execute_mdx_dataframe_pivot over execute_mdx_dataframe. When using python for some sort of data analysis, I will typically construct the MDX to be the cubeview that I plan on analyzing. For example, if I am comparing 3 dimensions in a cube, and want to take the average across 2 dimensions. This may just be my lack of familiarity with pandas but it is much easier to do the aforementioned operation when the dataframe has been pivoted than in the default returned dataframe from execute_mdx_dataframe. Back before execute_mdx_dataframe_pivot existed or before I knew it existed, the following line after execute_mdx_dataframe was always ....pivot_table().

I can't really speak on the maintainability of the function, but I find it very useful.

rclapp commented 1 month ago

Have you tried to include properties in the execute_mdx_dataframe method and set the shaped parameter as true? It takes a different approach, but also returns data frame that is pivoted to match the view spec: https://github.com/cubewise-code/tm1py/blob/75598adde5103d1b635faa0639ebc153b930be29/TM1py/Utils/Utils.py#L584

From: bennybooo22 @.> Reply-To: cubewise-code/tm1py @.> Date: Tuesday, May 14, 2024 at 9:25 AM To: cubewise-code/tm1py @.> Cc: "Clapp, Ryan" @.>, Comment @.***> Subject: Re: [cubewise-code/tm1py] Getting Cell Property other than "Value" from dataframe functions in CellService (Issue #1115)

Yeah, I think it's not an easy thing to add in a clean way. The cell_properties and cell_properties_axis seems like a good way to make an easy implementation. I think that would help in my case.

Personally, my default is always to use execute_mdx_dataframe_pivot over execute_mdx_dataframe. When using python for some sort of data analysis, I will typically construct the MDX to be the cubeview that I plan on analyzing. For example, if I am comparing 3 dimensions in a cube, and want to take the average across 2 dimensions. This may just be my lack of familiarity with pandas but it is much easier to do the aforementioned operation when the dataframe has been pivoted than in the default returned dataframe from execute_mdx_dataframe. Back before execute_mdx_dataframe_pivot existed or before I knew it existed, the following line after execute_mdx_dataframe was always ....pivot_table().

I can't really speak on the maintainability of the function, but I find it very useful.

— Reply to this email directly, view it on GitHubhttps://github.com/cubewise-code/tm1py/issues/1115#issuecomment-2110242303, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEK7GZSZEOUMKX3BWTSP3IDZCIGEFAVCNFSM6AAAAABHJK5H4WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJQGI2DEMZQGM. You are receiving this because you commented.Message ID: @.***>

bennybooo22 commented 1 month ago

Have you tried to include properties in the execute_mdx_dataframe method and set the shaped parameter as true? It takes a different approach, but also returns data frame that is pivoted to match the view spec: https://github.com/cubewise-code/tm1py/blob/75598adde5103d1b635faa0639ebc153b930be29/TM1py/Utils/Utils.py#L584

I just tried that by running tm1.cells.execute_mdx_dataframe(mdx, shaped=True, cell_properties=['Updateable', 'Value']) and unfortunately got the following error: TypeError: TM1py.Services.CellService.CellService.extract_cellset_raw() got multiple values for keyword argument 'cell_properties'

In CellService.py the function extract_cellset_csv has a self.extract_cellset_raw() which includes both **kwargs and cell_properties=["Value"] in the arguments. So passing cell_properties in the **kwargs is likely causing the error.

Also worth noting that the behavior of a dataframe that has been created using pivot vs. shaped are different. I think pivot makes the row dimension the index and shaped makes the row dimension a column.

To @MariusWirtz 's point about maintainability, if the shaped function is considered maintainable, it might be possible to have the pivot function just run tm1.cells.execute_mdx_dataframe(mdx,shaped=True) followed by df.set_index( **list of row dimensions** ). I don't know if that is something that sounds good on paper but not in practice though. Might be worth looking into? If not I can just run df.set_index( **list of row dimensions** ) after getting a dataframe from TM1, it's not the biggest issue in the world haha.