Enhancement - skip_rulederived_cells and skip_consolidated_cells

cubewise-code / tm1py

TM1py is a Python package that wraps the TM1 REST API in a simple to use library.

http://tm1py.readthedocs.io/en/latest/

MIT License


Enhancement - skip_rulederived_cells and skip_consolidated_cells #273

Closed cubewise-gng closed 4 years ago

cubewise-gng commented 4 years ago

Describe what did you try to do with TM1py

For all the cells.execute functions that read data from cubes, it would be good to have options such as:

skip_rulederived_cells
skip_consolidated_cells

This would let us quickly read in only the static cell values.

Describe what's not working the way you expect

Currently, when I read in the values, the result does not distinguish between static and rule-derived values. I processed the data and wanted to post it back to the cube, but it threw an exception saying that you cannot update a rule-derived value.
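Until such options exist, one possible workaround is to request the RuleDerived and Consolidated cell properties along with the value and drop the non-writable cells before posting back. The sketch below works on a plain dictionary; the shape of `cells` (coordinates mapped to a dict of cell properties) and the helper name `writable_cells` are assumptions made for illustration, not part of TM1py.

```python
def writable_cells(cells):
    """Keep only cells that are neither rule derived nor consolidated.

    `cells` is assumed to map a coordinate tuple to a dict that holds
    the "Value", "RuleDerived" and "Consolidated" cell properties.
    """
    return {
        coordinates: cell["Value"]
        for coordinates, cell in cells.items()
        if not cell.get("RuleDerived") and not cell.get("Consolidated")
    }


# Hypothetical sample data, only to show the filtering step.
sample = {
    ("2020", "Jan", "Sales"): {"Value": 100, "RuleDerived": False, "Consolidated": False},
    ("2020", "Jan", "Margin"): {"Value": 0.4, "RuleDerived": True, "Consolidated": False},
    ("2020", "Year", "Sales"): {"Value": 1200, "RuleDerived": False, "Consolidated": True},
}
static_values = writable_cells(sample)
```

Only the entries left in `static_values` would then be safe candidates for a write-back.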

Version

TM1py 1.5.0

MariusWirtz commented 4 years ago

For the functions that don't rely on the CellSets('das81nfa21')/Content request we can filter the response cells through the odata filter like this: /api/v1/ExecuteMDX?$expand=Cells($select=Value;$filter=RuleDerived eq false and Consolidated eq false)

We would have to build the URL dynamically and parse the response similar to how we do when the skip argument is used.
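Building the URL dynamically could look roughly like the sketch below. The function name and argument names are hypothetical; only the endpoint and the `$select` / `$filter` syntax are taken from the example URL above.

```python
def build_execute_mdx_url(skip_rule_derived_cells=False,
                          skip_consolidated_cells=False,
                          cell_properties=("Value",)):
    """Assemble the ExecuteMDX URL with an optional OData cell filter."""
    filters = []
    if skip_rule_derived_cells:
        filters.append("RuleDerived eq false")
    if skip_consolidated_cells:
        filters.append("Consolidated eq false")

    # Options inside $expand=Cells(...) are separated by semicolons.
    options = ["$select=" + ",".join(cell_properties)]
    if filters:
        options.append("$filter=" + " and ".join(filters))

    return "/api/v1/ExecuteMDX?$expand=Cells(" + ";".join(options) + ")"


url = build_execute_mdx_url(skip_rule_derived_cells=True,
                            skip_consolidated_cells=True)
```

With both flags left at their defaults, the `$filter` option is omitted entirely, so the request stays the same as an unfiltered one.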

For the execute_mdx_csv, execute_mdx_dataframe methods I am not sure if we could offer that feature.

scrumthing commented 4 years ago

Yes. I would expect that MDX views do not distinguish between rule-derived and static values. The consolidation part, on the other hand, is rather easy with MDX, but maybe a topic for mdxpy. Maybe the best practice for that kind of operation (fancy data science stuff) is to have a clear data model with defined input measures, etc.

rclapp commented 4 years ago

While the content endpoint is helpful, it is full of problems, one of which is the lack of Odata expression support.

IMHO, tm1py should move away from using it and only parse the cellsets directly. If we require more speed, we could adopt compact JSON and limit the default properties.


rkvinoth commented 4 years ago

@MariusWirtz ,

Why don't we refactor extract_cellset_raw? We can build a DataFrame from the cellset anyway, using build_pandas_dataframe_from_cellset.

Developers will have the option to choose what they want:

Get everything faster - Cellset('')/Content
Create Cellset and extract only desired data (comparatively slower)

MariusWirtz commented 4 years ago

@scrumthing Unfortunately, I don't think we can filter on cell properties (e.g. rule derived) with MDX.

@rclapp Yeah, it's really a double-edged sword. It's fast but not flexible at all. Let's get rid of it. I am sure Hubert would like it 😅

@rkvinoth Yes. For the functions that are currently using it, we could offer the optional possibility to use the old /Content hack instead of the new way. Then developers can choose what they want.

I think extract_cellset_dataframe can stay as-is in a first iteration. We just have to add the skip_rulederived_cells and skip_consolidated_cells arguments to it and pass them on to the extract_cellset_csv call.

If we can write a better extract_cellset_csv function that uses OData filtering and queries the same data in the same shape (CSV string) as the current /Content hack, we should be good. Then we could simply add the two new arguments to the function.
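As a rough illustration of the target shape, the sketch below flattens already parsed cells into a CSV string and honours the two proposed arguments. It applies the predicate on the client side purely to stay self-contained; the proposal above is to let the server filter via OData. The input structure (pairs of coordinates and cell properties), the column layout, and the function name `cells_to_csv` are assumptions, since the exact /Content output is not shown in this thread.

```python
import csv
import io


def cells_to_csv(rows, skip_rulederived_cells=False, skip_consolidated_cells=False):
    """Write one CSV line per cell: the element names, then the value.

    `rows` is assumed to be an iterable of (coordinates, cell) pairs where
    `cell` holds "Value", "RuleDerived" and "Consolidated".
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for coordinates, cell in rows:
        if skip_rulederived_cells and cell.get("RuleDerived"):
            continue
        if skip_consolidated_cells and cell.get("Consolidated"):
            continue
        writer.writerow([*coordinates, cell["Value"]])
    return buffer.getvalue()


# Hypothetical sample rows.
rows = [
    (("2020", "Jan", "Sales"), {"Value": 100, "RuleDerived": False, "Consolidated": False}),
    (("2020", "Jan", "Margin"), {"Value": 0.4, "RuleDerived": True, "Consolidated": False}),
]
csv_text = cells_to_csv(rows, skip_rulederived_cells=True)
```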

MariusWirtz commented 4 years ago

I started working on the new extract_cellset functions here https://github.com/cubewise-code/tm1py/pull/279

It's looking good. It turns out the new execute_mdx_csv function retrieves 1M cells in ~50 sec, while the old /Content function takes ~55 sec.

Do you think we should tune this further and make use of compact JSON?

rclapp commented 4 years ago

I wonder what the payoff is. Could you test how long the REST request itself takes to return?

Ryan Clapp Sr. Manager AWS FinTech


MariusWirtz commented 4 years ago

@rclapp When I run TM1 and Python on localhost, the speed difference is insignificant. The difference in response size is massive though: 10MB vs. 27MB. I assume this has an impact on performance over WAN as well. I agree we should use compact JSON, even if it will be a large refactoring exercise.
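To get a feel for why the payload shrinks, the sketch below compares a verbose encoding (one object per cell with repeated property names) against an array-based encoding with fewer properties. This is not TM1's actual compact JSON format, and the property names and sizes are illustrative assumptions only.

```python
import json


def payload_sizes(cell_count):
    """Return (verbose_bytes, compact_bytes) for synthetic cell data."""
    cells = [
        {"Ordinal": i, "Value": float(i), "RuleDerived": False, "Consolidated": False}
        for i in range(cell_count)
    ]
    # Verbose: every cell repeats all property names.
    verbose = json.dumps({"Cells": cells})
    # Compact: property names are dropped and only two properties are kept.
    compact = json.dumps(
        [[cell["Ordinal"], cell["Value"]] for cell in cells],
        separators=(",", ":"),
    )
    return len(verbose.encode("utf-8")), len(compact.encode("utf-8"))


verbose_bytes, compact_bytes = payload_sizes(1000)
```

The ratio between the two numbers depends heavily on how many properties are requested, which is why limiting the default properties was suggested alongside compact JSON.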