cubewise-code / tm1py

TM1py is a Python package that wraps the TM1 REST API in a simple to use library.
http://tm1py.readthedocs.io/en/latest/
MIT License

Cellset writeback #20

Closed dusherwo57 closed 7 years ago

dusherwo57 commented 7 years ago

I'm developing a TM1py routine to transfer data between servers. When testing writeback to a cellset that included consolidated values, it failed with a (reasonable) error, "CubeCellWriteStatusElementIsConsolidated". I can see this causing issues for generic transfer routines where destination cells might be consolidated, ruled, locked, or not writable due to security. I suppose it's really a question for Hubert Heijkers and the team, but what do you think ought to happen?

MariusWirtz-cubewise commented 7 years ago

Hi, that's a very interesting use case.

I agree, it makes sense that the REST API returns an error on that request. Still, it would be really convenient if the TM1 REST API offered a more "forgiving" way to do batch updates, where single errors don't abort the whole operation.

dusherwo57 commented 7 years ago

I wrote a test script which transferred 10k cells in 6 seconds. But after playing around with this some more, I have concluded that it's unlikely to be usable except when you can easily work out which cells are ruled, and where the volume of data is not humongous. In practice, a TI process with the right selections gets the right answer in the simplest way.

MariusWirtz commented 7 years ago

Hi, the write_values function is easy to use but slow.

The fastest way to write to a cube through the REST API is to call the write_values_through_cellset function. It's orders of magnitude faster because it creates a cellset first and then sends only a list of values to TM1. Here is a sample: https://github.com/MariusWirtz/TM1py-samples/blob/master/Load%20Data/cube%20to%20cube.py
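For illustration, a minimal sketch of that pattern against the Planning Sample cube used later in this thread (the MDX selection and the values are hypothetical, and it assumes the same tm1.data service interface used in the snippet further down):

from TM1py.Services import TM1Service

with TM1Service(address='localhost', port=8001, user='admin', password='apple', ssl=True) as tm1:
    # MDX that pins down exactly the leaf cells to write, in a known order
    mdx = "SELECT " \
          "{[plan_chart_of_accounts].[Revenue]} on ROWS, " \
          "{[plan_time].[Jan-2004], [plan_time].[Feb-2004]} on COLUMNS " \
          "FROM [Plan_BudgetPlan] " \
          "WHERE ([plan_version].[FY 2004 Budget],[plan_business_unit].[10110],[plan_department].[410]," \
          "[plan_exchange_rates].[local],[plan_source].[input])"

    # the values are matched to the cells of the cellset in their natural order
    tm1.data.write_values_through_cellset(mdx, [1000, 2000])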

If you need more performance, you can also call this function asynchronously: https://github.com/MariusWirtz/TM1py-samples/blob/master/Update%20Cells/write%20data%20fast.py
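A rough sketch of what "asynchronously" means here, in the spirit of the linked sample (the chunk size and thread count are illustrative, and it assumes the single TM1Service connection tolerates concurrent requests):

from concurrent.futures import ThreadPoolExecutor

def write_async(tm1, cube_name, cellset, chunk_size=1000, max_workers=4):
    # split the {element-tuple: value} cellset into smaller ordinary cellsets
    items = list(cellset.items())
    chunks = [dict(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]
    # write the chunks in parallel threads
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for chunk in chunks:
            executor.submit(tm1.data.write_values, cube_name, chunk)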

Cheers,

Marius

dusherwo57 commented 7 years ago

That's the one I was using in the server-to-server copy. I attach my write-up of the issues: I looked into this as a possible way to deal with the not uncommon requirement to move data between servers. The good news is that it works, transferring ten thousand cells between two versions of the SDATA server in 5 seconds. The source and destination views were pre-set to be identical and to contain stored values only. I had hoped to generalise and scale this to handle any kind of data transfer, but on further testing and study I can't see it being usable for that purpose:

a. The destination view cannot include ruled or consolidated values, otherwise the update fails. The update is bulk rather than cell-based, which ought to be a good thing, but it doesn't allow cell-level testing.

b. I looked at various ways to eliminate ruled cells (consolidated cells are less of an issue) by creating a new view, but I can't see any TI-based functions to de-duplicate subsets, so the best option is to write a flat file. And this leads straight back to the existing, well-proven approach of writing and reading a flat file. No REST/OData needed.

If IBM could be persuaded to make updates of non-updateable cells produce a warning rather than a failure, we could revisit this. Even then, I'd say there would still be scale issues, as the cellset object would need to be created by the Python script and could be very large for production systems.

MariusWirtz commented 7 years ago

5 seconds doesn't sound too bad actually!

If we assume that rules and dimensions are exactly the same in the source and target cube, we can actually make it work with a bit of Python magic:

from TM1py.Services import TM1Service

with TM1Service(address='', port=8001, user='admin', password='apple', ssl=True) as tm1:
    # select all accounts against all months in one slice of the cube
    mdx = "SELECT " \
          "NON EMPTY {TM1SubsetAll([plan_chart_of_accounts])} on ROWS, " \
          "NON EMPTY {TM1SubsetAll([plan_time])} on COLUMNS " \
          "FROM [Plan_BudgetPlan] " \
          "WHERE ([plan_version].[FY 2004 Budget],[plan_business_unit].[10110],[plan_department].[410]," \
          "[plan_exchange_rates].[local],[plan_source].[input]) "

    # query the cell values together with the RuleDerived and Consolidated flags
    data = tm1.data.execute_mdx(mdx, ['Value', 'RuleDerived', 'Consolidated'])

    # keep only stored, non-empty leaf cells
    cellset = {}
    for element_unique_names, cell in data.items():
        if not cell['Consolidated'] and not cell['RuleDerived'] and cell['Value']:
            # reduce unique names like '[plan_time].[Jan-2004]' to plain element names
            element_names = tuple(unique_name[unique_name.rfind('].[') + 3:-1] for unique_name in element_unique_names)
            cellset[element_names] = cell['Value']

    # write the filtered values back
    tm1.data.write_values('Plan_BudgetPlan', cellset)

It's not a very elegant solution, though. And the writing is not based on MDX-based cellset updates, so it doesn't scale well.

I like the idea of warnings instead of errors when addressing non-updateable cells.

dusherwo57 commented 7 years ago

Agreed that the speed is fine. I was trying to use write_values_through_cellset, mirroring source and destination. I think I will try throwing write_values at a larger system and see how it works in practice. For production TM1 servers, I could envisage the Python structures stressing RAM - perhaps I'm too cautious here. You have solved one of my issues: how to build up a 'flat' list of unique tuples equivalent to a CSV export.
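For the record, a minimal sketch of that flattening, reusing the cellset dictionary built in the snippet above (the file name is hypothetical):

import csv

# dump the {element-tuple: value} cellset to a CSV file, one cell per row,
# mirroring a classic TI ASCII export
with open('export.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for element_names, value in cellset.items():
        writer.writerow(list(element_names) + [value])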

dusherwo57 commented 7 years ago

I have tested this approach. It is much more usable, as it delivers a CSV-like flat stream. Testing with a source that includes ruled and consolidated elements gives 16 seconds to write 9689 cells, reading 61740 cells from the source. Not unreasonable, I'd say.

MariusWirtz commented 7 years ago

Yeah, sounds pretty reasonable to me. Are you running the script on the server?