TI vs TM1py: which is the better ETL process?

pal-16 commented 3 years ago

Hello @MariusWirtz

The creation and update times are written in the cells. Other than runtime performance, what other factors would you suggest for determining the better ETL process?

MariusWirtz commented 3 years ago

Hi @pal-16,

thank you for providing these stats. Very interesting! Can you please do one more test with 250k or 500k? It's not unusual to have such large dimensions in TM1. And loading very large dimensions can be a bottleneck.

As a "Pythonista" and TM1py developer I may be biased, but here is my take on this: I think while TI is easier and to many old-school TM1'ers more familiar, Python is the better choice for ETL due to the following reasons:

- Python is a proper programming language that allows you to express logic in efficient ways using modern data structures (lists, tuples, dictionaries, etc.) and features (classes, functions, etc.), not to mention automated tests.
- Python's standard library and third-party extensions (pandas, numpy, etc.) go way beyond the scope of what TI can do.
- Contrary to Turbo Integrator, a TM1py script does not run within the scope of a TM1 instance! It is therefore no more complex to interact with n TM1 instances than it is to interact with one instance from the script. For instance, loading a dimension from instance A to instance B is very simple in Python and very hard in TI. Sample:

```
from TM1py import TM1Service

with TM1Service(address="", port=12354, ssl=True, user="admin", password="apple") as tm1_source:
    with TM1Service(address="", port=12297, ssl=True, user="admin", password="apple") as tm1_target:
        dimension = tm1_source.dimensions.get(dimension_name="Financial Year")
        tm1_target.dimensions.update_or_create(dimension)
```

- TI is limited in terms of the data sources it can connect to. With Python, you can connect to almost any source system seamlessly.
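To make the first point (functions plus automated tests) a bit more tangible, here is a minimal sketch using only the standard library. The function name `build_edges`, the (child, parent) row layout and the sample rows are hypothetical and not taken from the thread; the idea is simply that source clean-up can live in a small function that is checked by a test before anything touches TM1.

```python
def build_edges(rows):
    """Turn (child, parent) source rows into a de-duplicated edge dictionary.

    Surrounding whitespace is stripped and incomplete rows are skipped.
    The result maps (parent, child) to a weight of 1.
    """
    edges = {}
    for child, parent in rows:
        child, parent = child.strip(), parent.strip()
        if not child or not parent:
            continue  # ignore rows with a missing child or parent
        edges[(parent, child)] = 1
    return edges


def test_build_edges():
    # Hypothetical messy source rows: padding, a duplicate and a blank child.
    rows = [
        (" 2021-Q1", "2021"),
        ("2021-Q1", "2021 "),
        ("", "2021"),
        ("2021-Q2", "2021"),
    ]
    expected = {("2021", "2021-Q1"): 1, ("2021", "2021-Q2"): 1}
    assert build_edges(rows) == expected


test_build_edges()
```

A test like this can run in any CI pipeline without a TM1 server, which is hard to replicate for logic that lives in a TI Metadata or Data tab.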

Any other opinions on this one?

wimgielis commented 3 years ago

Interesting topic !

Python and tm1py certainly have advantages, including cross-model (while other tools like Jedox - very similar to TM1 - support this natively within the software).

I always thought that TI would be the fastest way compared to tm1py or other REST-based tools.

A few points regarding TI vs. Python: With respect to the example of Marius on updating dimensions between instances: we should know what the dimensions.update_or_create method does. Does it bring over subsets ? Attributes ? Dimension properties ? Hierarchies (PA-speak) within the dimension ? Security settings ? Etc. While there are ready-made methods that make a number of things much easier, it also involves learning Python as well as knowing which methods to use, what they do / do not do. We all know DimensionElementInsert and AttrPutS kind of functions, so starting from what one knows is usually how it is done.

Python also involves installations.

But definitely tm1py is a very welcome asset in the TM1 landscape so to speak.

Best regards / Beste groeten,

Wim Gielis MS Excel MVP 2011-2014 https://www.wimgielis.com http://www.wimgielis.be


MariusWirtz commented 3 years ago

@wimgielis,

Yeah. When it comes to writing cell-level data, TI is still the fastest option. In TM1py we try to work around it with the use_ti option in the write method but due to the overhead, it doesn't eliminate the difference entirely.
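As a rough illustration of the `use_ti` option mentioned here, the sketch below wraps the write call in a small helper. It assumes that `tm1.cells.write` accepts a cube name, a dictionary of cells and a `use_ti` keyword, as this comment describes; the helper name `write_cells`, the cube name and the element names are made up. The connection details are copied from the sample earlier in the thread.

```python
def write_cells(tm1, cube_name, cells, use_ti=True):
    """Write a {coordinates: value} dictionary to a cube through TM1py.

    `tm1` is expected to be a TM1py TM1Service. With use_ti=True the write is
    delegated to TI on the server side rather than plain REST cell updates.
    Returns the number of cells handed over.
    """
    tm1.cells.write(cube_name, cells, use_ti=use_ti)
    return len(cells)


def demo():
    """Example call against a real server; not executed on import."""
    # Imported here so the helper above can be defined without TM1py installed.
    from TM1py import TM1Service

    with TM1Service(address="", port=12354, ssl=True, user="admin", password="apple") as tm1:
        # Hypothetical cube and elements, for illustration only.
        cells = {("2021", "Jan", "Revenue"): 100.0}
        write_cells(tm1, "Sales", cells)
```

Whether `use_ti=True` actually helps depends on data volume and server setup, so it is worth timing both variants on a representative load.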

The sample above updates the dimension with all hierarchies with all elements and edges and attributes. Attribute values need to be transferred separately. The same goes for the security stuff and subsets.

Yes. Python involves an installation, though not necessarily on the machine that is running TM1. And yes you have to learn python but the same goes for TI. Mastering TI is not trivial.

But definitely tm1py is a very welcome asset in the TM1 landscape so to speak

Thanks :)

scrumthing commented 3 years ago

Okay, I will add my two cents here... :-)

TI is only faster because it ignores everything as it runs in GOD-MODE... That is faster but from a maintenance and security point of view it is a nightmare... Yes we all know how TI works. But only because we have always done it that way. And most of the time it is not really the right way. Just because TI expects you to work with your data line by line does not make it the best approach. Most definitely there are a lot of use cases where working with complete datasets instead of a single line makes for a better solution. 1,000 lines of code in metadata or data because you have to think of every variant of data that could be in the source is not how it should be done.

Python and tm1py (actually the REST API in general, but unfortunately no other language offers a package like tm1py), as @MariusWirtz said, open TM1 up to all kinds of modern technology. Be it git or JSON or CI/CD or DevOps or ML or AI... the list goes on. So the question is not which is better. The only question is: what is tm1py missing that keeps you from using it in every project?

Looking forward to your comments. ;-)

VentureHill commented 3 years ago

I'm nowhere near as in-the-loop as I used to be with matters relating to TM1, but how can an 'out-of-process' ETL be faster than an 'in-process' ETL?

Is TM1Py usually run on the same machine as TM1? If not, then these metrics are not applicable at all and are downright misleading. Even if it is running on the same box, the TM1Py library will introduce socket-related lag which won't be there in TI because it runs 'in-process'.

Let me know what I'm missing here...

rkvinoth commented 3 years ago

Faster way to load data: It's obviously a TI process because it runs in Admin mode as @scrumthing pointed out.

So should I go with TI? Well, no! It totally depends on the use case. I have experience converting TI processes that ran allocations in 20 minutes to TM1py based scripts which can complete in 1.5 minutes.

So should I go with TM1py? Well, no! It totally depends on the use case. I have tried loading large files into cubes or exporting large dimensions/attributes. In these types of situations TI is the best. But with the help of new features in TM1py like unbound processes, you can now achieve these things in TM1py (unless your requirement is too complex).

Conclusion: You cannot arrive at the right answer without performing some experiments. Even experienced Python people here would definitely try to solve things in different ways and see which performs better. IMO, TM1py is an excellent addition to our field. Let's embrace it and understand how to use it better. @MariusWirtz has been providing wonderful support for all the queries (no matter how dumb the questions may be).

rclapp commented 3 years ago

The related lag you mention would only be applicable in testing if you were using the same "method" of processing. In many cases the TI-based line-by-line method is slower than, for example, a query followed by a table transformation operation.

What we are really missing, and desperately need, is an ETL REST endpoint. One that allows us to use 3P ETL tools.


VentureHill commented 3 years ago

The related lag you mention would only be applicable in testing if you were using the same "method" of processing. In many cases the TI-based line-by-line method is slower than, for example, a query followed by a table transformation operation. What we are really missing, and desperately need, is an ETL REST endpoint. One that allows us to use 3P ETL tools. …

Hi Ryan,

That makes sense as a way in which TM1Py would be faster: one could optimize the amount of data to be added into TM1 prior to actioning it against the dimensions / cube. Also, I totally agree that the flexibility Python provides here is perfect for complex scenarios that merge multiple data queries, potentially from multiple places.

That said, the initial benchmarks lack context. Is this a like-for-like comparison, with both systems running the same data through the same methodology, or an edge case where each engine uses a different methodology based on its unique capabilities?

Regardless of methodology, TM1Py needs to talk to the TM1 Rest API over a network interface (even if on the same machine) adding a delay based on the size of the data being sent, the more data input, the more added delay over the TI approach which includes no such lag. This is why I'm strongly in favor of in-process ETL, I don't think 3P ETL Tools are the answer (unless they are embedded), I think a better TI scripting language/engine would be the answer.
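The size argument can be made visible with a small experiment: serialise a growing number of cell updates to JSON and look at the byte count that would have to cross the network interface. This is only a sketch; the payload layout and the helper name `payload_bytes` are made up and do not reflect the actual TM1 REST format, and byte counts alone say nothing about how the total runtime compares with TI.

```python
import json


def payload_bytes(n_cells):
    """Return the size in bytes of a JSON body carrying n_cells cell updates.

    The body layout is a hypothetical stand-in for a REST write request.
    """
    cells = [
        {"coordinates": ["2021", "Jan", f"Account{i}"], "value": float(i)}
        for i in range(n_cells)
    ]
    return len(json.dumps(cells).encode("utf-8"))


# Print the payload size for a few cell counts to show how it grows.
for n in (100, 1_000, 10_000):
    print(n, payload_bytes(n))
```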

@pal-16 Can we see the source code for this benchmark?

cubewise-tryan commented 3 years ago

@pal-16 It would be interesting to see the code of the two benchmarks as sometimes you can be comparing apples and oranges.

Both TI and TM1Py are great tools for interacting with TM1. As a default I would still stick to TI when dealing with "standard" data sources such as ODBC and flat files. TI is a little clunky but it is very good at what it does and is super fast. There are some great tricks in TM1Py to improve performance but TI does run in-process. That means there isn't any overhead in terms of parsing HTTP requests and JSON and it has direct access to the data stored in memory. TI is also "compiled" so you don't require parsing of the code after it has been saved.

TM1Py is great for stuff that you can't do or that is hard in TI. There are more and more web-based sources and dealing with JSON (or XML) in TI can be painful. TM1Py can also be great if you have Python expertise. Python is a very nice language and there are lots of tutorials and an endless list of libraries. There are lots of great examples of how TM1Py has opened up a whole new world because it enables so many things that aren't possible with TI.

In summary, both are great but have their own sweet spots, it isn't a matter of better but instead what fits the job.

MariusWirtz commented 3 years ago

Very interesting points. Thanks, everyone for sharing your thoughts and expertise!

Regarding the stats: while I would love to challenge the code, the results don't surprise me, to be honest. An IBM employee familiar with the TM1 engine recently told me that dimension updates through REST should already be faster than through TI.

@MODLR From my experience, a lot of TM1py implementations do run on the same machine as TM1. So if the stats are based on that assumption they are not misleading per se.

In summary, both are great but have their own sweet spots, it isn't a matter of better but instead what fits the job.

Agree. Perhaps the cases can be split into three groups.

- I think there is a range of common TM1 problems that TI can address faster and more easily than TM1py, for instance reading data from one cube to another with a simple transformation. There is no point in bringing the data into Python and then back into TM1.
- When it comes to dealing with SQL and flat files I think it makes sense to default to Turbo Integrator. However, depending on the complexity of the script and your expertise in Python it can make sense to write a Python script instead. While a simple CSV to cube load will perform faster in TI, we know of more than one example with complex calculations involving multiple cubes that performs better in Python! I understand that this is due to TI's record-by-record processing (imagine 3 AttrS + 2 CellGetN + 1 CellPutN statements for every record) while Python slices chunks out of a few cubes at the beginning, does its thing in Python, and then writes one chunk back to TM1 at the end.
- For anything that goes beyond the scope of TI (data integration from the cloud, forecasting, complex calculations, multi-instance logic, etc.) I think it makes sense to default to TM1py for the moment.
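The chunk-wise pattern from the second group (read a few slices, calculate in memory, write one chunk back) can be sketched in plain Python. Everything here is hypothetical: the function name `allocate`, the account and cost centre names and the numbers. In a real script the two input dictionaries would come from one read per cube and the result would go back to TM1 in a single write.

```python
def allocate(cost_pool, drivers):
    """Spread each cost pool over cost centres in proportion to a driver.

    cost_pool: {account: amount}
    drivers:   {cost_centre: driver value}
    Returns {(account, cost_centre): allocated amount}.
    """
    total = sum(drivers.values())
    if total == 0:
        return {}  # nothing to allocate against
    return {
        (account, centre): amount * weight / total
        for account, amount in cost_pool.items()
        for centre, weight in drivers.items()
    }


# Hypothetical data standing in for two slices read from TM1.
cost_pool = {"Rent": 1200.0, "IT": 600.0}
drivers = {"Berlin": 30.0, "Sydney": 10.0}
result = allocate(cost_pool, drivers)
```

The whole calculation happens on in-memory data, so the number of round trips to the server stays constant no matter how many cells are produced.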

MariusWirtz commented 3 years ago

I think a better TI scripting language/engine would be the answer.

@MODLR interesting thought! What do you have in mind?

I kinda think the TM1 REST API is already the answer. Ultimately everyone prefers different languages and technologies (and it changes over time too!) and REST caters to that.

I would rather have IBM focussing on making the REST API as fast and feature-rich and robust as possible than have them inventing a new language or integrating one fixed scripting language into the server.

MariusWirtz commented 3 years ago

What we are really missing, and desperately need, is an ETL REST endpoint. One that allows us to use 3P ETL tools.

@rclapp Have you looked into Apache Airflow? It's perhaps more workflow management than classic ETL but I imagine it could go really well with TM1 and TM1py. @scrambldchannel did some pioneering work on this. https://scrambldchannel.github.io/airflow-tm1.html#airflow-tm1

wimgielis commented 3 years ago

Still, basic improvements in TI like... functions or collections or regular expressions or a decent Replace function, Left/Right ... or a process template to pick from a list, or common snippets, or a library that every developer now builds on his/her own (a Bedrock light, for instance), or IRR / NPV / ... or ... (I could go on for a long time)

shouldn't be too hard, should they? It's 2021 already, not 1995. It cannot / shouldn't be the case that we need to resort to outside tools to get this done. For example, I regularly write process "function" logic in rules and ask for the result with a series of CellPutN/S and CellGetN/S. Hello IBM, it's 2021!


scrumthing commented 3 years ago

I am with @MariusWirtz on this one. REST-based is the future. TI will slowly but surely be deprecated. Afterwards you can use either Python or any other language. Besides performance there is no real need for TI because more or less all other languages on the planet have more flexibility. And for a pure data dump into the server IBM will maybe provide something.

AlexanderDvoynev commented 3 years ago

So, @lotsaram, when will Bedrock move from TI to tm1py? ;)

wimgielis commented 3 years ago

Today + 21916 days 😜

rclapp commented 3 years ago

I don't think REST can be the future, well at least not as we know it today. It was never intended to retrieve/send terabytes of data.


rclapp commented 3 years ago

Yes, we are working to replace CCC with it. However, I am more interested in an endpoint that can access the underlying trie structure directly, so that we can use things like AWS Glue.


MariusWirtz commented 3 years ago

I don't think REST can be the future, well at least not as we know it today. It was never intended to retrieve/send terabytes of data.

This is the feedback we need to provide to IBM regarding the REST API! Loading terabytes of data is somewhat of an edge case though 🙃

MariusWirtz commented 3 years ago

Yes, we are working to replace CCC with it. However, I am more interested in an endpoint that can access the underlying trie structure directly, so that we can use things like AWS Glue.

I would love to learn more about how you use it today. Didn't know about AWS Glue yet. Will check it out!

wimgielis commented 3 years ago

Probably an edge case, but I would expect a one-liner to add an element to a dimension, like the current DimensionElementInsert( dim, '', name, type ); in TI.

If it works differently, that is already one problem, but it should remain a one-liner as it is now.

The newish Hierarchy* functions are also picked up very slowly (I only use them in TI when I need to) so I would only change if really really needed.


zsoltmoravcsik commented 3 years ago

I agree with Ryan: REST API as a technology should mainly serve end-user requests and not work as an ETL solution. REST API is too verbose to do good and efficient ETL. Don't get me wrong, we love TM1py and it is great as the glue between data science applications and several other applications, but TM1 would need a proper API to work with the core.


MariusWirtz commented 3 years ago

REST API is too verbose to do good and efficient ETL

Completely disagree. Who said it's not efficient?

Exchanging data through JSON is not per se inefficient. For dimension updates, it is more or less on par with TI (according to the stats above and according to what we hear from IBM).

For data, you must not look at the throughput rate (e.g. update 100k cells per second) but at the runtime of an allocation or something. You will see that in many cases with REST we are already faster than TI at the bottom line. Are there even more efficient ways to exchange data than JSON? Yes, and the TM1 REST API is eventually going to offer them and TM1py is going to implement them.

TM1 would need a proper API to work with the core.

Are you suggesting to rather wait for a "proper API" and not use REST for loads? Doesn't make sense IMO. A bird in the hand is worth two in the bush and IBM has communicated multiple times that REST is the way to go forward in terms of APIs.

I remember doing a project with SQL a while ago. We were dealing with massive data quantities and struggling to load them into SQL fast enough. Ultimately we found out: the fastest way to load into MSSQL was a bulk insert from CSV files.

In TM1 we are currently exactly in the same situation! We can use REST / TM1py for everything but if you are really dealing with terabytes, just create CSV files on the server and use bulk mode / TI for the very last step (CellPutN). I had to do this only once in my life. My experience: 95% of the time REST is fast enough. You may also look into multi-threading TM1py if REST isn't fast enough.
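The hybrid approach described here (prepare the data in Python, hand a CSV file to TI for the final load) could look roughly like this on the Python side. The helper name `cells_to_csv`, the column layout and the sample cells are assumptions for illustration; the TI process that reads the file and calls CellPutN is not shown.

```python
import csv
import io


def cells_to_csv(cells, stream):
    """Write {coordinates: value} cells as CSV rows.

    Each row holds one column per dimension element followed by the value.
    Returns the number of rows written.
    """
    writer = csv.writer(stream, lineterminator="\n")
    for coordinates, value in cells.items():
        writer.writerow([*coordinates, value])
    return len(cells)


# In-memory demonstration with hypothetical cells. In practice `stream` would
# be a file opened in a folder that the TM1 server can read.
buffer = io.StringIO()
cells_to_csv(
    {("2021", "Jan", "Revenue"): 100.0, ("2021", "Feb", "Revenue"): 120.5},
    buffer,
)
```

Keeping the transformation in Python and only the last step in TI means the TI process can stay a simple, generic loader.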

wimgielis commented 3 years ago

The vast majority of TM1 models out there can just suffice with what we now have in TM1. You know, updating dimensions, loading data, transferring data from 1 cube to another. TI is certainly sufficient in terms of possibilities and speed, not in terms of ease of use (debatable) or code structures or whatever I noted earlier in this topic.

How often do we need to go beyond TI? Very rarely. It's with edge cases like ARIMA models or IRR or working across TM1 models or other statistical excursions or joining SQL statements, ... that you would need to deviate from TI. Then we can supplement TI with tm1py. I'm happy to do that. Making other scripting languages and REST the de facto standard will certainly not be my preference. Not even if that new tool is more on par regarding speed.

So for me:

- default: TI
- very much appreciated: surrounding developments in tools like tm1py for when TI won't cut it (not often, in my experience), but that's not the focus

I built a few useful scripts in tm1py, like counting users. I'm not going to move away from TI, knowing very well that TI lacks essential things that it should have received a long time ago.


wimgielis commented 3 years ago

Marius,

The ODBCOutput function in TI is rather slow (for large data volumes) if we do it record by record in the Data tab, for instance. A bulk insert of a CSV into SQL is much faster. So that is then done in the Epilog tab and does not make us leave TI, does it?


MariusWirtz commented 3 years ago

Probably an edge case, but I would expect a one-liner to add an element to a dimension, like the current DimensionElementInsert( dim, '', name, type ); in TI.

@wimgielis In the ElementService there are functions for that purpose: add_elements, add_edges, add_element_attributes


```
from TM1py import TM1Service, Element

with TM1Service(address="", port=12354, ssl=True, user="admin", password="apple") as tm1:
    tm1.elements.add_elements(
        dimension_name="d2",
        hierarchy_name="d2",
        elements=[Element("e11", "Numeric"), Element("e12", "Numeric")])
```

scrumthing commented 3 years ago

Yes, we are working to replace CCC with it. However, I am more interested in an endpoint that can access the underlying trie structure directly, so that we can use things like AWS Glue.

I would love to learn more about how you use it today. Didn't know about AWS Glue yet. Will check it out!

I would be interested too!

MariusWirtz commented 3 years ago

The ODBCOutput function in TI is rather slow (for large data volumes) if we do it record by record in the Data tab, for instance. A bulk insert of a CSV into SQL is much faster. So that is then done in the Epilog tab and does not make us leave TI, does it?

Thanks. Back in that project, we weren't writing from TM1 to SQL but writing from Java to SQL. Please don't ask why this architecture.... And yes TM1 was coming after SQL :)

scrumthing commented 3 years ago

The vast majority of TM1 models out there can just suffice with what we now have in TM1.

Allow me to disagree here.

- Every form of data cleaning is just very painful in TI.
- Adding users is painful because I can only add one user at a time and only add the user to one group per line, etc.
- Reusing code is nearly impossible (just check the lengths they have to go to in Bedrock).
- Switching values from one element to the other is painful.

I could go on. :-)

BTW: Awesome discussion here. Loving it! We should get Hubert in on that.

wimgielis commented 3 years ago

Christoph,

Adding users... how often do you do that? :-) My sales colleagues would like to see it every day at every customer, but reality is different, no? ;-) Data cleaning: I agree it can be much better, with reusable functions, regex, etc. That relates to the code reuse you brought up as well. Switching values: how does tm1py help here? Assuming we already have TI and possibly Bedrock (and if you add your own libraries), I do wonder what tm1py brings to the table here.

Best regards / Beste groeten,

Wim Gielis MS Excel MVP 2011-2014 https://www.wimgielis.com http://www.wimgielis.be

On Fri, 16 Jul 2021 at 15:52, Christoph Hein @.*** wrote:

The vast majority of TM1 models out there can just suffice with what we now have in TM1.

Allow me to disagree here.

Every form of data cleaning is just very painful in TI.

Adding users is painful because I can only add one user at a time and only add the user to one group per line, etc.

Reusing code is nearly impossible (just check the lengths they have to go to in Bedrock)

Switching values from one element to the other is painful

I could go on. :-)

BTW: Awesome discussion here. Loving it! We should get Hubert in on that.


scrumthing commented 3 years ago

We have lots of systems where new users are added automatically. That could definitely be fewer lines of code in Python. ;-)

If you have complex logic for switching values, where you have to iterate over the whole cube, a pandas dataframe could be very helpful: it speeds things up and makes the logic more transparent.
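As a rough illustration of switching values from one element to another, here is a sketch in plain Python rather than pandas, so it runs without any dependencies. The cells are assumed to be a dictionary that maps coordinate tuples to numeric values; the function name `switch_element` and the sample data are made up. With pandas, the same remap would be a replacement on one column followed by a group-by sum.

```python
from collections import defaultdict


def switch_element(cells, position, old, new):
    """Return a copy of `cells` with values moved from element `old` to `new`.

    `cells` maps coordinate tuples to numbers; `position` is the index of the
    dimension in which the element is switched. Values that land on an
    existing target cell are added to it, and source cells are set to 0.
    """
    moved = defaultdict(float)
    for coordinates, value in cells.items():
        if coordinates[position] == old:
            target = coordinates[:position] + (new,) + coordinates[position + 1:]
            moved[target] += value
            moved[coordinates] += 0.0  # keep the source cell, now zeroed
        else:
            moved[coordinates] += value
    return dict(moved)


cells = {
    ("2021", "Jan", "Sales"): 100.0,
    ("2021", "Feb", "Sales"): 50.0,
    ("2021", "Jan", "Revenue"): 10.0,
}
print(switch_element(cells, 2, "Sales", "Revenue"))
```

Reading the dictionary from a cube and writing the result back (for example through TM1py's cell service) is deliberately left out of the sketch.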

rclapp commented 3 years ago

Switching gears slightly; it would be great if TI did modernize, but even if it doesn't, there are about 3 fundamental features/improvements that are missing that could be added today.

1) Data Duplicate Function: Copy all data from element a to element b for example, where it does not require a record wise operation.

2) Cube Calculations Expressed with Rules: Imagine that you could write rule syntax in TI and have the resulting values written to the cube, e.g. CubeRunStaticCalc(cube, rules). No row-wise operations, just a one-time rule execution. That would make drivers * cost pool just as quick as a merge in pandas. Likewise, you could instantly convert rules on a cube to true values without having to export and import.

3) More efficient zero out: this one is painful. Why the server must traverse all cells to make them 0 seems crazy to me.
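To show what the "drivers * cost pool" calculation in point 2 amounts to outside of TI, here is a minimal sketch in plain Python. It joins two dictionaries on the cost pool name and multiplies. The function name `apply_drivers`, the data layout, and the sample figures are assumptions for illustration; `CubeRunStaticCalc` above is a proposed function, not an existing one.

```python
def apply_drivers(cost_pools, drivers):
    """Multiply each driver share by the amount of its cost pool.

    cost_pools: {pool: amount}
    drivers:    {(pool, cost_centre): share}
    returns:    {(pool, cost_centre): share * amount}

    Driver rows whose pool has no amount are skipped, like an inner join.
    """
    return {
        (pool, centre): share * cost_pools[pool]
        for (pool, centre), share in drivers.items()
        if pool in cost_pools
    }


cost_pools = {"Rent": 1000.0, "IT": 500.0}
drivers = {
    ("Rent", "Sales"): 0.25,
    ("Rent", "Ops"): 0.75,
    ("IT", "Sales"): 0.5,
    ("Marketing", "Sales"): 1.0,  # no matching cost pool, so it is dropped
}
print(apply_drivers(cost_pools, drivers))
```

The whole calculation is one pass over the driver rows, which is the set-based behaviour the proposal asks for instead of a record-wise process.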

pal-16 commented 3 years ago

Thank you to everyone for giving their opinion. This was a really insightful discussion. In the beginning, Marius suggested trying with 250k+ elements, but I don't have that much data to try with, and the code would be difficult to share as it is company-specific. However, I followed the documentation of this repository throughout and used it to carry out my analysis: building a hierarchical dimension and adding elements and element attributes to it. In my experience, TM1py is a really excellent open source project where each issue is discussed and solved. Thank you indeed.

VentureHill commented 3 years ago

I think a better TI scripting language/engine would be the answer.

@MODLR interesting thought! What do you have in mind?

I kinda think the TM1 REST API is already the answer. Ultimately everyone prefers different languages and technologies (and it changes over time too!) and REST caters to that.

I would rather have IBM focussing on making the REST API as fast and feature-rich and robust as possible than have them inventing a new language or integrating one fixed scripting language into the server.

In the MODLR platform (a TM1-like competitor) we embedded JavaScript. This means it runs 'In-Process', so it will be more efficient than anything that works over the REST API, and it also affords us the benefits of a modern language: Arrays, Objects, Functions, Timers, Template Literals, Try-Catch. We also have the stats language R embedded as a secondary option and could add Python if it was requested enough.

JavaScript is also the most commonly known language / most frequently used as it's in practically every website.

We also have some handy utility functions which make life easy for developers -

notification.email
notification.sms
dimension.createOrWipe
hierarchy.createOrWipe
alias.createOrWipe
cube.wipe(cubeName, e1, e2, eN)

Process function documentation - https://docs.modlr.co/process-functions

So you can imagine how this would reduce the number of lines of code to maintain.

Honestly, I would love to see TM1 with a powerful embedded language like JavaScript V8 Engine (from Google - used inside Chrome etc and is open source).

As per your comment on REST API updates: REST API based ETL can never be as fast as an in-process language, so besides using it for edge cases, I don't see REST ETL becoming the go-to for standard builds. When I was reviewing other platforms I looked at Jedox. At the time their ETL was out-of-process (not sure about now) and therefore REST API based, and as a result it was an order of magnitude slower than TI. Since there were no alternatives which could compete with TM1, I started my own.

cubewise-tryan commented 3 years ago

Hi @MODLR,

We should keep the discussion to TM1 rather than talking about other products 😀.

scrumthing commented 3 years ago

Hi @MODLR,

We should keep the discussion to TM1 rather than talking about other products 😀.

Agreed!

cubewise-code / tm1py

TI vs TM1py which is a better ETL process? #573