bwiley1 / pandleau

A quick and easy way to convert a Pandas DataFrame to a Tableau .hyper or .tde extract.
MIT License

Support for multi-table extracts? #21

Closed: gitmstoute closed this issue 5 years ago

gitmstoute commented 5 years ago

It's now possible to create multiple-table storage extracts using tableausdk in Python: docs.

It would be a great feature for Pandleau to support as well - is there any plan to implement this?

bwiley1 commented 5 years ago

Thanks for sharing this! Hmm, it looks like it would be an easy update to make. One question though - assuming you're using Python to facilitate analytics in the first place, wouldn't performance be highest if you completed all data merges outside of Tableau and then uploaded the single extract, rather than performing the joins in Tableau?
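For illustration, merging outside of Tableau might look like the sketch below. The frames and column names are made up for the example, and the final write step is left as a comment because it needs the Tableau SDK installed.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "amount": [25.0, 40.0, 15.5],
})
customers = pd.DataFrame({
    "customer_id": [10, 11],
    "region": ["East", "West"],
})

# Do the join in pandas so the extract only ever holds one flat table.
merged = orders.merge(customers, on="customer_id", how="left")

# The merged frame would then be written as a single-table extract with
# pandleau's to_tableau function (not executed here).
print(merged)
```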

gitmstoute commented 5 years ago

First off, thanks for the reply and for the Pandleau tool - it is great!

To be honest, I'm still working through some workbooks and haven't yet determined whether I require a multi-table extract. I was wondering if perhaps I had just missed it in the documentation. I'm proceeding with a large single table for now, and I think you are correct performance-wise: a single-table extract will load faster for the end user.

However, I will shortly be trying to use 'row level security' in some workbooks, and the multi-table extract is supposedly going to help with that use case. See the bottom of the article I linked ("Enhancement for some row-level security scenarios") and the blog post it links to. Hopefully in a few more days I'll be able to give you a better answer.

Let me know if I can help as well.

bwiley1 commented 5 years ago

Thanks so much for the feedback! That's great to hear :)

As it is, pandleau constructs an extract, writes to it, and closes the process all in the single to_tableau function. It would be easy to change this approach a little to write multiple tables to one extract, but I'm trying to figure out the best way to do this keeping user experience in mind:

  1. I could split out the to_tableau function into three parts: initializing the extract, adding tables, and closing the extract. The downside would be additional commands to include when writing an extract, but it could give the user the most flexibility.
  2. I could add additional arguments to the to_tableau function: specifying the extract filename, and whether to keep the connection open or closed. I like this option best because it keeps things simpler, but it could still get a little weird with default values (e.g. should the default be to keep an extract open until explicitly closed by the user, or vice versa?).

I'd love to hear your perspective on either approach! If you have a particular application in mind I could try to tailor the approach to accommodate that. Let me know what you think, thanks!

bwiley1 commented 5 years ago

Hey there! @harrison-h made an update to the module that should now allow you to write multiple tables to a single extract - let me know if you run into any issues!

gitmstoute commented 5 years ago

Thanks guys! Sorry I hadn't gotten back to you yet. You are too fast :)

bwiley1 commented 5 years ago

np! It was all @harrison-h on this one :)