bwiley1 / pandleau

A quick and easy way to convert a Pandas DataFrame to a Tableau .hyper or .tde extract.
MIT License
61 stars, 19 forks

Append instead of overwrite #19

Closed (rachelvuu closed this issue 4 years ago)

rachelvuu commented 5 years ago

Is there any way to append to the hyper extract rather than overwrite the file?

bwiley1 commented 5 years ago

Hi Rachel,

Thanks for reaching out! Last time I looked into this, I had trouble manipulating existing .hyper or .tde files, because it looked like the writing functions in the tableausdk package were encrypted. I had originally wanted to push out another version that could convert a .tde or .hyper to a pandas DataFrame, or otherwise move data between sources, but I had trouble with that too. I agree, though: it would be cool functionality to add. I'll try to do more research on the issue. Thanks!

Best, Ben

ghost commented 4 years ago

Hey @bwiley1,

I think the issue is around lines 139-154 in pandleau.py. That logic could be abstracted so that, instead of always creating a table definition from scratch, it checks whether 'Extract' already exists and, if so, just sets table_def to the definition that is already there.

At least in the 'old' SDK, tableauSDKSample.py has an example of this: the procedure createOrOpenExtract() first checks whether the table exists and creates it if it doesn't. Then the procedure populateExtract() gets the table schema using table.getTableDefinition().
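The open-or-create pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from pandleau.py or tableauSDKSample.py. The `FakeExtract` and `FakeTable` classes are hypothetical in-memory stand-ins so the snippet runs without the SDK installed; their method names (`hasTable`, `openTable`, `addTable`, `getTableDefinition`, `insert`) are assumed to mirror the legacy SDK calls the sample relies on. The helper name `open_or_create_table` and its parameters are also made up for this example.

```python
class FakeTable:
    """Hypothetical stand-in for an SDK table (illustration only)."""

    def __init__(self, definition):
        self._definition = definition
        self.rows = []

    def getTableDefinition(self):
        return self._definition

    def insert(self, row):
        self.rows.append(row)


class FakeExtract:
    """Hypothetical stand-in for an SDK extract (illustration only)."""

    def __init__(self):
        self._tables = {}

    def hasTable(self, name):
        return name in self._tables

    def openTable(self, name):
        return self._tables[name]

    def addTable(self, name, definition):
        table = FakeTable(definition)
        self._tables[name] = table
        return table


def open_or_create_table(extract, build_definition, table_name="Extract"):
    """Return (table, definition), reusing the existing table if present."""
    if extract.hasTable(table_name):
        # Append case: keep the schema that is already stored in the extract.
        table = extract.openTable(table_name)
        return table, table.getTableDefinition()
    # First write: build the definition from scratch and register the table.
    definition = build_definition()
    return extract.addTable(table_name, definition), definition


def build_definition():
    # Assumption: a plain list of column names stands in for a real schema.
    return ["id", "value"]


extract = FakeExtract()

# First call creates the table, second call reopens it and appends.
table, table_def = open_or_create_table(extract, build_definition)
table.insert((1, "a"))
table, table_def = open_or_create_table(extract, build_definition)
table.insert((2, "b"))

print(len(table.rows))  # 2
```

Because the table name is a parameter, the same helper would also cover writing more than one table to a single extract, provided the underlying format allows tables with other names.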

However, I don't know how nicely this will play with the "add index" function of pandleau. TDEs (and I'm assuming hypers as well) aren't really meant to be read by anything but Tableau, and the SDK doesn't have any public reading functions that I'm aware of.

I should have some time when I get home to clone the repo, play around, and see whether this can be adjusted. I'm assuming these functions exist in both the SDK and SDK2, but I guess I'll find out!

bwiley1 commented 4 years ago

That's very true... I think using createOrOpenExtract would also solve writing multiple tables to a single extract (another issue on this list). That would be cool if you figure it out! Let me know if there's anything I can help out with!

ghost commented 4 years ago

@bwiley1 Check it out here: https://github.com/harrison-h/pandleau/tree/load-existing-table

Was pretty straightforward. I've only tested it on the legacy SDK (as it's what I have for my use case), but it works exactly as intended. My use case is that I'm transforming very large datasets to feed them into Tableau, so I end up having to pass them along to the extract in chunks.
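For the chunked use case described above, one way to split the work is to compute row bounds up front and write one slice at a time. This is only a sketch: `chunk_bounds` is a hypothetical helper that is not part of pandleau or the linked branch, and it uses plain Python so it runs without pandas.

```python
def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) row bounds that cover range(n_rows) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, n_rows)


bounds = list(chunk_bounds(10, 4))
print(bounds)  # [(0, 4), (4, 8), (8, 10)]
```

With a DataFrame `df`, each pair would select one chunk via `df.iloc[start:stop]`. The first chunk would create the extract, and the later ones would be appended using whatever option the branch exposes for loading an existing table.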

Additionally, I think you're right about it allowing you to write multiple tables! As long as that argument is passed, it should work just fine. Also not tested though.

Full disclosure: I'm not in CS or anything, so feel free to point out anything in my code that could be improved. If it all looks good to you as well, I can open a pull request.

bwiley1 commented 4 years ago

I think this looks fine! If you want to open a pull request I'll approve it, thanks!