I've tried the Hyper API. Inserting data row by row with pandas.DataFrame.iterrows() is slow. But using the Hyper SQL "COPY" command to build the .hyper file directly from a CSV is much faster, roughly 10-100x in my tests. The only catch is that you first have to write the data out to CSV, which is slow with pandas. Luckily, Python has the datatable package, which writes CSV at about R data.table's speed. I tested this on data with 600M rows and 31 columns, and building the .hyper file from the CSV took only about 17 seconds.
Reference: https://github.com/tableau/hyper-api-samples/blob/main/Tableau-Supported/Python/create_hyper_file_from_csv.py
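Here is a minimal sketch of the approach, loosely following the linked sample. It assumes `tableauhyperapi` and `datatable` are installed; the function names, the `Extract` table name, and the all-text column types are illustrative placeholders, not part of the official sample. The imports are deferred inside the function so the sketch can be read (and the SQL builder tested) without those libraries present.

```python
# Sketch: build a .hyper file by COPYing a CSV that datatable wrote quickly.
# Table name, file paths, and column types below are illustrative placeholders.

def copy_sql(table_name: str, csv_path: str) -> str:
    """Build the Hyper SQL COPY command that bulk-loads a CSV into a table."""
    # COPY lets the Hyper process read the CSV itself, skipping Python row loops.
    escaped = csv_path.replace("'", "''")  # minimal SQL string-literal escaping
    return (f'COPY "{table_name}" FROM \'{escaped}\' '
            "WITH (format csv, delimiter ',', header)")

def frame_to_hyper(df, hyper_path: str, table_name: str = "Extract") -> int:
    """Write df to a temp CSV with datatable, then COPY it into a new .hyper file.

    Returns the number of rows loaded. `df` is a pandas DataFrame.
    """
    # Deferred imports: only needed when actually building the file.
    import datatable as dt
    from tableauhyperapi import (HyperProcess, Telemetry, Connection,
                                 CreateMode, TableDefinition, SqlType,
                                 TableName)

    csv_path = hyper_path + ".csv"
    # datatable's multi-threaded CSV writer is the fast path that replaces
    # pandas.DataFrame.to_csv here.
    dt.Frame(df).to_csv(csv_path)

    with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
        with Connection(endpoint=hyper.endpoint, database=hyper_path,
                        create_mode=CreateMode.CREATE_AND_REPLACE) as conn:
            # Hyper needs the table schema declared before COPY; text columns
            # are used here as a lowest-common-denominator placeholder.
            table = TableDefinition(TableName(table_name), [
                TableDefinition.Column(str(name), SqlType.text())
                for name in df.columns
            ])
            conn.catalog.create_table(table)
            # execute_command returns the affected row count for COPY.
            return conn.execute_command(copy_sql(table_name, csv_path))
```

In a real pipeline you would map each column to a proper `SqlType` (int, double, date, ...) instead of text, since typed columns are what make the extract useful in Tableau.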