Closed djouallah closed 3 weeks ago
Hi @djouallah! Are you encountering any issues when reading 100 of these tables?
Note that even calling `catalog.load_table` might be slow if you're running this on a ton of tables, but that's not something Daft can fix because it's just how Iceberg/PyIceberg works, unfortunately 😕 There is definitely going to be some fixed overhead in reading and parsing each table's metadata.
Sorry, what I meant is: I don't want to write 100 lines of code in order to read 100 tables. My ask is about a simpler API, not about performance.
Ah got it :)
I think if you're reading N tables, you could just store them in a dictionary. Something like this:

```python
dataframes = {
    table_name: daft.read_iceberg(catalog.load_table(table_name))
    for table_name in catalog.list_tables("db")
}
```
Note that these calls to `catalog.load_table` and `daft.read_iceberg` do take some time to run! If you don't need all 100 tables, it would be a good idea not to call them on every table 😛
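If you want the dictionary ergonomics without paying the metadata-loading cost for tables you never touch, one option is a small lazy, cached loader. Here's a minimal sketch using `functools.lru_cache`; the loader body is a stub for illustration (in practice it would call `catalog.load_table(table_name)` and `daft.read_iceberg(...)` as above):

```python
from functools import lru_cache

# Track loads just to demonstrate the caching behavior.
load_calls = []

@lru_cache(maxsize=None)
def get_dataframe(table_name):
    # Stand-in for the real (and relatively slow) load:
    #   daft.read_iceberg(catalog.load_table(table_name))
    load_calls.append(table_name)
    return f"dataframe:{table_name}"

# Only the tables you actually ask for pay the load cost,
# and repeated lookups hit the cache instead of the catalog.
df = get_dataframe("db.events")
df_again = get_dataframe("db.events")  # cached: no second load
```

This keeps the "one line per lookup" feel of the dictionary while deferring each table's metadata read until first use.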
I'm currently using this for three tables, but there should probably be a better way when we have something like 100 tables. Something like: