Samreay closed this issue 3 weeks ago.
> Later on in the page in the Catalogs section, there are some ways detailed as to how to configure catalogs, but I'm not sure if this applies to writer configuration as well (why would it not be at the top if it applied to everything).
Yes, this is indeed the case. Sorry for the confusion there, and thanks for raising a PR. PRs like this are very helpful to the community, since folks who already use PyIceberg know what to look for and tend to skip parts of the documentation.
@Fokko I've fixed the lint issues in the PR. Also, what are your thoughts on enabling GitHub Discussions? I have a few questions about best practices and the like, but I don't want to clutter Issues with things that aren't actual issues.
Enabling GitHub Discussions has been brought up once or twice, but it hasn't been decided yet, mainly because there are already several places to discuss things, for example Slack, GitHub Issues, or the ASF mailing list. Feel free to raise GitHub issues, or start a thread on either the mailing list or Slack (we have a #python channel).
Thanks @Fokko, I've joined the community Slack and posted in the compaction channel about a different topic. Looking forward to getting more engaged with the community. Cheers mate.
Feature Request / Improvement
Hi team! We're currently adopting the Iceberg format but are struggling to configure everything. For example, we'd like to change the Parquet compression codec from its default to snappy.
I can see that this is an option in the configuration:
However, the page doesn't seem to explain how to actually use and apply these configuration options. Is there a `pyiceberg.configure` method? Are these options that should be passed into `.append`? Should this be set as an environment variable, with some form of translation from `write.parquet.compression-codec` to `PYICEBERG__WRITE_PARQUET__COMPRESSION_CODEC`?

Later on in the page, in the Catalogs section, there are some details on how to configure catalogs, but I'm not sure whether this applies to writer configuration as well (why would it not be at the top if it applied to everything?). I'll open a PR on this under the assumption that the Catalog configuration section is generally applicable, and also to fix the incorrect description of `write.parquet.page-row-limit`.