apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Improve documentation on Configuration page #1290

Closed: Samreay closed this issue 3 weeks ago

Samreay commented 3 weeks ago

Feature Request / Improvement

Hi team! We're currently adopting the Iceberg format but are struggling to configure everything. For example, we'd like to change the Parquet compression codec from its default to snappy.

I can see that this is an option in the configuration:

[Screenshot of the Configuration page listing this option]

But the page doesn't explain how to actually use and apply these configuration options. Is there a pyiceberg.configure method? Should these options be passed into .append? Or should they be set as environment variables, with some translation from write.parquet.compression-codec to something like PYICEBERG__WRITE_PARQUET__COMPRESSION_CODEC? Later in the page, the Catalogs section describes a few ways to configure catalogs, but I'm not sure whether they also apply to writer configuration (if they applied to everything, why wouldn't they be at the top?).

I'll open a PR for this, assuming the Catalog configuration section applies generally, and will also fix the incorrect description of write.parquet.page-row-limit.

Fokko commented 3 weeks ago

Later on in the page in the Catalogs section, there are some ways detailed as to how to configure catalogs, but I'm not sure if this applies to writer configuration as well (why would it not be at the top if it applied to everything).

Yes, this is indeed the case. Sorry for the confusion, and thanks for raising a PR. PRs like this are very helpful to the community, since people who already use PyIceberg know what to look for and tend to skip parts of the documentation.
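The Catalogs section of the configuration page says PyIceberg picks up environment variables that start with PYICEBERG_ and mirror the .pyiceberg.yaml structure, where a double underscore marks a nested field and a single underscore is converted into a dash. The sketch below illustrates that naming rule in plain Python. The function names are hypothetical, the three-level split (catalog, catalog name, property) is an assumption about how PyIceberg groups keys rather than its actual code, and nothing here confirms whether write.* options such as write.parquet.compression-codec are read from the environment.

```python
from typing import Dict, List


def env_var_to_path(name: str, prefix: str = "PYICEBERG_") -> List[str]:
    """Translate a PyIceberg-style environment variable name into a config path.

    Hypothetical helper sketching the documented convention: names are matched
    case-insensitively, "__" separates nested fields and "_" becomes "-".
    Assumption: at most three levels are split off (catalog / <name> / <property>),
    and any remaining "__" inside the property name becomes ".".
    """
    lowered = name.lower()
    if not lowered.startswith(prefix.lower()):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    key = lowered[len(prefix):]
    parts = key.split("__", maxsplit=2)
    # Replace "__" before "_" so "s3__access_key_id" becomes "s3.access-key-id".
    return [part.replace("__", ".").replace("_", "-") for part in parts]


def config_from_env(environ: Dict[str, str], prefix: str = "PYICEBERG_") -> dict:
    """Build a nested config dict from every variable that carries the prefix."""
    config: dict = {}
    for name, value in environ.items():
        if not name.lower().startswith(prefix.lower()):
            continue  # unrelated variables such as HOME are ignored
        *parents, leaf = env_var_to_path(name, prefix)
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config
```

Under this sketch, PYICEBERG_CATALOG__DEFAULT__URI=thrift://localhost:9083 maps to catalog.default.uri, and PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID maps to the s3.access-key-id property of the default catalog.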

Samreay commented 3 weeks ago

@Fokko I've fixed the lint issues in the PR. Also, what are your thoughts on enabling GitHub Discussions? I have a few questions about best practises and similar topics, but I don't want to clutter Issues with things that aren't actual issues.

Fokko commented 3 weeks ago

Enabling GitHub Discussions has been brought up once or twice, but it hasn't been decided yet, mainly because there are already several places to discuss things, for example Slack, GitHub Issues or the ASF mailing list. Feel free to raise GitHub issues, or start a thread on either the mailing list or Slack (we have a #python channel).

Samreay commented 3 weeks ago

Thanks @Fokko, I've joined the community Slack and posted in the compaction channel about a different topic. Looking forward to getting more engaged with the community. Cheers mate.