apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0
489 stars 178 forks source link

Replace reference of `Table.identifier` with `Table.name` #1346

Closed kevinjqliu closed 1 week ago

kevinjqliu commented 1 week ago

Closes #1336, related to #1327 This PR changes the implementation of the Table.name function to use self._identifier instead of self.identifier to avoid having unnecessary deprecation warnings. This PR also replaces all references of Table.identifier with Table.name() to prepare for the deprecation of Table.identifier and avoid emitting warnings in tests. Using VSCode, I confirmed all use of Table.identifier is replaced in the repo.

sungwy commented 1 week ago

Hi @kevinjqliu thanks for bouncing on this! And nice catch on the commit_table functions... this looks good to me. It looks like python3.12 CI is failing due to a DeprecationWarning on utcfromtimestamp. Could we rebase this branch and run the CI again?

Related fix: https://github.com/apache/iceberg-python/pull/1134/files

kevinjqliu commented 1 week ago

super weird that this is only happening to 3.12

self = TimestampNTZType(), ts = 1672601100000000

    def fromInternal(self, ts: int) -> datetime.datetime:
        if ts is not None:
            # using int to avoid precision loss in float
>           return datetime.datetime.utcfromtimestamp(ts // 1000000).replace(
                microsecond=ts % 1000000
            )
E           DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).

which corresponds to pyspark's use of datetime.datetime.utcfromtimestamp, which was recently changed https://github.com/apache/spark/commit/8e02a6493ef5dc5949e161179a7c081c5ca58ff2#diff-bb02965370d515ec7add7967bc949c18c88ffc7346cceaebefb84907dfdce903L437

We remove the warning in #1134, but the tests passed then

kevinjqliu commented 1 week ago

datetime.datetime.utcfromtimestamp is deprecated starting from 3.12

https://docs.python.org/3.12/library/datetime.html#datetime.datetime.utcfromtimestamp

Fokko commented 1 week ago

@kevinjqliu Yes a newer version of Python will emit a deprecation warning, I'm okay with putting back the exclusion to fix that in another PR

kevinjqliu commented 1 week ago

opened #1349 and added back the warning filter, we can remove once we upgrade PySpark to 4.0