Closed h-vetinari closed 2 months ago
If desired, I can test this patch on conda-forge infrastructure (build a patched orc that's published in a separate channel, then run https://github.com/conda-forge/arrow-cpp-feedstock/pull/1086 against that).
Thanks for the fix! Do you have a JIRA account? I can assign this to you.
Would it be possible/acceptable to create a symlink in CI? I believe we could write a test that exercises the branch for CONDA_PREFIX
here similar to
https://github.com/apache/orc/blob/b75dcaa060ce4e9cb5bc4600b00bfb8769efc50b/c%2B%2B/test/TestTimezone.cc#L423
but it would need a working tzdb at the end of $CONDA_PREFIX/share/zoneinfo
.
Ah, just found https://github.com/apache/orc/blob/b75dcaa060ce4e9cb5bc4600b00bfb8769efc50b/cmake_modules/ThirdpartyToolchain.cmake#L340 So I guess it should be there?
~Interestingly, I cannot inspect any CI logs here. Unfolding any section just gives a red Error:
.~
Nevermind, it worked now after 5min...
Ah, just found
So I guess it should be there?
That is only used in WIN32 to make test pass on Windows...
That is only used in WIN32 to make test pass on Windows...
Well, then we could still test that the CONDA_PREFIX mechanics work on windows, no? See my last commit, which tries to do this.
OK, so the test I added passed now:
[ RUN ] TestTimezone.testTzdbFromCondaEnv
[ OK ] TestTimezone.testTzdbFromCondaEnv (0 ms)
but restoring TZDIR
still fails:
unknown file: error: C++ exception with description "Time zone file ������������������������������������������������������������������������������/UTC does not exist.
Since this is copied from testMissingTZDB
, I'm surprised the same error doesn't happen there, because that test is saving tzDirBackup
in exactly the same way - i.e. just a pointer to an environment variable which then gets deleted. It seems to me that the restoration of that environment variable cannot succeed (without a deepcopy before deleting TZDIR
), and so far that test may have only passed due to checking if (tzDirBackup != nullptr)
, but - potentially - never actually restored the environment variable...? 🤔
Tests are passing on windows now! 🥳
This is all green now and should be ready for closer review. :)
Thank you, @h-vetinari . This patch looks good to me if all comments from @wgtmac are fixed.
Thanks for the reviews! I made the requested changes.
I have a last question - I see that there's some sort of caching for the tzdb going on (or at least, there's a test for it): https://github.com/apache/orc/blob/695e0f33609adae0114708c3be756bb7b77f4d79/c%2B%2B/test/TestTimezone.cc#L346-L347
Can we delete that cache somehow to rule out that this is what's causing testTzdbFromCondaEnv
to pass? Or is it enough to choose some timezones that haven't been queried yet to bypass the cache?
I have a last question - I see that there's some sort of caching for the tzdb going on (or at least, there's a test for it):
Can we delete that cache somehow to rule out that this is what's causing
testTzdbFromCondaEnv
to pass?
We can either use a different timezone to avoid caching or add a function to invalidate the cache.
Sorry for the force-pushes; I had to figure out the right dates/times (which is a bit harder because we apparently need to specify them in UTC rather than local time; in the latter it would always be around 2/3am).
OK, I think I'm done now from my side. This tests all the things I could think of as relevant.
Thanks a lot for the review and help here @wgtmac! 🙏
Would you consider including this in the 2.0.1 release? I'm probably going to backport this in conda-forge if not, but it would be nicer if it could be included in the release here directly; seeing that it's just about checking a single environment variable, I think the risk here is very low.
Yes, I have already ported it to branch-2.0 and will be released in 2.0.1.
Amazing, thanks a lot! 🤩
What changes were proposed in this pull request?
Find tzdb without having to set
TZDIR
when in a conda-environment (wheretzdata
has a uniform location of$CONDA_PREFIX/share/zoneinfo
across all platforms).Why are the changes needed?
This is due to issues in arrow (see https://github.com/apache/arrow/issues/36026) that cannot really be fixed there, as it assumes that orc >=2.0 knows how to find the tzdb. Having to set
TZDIR
in all user environments is an intrusive change that should be avoided, and since the cost here is checking a single environment variable, it's hopefully not too onerous for consideration.How was this patch tested?
CI here
Was this patch authored or co-authored using generative AI tooling?
No
CC @wgtmac
See also: #1587