shou123 opened this issue 1 year ago (status: Open).
Skyhook's library name is libarrow_skyhook.so, not libarrow_skyhook_client.so. Why do you think the name is libarrow_skyhook_client.so?
Cc: @JayjeetAtGithub
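One way to settle which file name a particular build actually produces is to look in the directory where the Arrow C++ libraries were built or installed. A minimal sketch, assuming all shared libraries land in a single directory; the helper name `find_skyhook_libraries` and the `lib_dir` argument are hypothetical:

```python
from pathlib import Path


def find_skyhook_libraries(lib_dir):
    """Return the sorted names of Skyhook shared libraries found in lib_dir.

    The pattern matches both names discussed in this thread
    (libarrow_skyhook.so and libarrow_skyhook_client.so) as well as
    versioned variants such as libarrow_skyhook.so.1400.
    """
    directory = Path(lib_dir)
    if not directory.is_dir():
        # A missing directory simply means nothing was found.
        return []
    return sorted(p.name for p in directory.glob("libarrow_skyhook*.so*"))
```

For example, `find_skyhook_libraries("/usr/local/lib")` lists whichever of the two names the installed build contains, or an empty list if neither is present.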
libarrow_skyhook_client.so is supposed to provide the SkyhookFileFormat API; code that uses it links against the arrow_dataset, arrow, and arrow_skyhook_client shared libraries at compile time.
Reference: https://jayjeetc.medium.com/skyhookdm-is-now-a-part-of-apache-arrow-e5d7b9a810ba
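To see which of these shared libraries the dynamic loader can actually resolve on a given machine, Python's standard `ctypes.util.find_library` can be used as a quick probe. A minimal sketch; the library list combines the three names from the comment above with the `arrow_skyhook` name mentioned earlier, and the helper name `resolve_arrow_libraries` is hypothetical. A `None` result only means the loader's search path did not find the library, not necessarily that the build failed:

```python
from ctypes.util import find_library

# Names without the "lib" prefix and ".so" suffix, as find_library expects.
CANDIDATES = ("arrow", "arrow_dataset", "arrow_skyhook", "arrow_skyhook_client")


def resolve_arrow_libraries(names=CANDIDATES):
    """Map each library name to what find_library reports (a soname/path or None)."""
    return {name: find_library(name) for name in names}
```

Calling `resolve_arrow_libraries()` after installing a Skyhook-enabled build shows directly whether `arrow_skyhook` or `arrow_skyhook_client` is the name that resolves.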
The article was written by @JayjeetAtGithub, so we should wait for a response from @JayjeetAtGithub. :-)
For sure.
The libarrow_skyhook_client.so library is generated by the arrow_skyhook_client target in the pyarrow/cpp/build/BUILD.gn file. This target is only enabled when ARROW_SKYHOOK=ON is set.
The SkyhookFileFormat API is implemented in the arrow/ipc/skyhook.cc file. This file only includes the libarrow_skyhook_client.so library if it is available.
When ARROW_SKYHOOK=ON is not set, the libarrow_skyhook_client.so library is not generated, and the SkyhookFileFormat API is not available.
Sorry, I couldn't find the "pyarrow/cpp/build/BUILD.gn" file in the Apache Arrow source code. Could you please provide a link to it?
PS: I also set ARROW_SKYHOOK=ON. According to the paper (https://arxiv.org/pdf/2204.06074.pdf), pyarrow should include an API named SkyhookFileFormat, but the pyarrow library does not include it.
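A quick way to confirm whether an installed pyarrow exposes the class used in the paper is to probe `pyarrow.dataset` for the attribute. A minimal sketch; the helper name `has_skyhook_file_format` is hypothetical, and it deliberately does not construct the format, since that would require a Ceph cluster and a `ceph.conf`:

```python
import importlib


def has_skyhook_file_format():
    """Return True if pyarrow.dataset exposes SkyhookFileFormat, else False.

    Also returns False when pyarrow (or its dataset module) is not installed.
    """
    try:
        ds = importlib.import_module("pyarrow.dataset")
    except ImportError:
        return False
    return hasattr(ds, "SkyhookFileFormat")


print("SkyhookFileFormat available:", has_skyhook_file_format())
```

A `False` here on a pyarrow installed from PyPI or conda-forge would be consistent with the reply below about which features the published wheels enable.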
How are you installing Arrow today? I think we may not enable Skyhook in the wheels we publish to PyPI / conda-forge, so you will have to build the wheels from source. Directions on how to do this are here: https://arrow.apache.org/docs/developers/python.html
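For reference, the C++ part of a source build with Skyhook enabled comes down to a CMake configure step with `ARROW_SKYHOOK=ON`. A sketch that assembles (but does not run) that command, assuming the Arrow checkout lives at `arrow/` with the C++ sources in `arrow/cpp`; listing `ARROW_DATASET` and `ARROW_PARQUET` explicitly is an assumption about Skyhook's dependencies, and the helper name is hypothetical. The linked developer guide remains the authoritative source for the full set of options, including those the pyarrow build itself needs:

```python
import shlex
from pathlib import Path


def skyhook_configure_command(arrow_dir="arrow", build_dir=None):
    """Return a CMake configure command (as a list) with Skyhook enabled."""
    source = Path(arrow_dir) / "cpp"
    build = Path(build_dir) if build_dir else source / "build"
    return [
        "cmake",
        "-S", str(source),
        "-B", str(build),
        "-DARROW_SKYHOOK=ON",  # the option discussed in this thread
        "-DARROW_DATASET=ON",  # assumed dependency: Skyhook builds on Arrow Dataset
        "-DARROW_PARQUET=ON",  # assumed dependency: Parquet support
    ]


cmd = skyhook_configure_command()
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell-quoting issues if it is later passed to `subprocess.run`.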
Thank you for the information. I'll try that.
@shou123, I know this is quite late, but did you manage to figure out this particular issue?
Based on #37866 being opened, I am assuming so.
Describe the bug, including details regarding any error messages, version, and platform.
According to the paper, SkyhookFileFormat is used as follows:

    import pyarrow.dataset as ds
    format_ = ds.SkyhookFileFormat("parquet", "/ceph.conf")
However, when building with ARROW_SKYHOOK=ON, no libarrow_skyhook_client.so library is generated, so the SkyhookFileFormat API cannot be used.
Component(s)
Packaging, Python