bridgream closed this 3 years ago
?
Thanks for working on this. I haven't looked at your code yet, but I've put it on my to-do list for Saturday, if you are still interested in pursuing it.
@trivialfis thank you for your reply! I've re-opened the pull request and made Parquet support optional (disabled by default, so it should not affect users who do not need this feature). Could you please move over to that PR?
Yup, also @hcho3
I've added support for Parquet files to dmlc-core to enable external-memory training on Parquet files in XGBoost. I tested the code within XGBoost by training two models with identical parameters, one on CSV files and one on Parquet files; both models produce identical predictions on the same test data. (Unit test code is not included in this pull request.)
However, my implementation depends on Apache Arrow's Parquet library. Although I plan to make Parquet support optional, the parsers are registered in src/data.cc, and since the registry is static, I don't know how to register the Parquet parser conditionally without affecting existing code. Can anybody offer some advice?
Thanks in advance!
@PeterPanOnGit @trivialfis
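One common way to make a statically registered parser optional is to guard its registration (and everything that includes Arrow headers) behind a preprocessor macro that the build defines only when the feature is turned on. The sketch below illustrates that pattern with a self-contained registry. `ParserRegistry`, `ParserRegistrar`, `CSVParser`, `ParquetParser`, and the `HYPOTHETICAL_USE_PARQUET` flag are all hypothetical names chosen for illustration; they do not reproduce dmlc-core's actual registry macros in src/data.cc.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical parser interface (not dmlc-core's real Parser class).
struct Parser {
  virtual ~Parser() = default;
  virtual std::string Name() const = 0;
};

using ParserFactory = std::function<std::unique_ptr<Parser>()>;

// Registry held in a function-local static, so it is constructed on first
// use and static registrars in any translation unit can safely call it.
class ParserRegistry {
 public:
  static ParserRegistry& Get() {
    static ParserRegistry instance;
    return instance;
  }
  void Register(const std::string& name, ParserFactory factory) {
    factories_[name] = std::move(factory);
  }
  bool Has(const std::string& name) const {
    return factories_.count(name) != 0;
  }
  // Returns nullptr when no parser with this name was registered.
  std::unique_ptr<Parser> Create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, ParserFactory> factories_;
};

// Constructing a static instance of this type registers a factory
// during static initialization, before main() runs.
struct ParserRegistrar {
  ParserRegistrar(const std::string& name, ParserFactory factory) {
    ParserRegistry::Get().Register(name, std::move(factory));
  }
};

struct CSVParser : Parser {
  std::string Name() const override { return "csv"; }
};

// Unconditional registration, like the existing built-in parsers.
static ParserRegistrar csv_registrar("csv", [] {
  return std::unique_ptr<Parser>(new CSVParser());
});

// Optional registration: this block is compiled only when the build defines
// HYPOTHETICAL_USE_PARQUET. Without the flag, neither the parser nor any
// Arrow include reaches the compiler, so default builds are unaffected.
#ifdef HYPOTHETICAL_USE_PARQUET
struct ParquetParser : Parser {
  std::string Name() const override { return "parquet"; }
};
static ParserRegistrar parquet_registrar("parquet", [] {
  return std::unique_ptr<Parser>(new ParquetParser());
});
#endif
```

On the build side, this could be wired up with a CMake `option(...)` that, when enabled, calls `target_compile_definitions(...)` to define the flag and links Arrow/Parquet only in that branch. Keeping the Parquet parser in its own source file that is added to the target only under that option is a variant of the same idea that avoids touching the existing registration lines at all.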