dask / hdfs3

A wrapper for libhdfs3 to interact with HDFS from Python
http://hdfs3.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

add crc=True|False parameter to HDFileSystem(...) #164

Open sk1p opened 6 years ago

sk1p commented 6 years ago

Title says it all. I also added instructions for testing on Python 2.7 to the CI README. The test is a bit long-winded; comments/improvements welcome.

martindurant commented 6 years ago

I expect this is passing only with your local build of libhdfs3. With the reverted build, due to the segfaults seen by some, the change here will have no effect. I am not sure what to do about that, since I don't know how to fix the root issue.

sk1p commented 6 years ago

Yeah, the Travis check passes only because the configuration is silently ignored on earlier libhdfs3 versions...

About the root cause of the segfaults, I can't help at the moment. We don't have a cluster for our project yet, so we also don't have any fancy Kerberos setup. I guess it would help if someone who is experiencing the segfaults could create a Docker container that reproduces the problem (which could later be used to run the hdfs3 tests in a Kerberos setting, too).

martindurant commented 6 years ago

The segfault does manifest in https://hub.docker.com/r/mdurant/hadoop/

sk1p commented 6 years ago

Oh nice! If I find the time, I'll have a look. Maybe something somewhere is overwriting memory it shouldn't...

I tried to get hdfs3/libhdfs3 running under valgrind, but it crashed on me. Did you ever get something like that running? Or something like AddressSanitizer?

martindurant commented 6 years ago

I prefer staying in python-land for a reason. If I were to do anything low-level beyond a 100-line routine these days, I'd pick up my rust to do it. So, unfortunately, no.