gglanzani closed 7 years ago
Whilst I don't see any reason not to implement these, I am surprised that you find __iter__ is not enough. HDFiles are routinely used in conjunction with pandas - what kind of problem were you having?
Furthermore, it seems to me that next could more simply call readline(); could you add some tests, please, to ensure that the functionality works as required?
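For illustration, here is a minimal sketch of that suggestion. LineReader is a hypothetical stand-in class, not the actual HDFile code, and it assumes the wrapped object returns an empty bytes string from readline() at end of file:

```python
import io


class LineReader:
    """Hypothetical stand-in for a binary file-like object (not the real HDFile)."""

    def __init__(self, raw):
        self._raw = raw  # any object exposing readline()

    def readline(self):
        return self._raw.readline()

    def __iter__(self):
        return self

    def __next__(self):
        # readline() returns b"" at end of file, which we treat as the stop signal
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    next = __next__  # Python 2 spelling of the same method


reader = LineReader(io.BytesIO(b"a,b\n1,2\n"))
lines = list(reader)
```

Aliasing next to __next__ keeps the two spellings in sync, so a caller that checks for either attribute sees the same behaviour.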
pandas checks for next explicitly, so it fails when you do
with hdfs.open(path_to_csv) as f:
    df = pd.read_csv(f)
I will look into readline and tests!
Thanks for the comment.
I've updated next and __next__ for now (@pitrou: thanks for the tip, I was sure there was an easier way).
@martindurant As for wrapping in TextIOWrapper: do you have some pointers on the path to take?
On the side: can this be merged in the meantime? It would be very helpful to have this feature.
The wrapper should work like
import io
with hdfs.open(path_to_csv) as f:
    df = pd.read_csv(io.TextIOWrapper(f))
where Pandas would now see a text-mode file with buffering and correct line-end handling.
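A small self-contained sketch of the line-end handling, as an illustration only: io.BytesIO stands in for the binary handle that hdfs.open() would return (an assumption, since the real file object must also expose the buffered-reader methods that TextIOWrapper relies on), and the rows are split by hand instead of calling pandas:

```python
import io

# Stand-in for the binary handle from hdfs.open(); BytesIO is an assumption here
raw = io.BytesIO(b"a,b\r\n1,2\r\n")

# Default newline handling translates "\r\n" to "\n" when reading text
text = io.TextIOWrapper(raw, encoding="utf-8")

# Iterating yields decoded str lines instead of raw bytes
rows = [line.rstrip("\n").split(",") for line in text]
```

The wrapper decodes bytes to str and normalises line endings, so the consumer never sees a stray carriage return.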
I would merge, but there ought to be some test of the new method(s). I notice, also, that _genline is now essentially repeated, so it could be refactored - but this is not important.
@martindurant I've written an additional test.
BTW, pandas is using readline() to read f :)
Cool, thank you.
This allows libraries such as pandas to read a file as a buffer.