InputStream inherits from io.BufferedIOBase, which defines a read1 method that returns an arbitrary number of bytes (up to the requested size) instead of everything until EOF. InputStream does not override read1, so calling it raises io.UnsupportedOperation. Libraries like pandas provide functions that read from an IO stream, and they appear to call read1 on the stream internally, so they throw an error when an InputStream is passed to them directly.
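The failure can probably be reproduced without pandas or azure-functions. This sketch assumes that pandas wraps binary input in io.TextIOWrapper, which pulls data through the buffer's read1() whenever that attribute exists. BufferedIOBase always defines read1, so the inherited stub that raises is what gets called. ReadOnlyStream and Read1Stream are hypothetical stand-ins for InputStream before and after the workaround, not the real azure-functions classes.

```python
import io


class ReadOnlyStream(io.BufferedIOBase):
    """Hypothetical stand-in for InputStream: read() works, read1() is the inherited stub."""

    def __init__(self, data: bytes) -> None:
        self._io = io.BytesIO(data)

    def readable(self) -> bool:
        # TextIOWrapper only sets up a decoder for readable buffers
        return True

    def read(self, size: int = -1) -> bytes:
        return self._io.read(size)


class Read1Stream(ReadOnlyStream):
    """Same stream with the issue's workaround: read1() delegates to read()."""

    def read1(self, size: int = -1) -> bytes:
        return self.read(size)


def read_lines(stream: io.BufferedIOBase) -> list:
    # Line iteration makes TextIOWrapper fetch chunks via stream.read1(...)
    wrapper = io.TextIOWrapper(stream, encoding="utf-8", newline="")
    return list(wrapper)


if __name__ == "__main__":
    data = b"a,b,c,d\n1,2,3,4"
    try:
        read_lines(ReadOnlyStream(data))
    except io.UnsupportedOperation as exc:
        print("without read1:", repr(exc))
    print("with read1:", read_lines(Read1Stream(data)))
```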
Reproducible Example
```python
import io

import pandas as pd
from azure.functions.blob import InputStream


def error():
    i = InputStream(data=b'a,b,c,d\n1,2,3,4')
    # This throws io.UnsupportedOperation from the inherited read1()
    df = pd.read_csv(i, sep=",")


def hack():
    i = InputStream(data=b'a,b,c,d\n1,2,3,4')

    def read1(self, size: int = -1) -> bytes:
        return self.read(size)

    # Monkey-patch read1 onto the class; this affects every InputStream from now on
    setattr(InputStream, 'read1', read1)
    # This works because read1 now delegates to read
    with pd.read_csv(i, sep=",", chunksize=1) as reader:
        for chunk in reader:
            print(chunk)


if __name__ == "__main__":
    # Call error() first: once hack() has patched the class, error() no longer fails
    try:
        error()
    except io.UnsupportedOperation as exc:
        print(f"error() raised: {exc!r}")
    hack()
```
versions:
python=3.7
pandas=1.2.4
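Instead of monkey-patching InputStream, which changes every instance in the process, a caller could wrap the stream so that only its read() is used. This is a minimal sketch, assuming the stream's read(size) behaves like an ordinary binary read. RawReadAdapter, with_read1 and NoRead1 are hypothetical names, not part of azure-functions or pandas; io.BufferedReader then supplies a genuine read1() on top of the adapter.

```python
import io


class RawReadAdapter(io.RawIOBase):
    """Hypothetical adapter exposing any object with read(size) as a raw stream."""

    def __init__(self, source) -> None:
        self._source = source

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Copy at most len(buffer) bytes from the source; returning 0 signals EOF
        data = self._source.read(len(buffer))
        n = len(data)
        buffer[:n] = data
        return n


def with_read1(source) -> io.BufferedReader:
    # BufferedReader implements read1() in terms of the adapter's readinto()
    return io.BufferedReader(RawReadAdapter(source))


class NoRead1(io.BufferedIOBase):
    """Hypothetical source whose read1() is the unimplemented BufferedIOBase stub."""

    def __init__(self, data: bytes) -> None:
        self._io = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def read(self, size: int = -1) -> bytes:
        return self._io.read(size)


if __name__ == "__main__":
    wrapped = with_read1(NoRead1(b"a,b,c,d\n1,2,3,4"))
    text = io.TextIOWrapper(wrapped, encoding="utf-8", newline="")
    print(list(text))
```

With the objects from the reproducible example this would presumably become `pd.read_csv(with_read1(i), sep=",")`. For small blobs, `io.BytesIO(i.read())` is a simpler option, since io.BytesIO implements read1(), at the cost of copying the data once more.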
Use cases
Being able to read and process CSV-like files as buffered streams in pandas (for example in chunks, via chunksize) may be more memory efficient.
Makes InputStream more compatible with other libraries that read from IO streams.
References
Someone had implemented read1 in the ABC of the python-worker repo in this PR, but the details seem to have been lost in a force push (this might be unrelated, not sure).