msimmoteit-neozo closed this issue 5 months ago.
I'm not completely sure I understand what you mean... Do you want an option for constructing a QVD file from a `bytes` object? Could you please post a minimal code example of the method/API you want?
Thanks for your quick response. Yeah, that was pretty much my idea. Currently the usage looks like this:
```python
from pyqvd import QvdDataFrame

df = QvdDataFrame.from_qvd('sample.qvd')
print(df.head(5))
```
But for use cases where a QVD file is not stored on disk, for example in object storage, it would be convenient not to have to write it to disk first:
```python
from pyqvd import QvdDataFrame
from google.cloud.storage import Client

client = Client()
bucket = client.get_bucket(MYBUCKET)
blob = bucket.get_blob(MYFILE)
downloaded_file = blob.download_as_bytes()

# Proposed method, not (yet) part of PyQvd.
df = QvdDataFrame.from_qvd_bytes(downloaded_file)
print(df.head(5))
```
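As a side note, a `bytes` object can already be presented through Python's binary file API with `io.BytesIO`, so a single stream-based entry point could in principle cover both the bytes case and the file-object case. A minimal stdlib-only sketch; `bytes_to_stream` is a hypothetical helper name, not part of PyQvd:

```python
import io


def bytes_to_stream(data: bytes) -> io.BufferedIOBase:
    """Hypothetical helper: expose an in-memory bytes object as a binary stream."""
    # io.BytesIO implements the binary file API (read, seek, tell, ...) on top of
    # an in-memory buffer, so code written against file objects can consume it.
    return io.BytesIO(data)


stream = bytes_to_stream(b"QVD payload")
print(isinstance(stream, io.BufferedIOBase))  # True
print(stream.read(3))                         # b'QVD'
```

With a stream-based method such as the `from_stream()` introduced later in this thread, the Cloud Storage example could then pass `io.BytesIO(downloaded_file)` directly; that combination is an assumption and is not exercised here.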
But I think it can sometimes be unwieldy to work with `bytes` directly (it has to be `bytes`, because decoding the raw data in .qvd files as text leads to decode errors). For the specific use case of object storage, there is a library called smart_open that implements Python's file API on top of object storage.
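To illustrate the decode problem: a text-mode stream decodes while it reads, and a byte such as `0xFF` is never valid UTF-8, so reading binary data that way fails, while binary access returns the bytes unchanged. The sample bytes below are made up for illustration and are not real QVD content; `readable_as_text` is a hypothetical helper:

```python
import io


def readable_as_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if `data` can be read through a text-mode stream."""
    # TextIOWrapper decodes while reading, just like open(path, "r") would.
    text_stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding)
    try:
        text_stream.read()
        return True
    except UnicodeDecodeError:
        return False


# Made-up sample: an XML-like header followed by binary bytes.
# 0xFF can never appear in valid UTF-8.
sample = b"<QvdTableHeader>\xff\x00\x01"
print(readable_as_text(sample))             # False
print(io.BytesIO(sample).read() == sample)  # True: binary access keeps the bytes intact
```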
If PyQvd had an API to read from Python file objects, it could look like this:
```python
from pyqvd import QvdDataFrame
import smart_open

with smart_open.open("url/to/my/object", "rb") as fin:
    df = QvdDataFrame.from_qvd_file(fin)
print(df.head(5))
```
I think this would be nice and generic, as in this case the `_read_data` method could go from this:
```python
def _read_data(self):
    """
    Reads the data of the QVD file into memory.
    """
    with open(self._path, 'rb') as file:
        self._buffer = file.read()
```
to this:
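```python
import io

def _read_data(self):
    """
    Reads the data of the QVD file into memory.
    """
    # TextIOBase, BufferedIOBase and RawIOBase all derive from IOBase,
    # so one isinstance check covers every kind of file object.
    if isinstance(self._file, io.IOBase):
        try:
            self._buffer = self._file.read()
        except UnicodeDecodeError as e:
            # A text-mode stream tried to decode the binary QVD data.
            raise ValueError("Supply a raw file access. Use mode \"rb\" instead of mode \"r\"") from e
    else:
        # No file object supplied: read from the local path as before.
        with open(self._path, 'rb') as file:
            self._buffer = file.read()
```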
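The idea behind the proposed `_read_data` can be tried out without touching PyQvd. The sketch below loads the raw bytes either from a local path or from an already opened binary file object; `read_qvd_buffer` is a hypothetical standalone helper, not PyQvd's implementation:

```python
import io
import os
from typing import BinaryIO, Union


def read_qvd_buffer(source: Union[str, os.PathLike, BinaryIO]) -> bytes:
    """Hypothetical helper: load raw QVD bytes from a path or a binary file object."""
    if isinstance(source, (str, os.PathLike)):
        # A path on the local file system: open it in binary mode ourselves.
        with open(source, "rb") as file:
            return file.read()
    if isinstance(source, io.TextIOBase):
        # Text streams decode on read, which corrupts or rejects binary data.
        raise TypeError('Supply a binary file object. Use mode "rb" instead of mode "r".')
    # Anything else is assumed to be a binary file object (e.g. from smart_open).
    data = source.read()
    if not isinstance(data, bytes):
        raise TypeError("Expected read() to return bytes")
    return data
```

Checking for `io.TextIOBase` up front is a slightly stricter variant of catching `UnicodeDecodeError`: a text-mode stream over data that happens to decode cleanly would otherwise slip through unnoticed.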
I like your request; I didn't think about object storage, or any storage other than the local file system, until now... So if I understand you correctly, you suggest that it should be possible to pass an I/O stream as an alternative to a string path to `QvdFileReader` or `QvdFileWriter`, right? Sounds like a very useful extension!
Exactly! Thank you so much for your consideration.
I started working on it and added a feature branch. Commit ccb8604a40b45aa8f0a53b093a499db11f6ee85d adds a first version of an extended API that supports reading (`from_stream()`) and writing (`to_stream()`) binary streams as an alternative to files. I modified your suggestion and limited the supported streams to binary streams (text-based streams, e.g. `TextIOBase`, are not supported). The binary stream must be a subclass of `RawIOBase` or `BufferedIOBase`.
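The restriction to binary streams can be expressed as an `isinstance` check against the two `io` base classes named above. This is a sketch of such a check, not PyQvd's actual validation code; `is_supported_stream` is a hypothetical name:

```python
import io


def is_supported_stream(stream: object) -> bool:
    """Hypothetical check: accept only binary streams (RawIOBase or BufferedIOBase)."""
    # TextIOBase is a sibling of these two classes, so text streams are rejected.
    return isinstance(stream, (io.RawIOBase, io.BufferedIOBase))


print(is_supported_stream(io.BytesIO(b"...")))  # True  (BytesIO is a BufferedIOBase)
print(is_supported_stream(io.StringIO("...")))  # False (StringIO is a TextIOBase)
```

For local files this means `open(path, "rb")` (an `io.BufferedReader`) and `open(path, "rb", buffering=0)` (an `io.FileIO`, i.e. a `RawIOBase`) pass, while `open(path, "r")` (an `io.TextIOWrapper`) does not.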
```python
from pyqvd import QvdDataFrame

some_stream = ...
df = QvdDataFrame.from_stream(some_stream)
...
other_stream = ...
df.to_stream(other_stream)
```
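When writing to and reading from the same in-memory buffer, the stream position matters: after a write the position sits at the end of the data, so the buffer has to be rewound before it is read back. The sketch shows this with plain `io.BytesIO`; `write_payload` and `read_payload` are hypothetical stand-ins for `to_stream()` and `from_stream()`, and the assumption is that `from_stream()` reads from the stream's current position like an ordinary `read()`:

```python
import io


def write_payload(stream: io.BufferedIOBase, payload: bytes) -> None:
    """Hypothetical stand-in for df.to_stream(stream)."""
    stream.write(payload)


def read_payload(stream: io.BufferedIOBase) -> bytes:
    """Hypothetical stand-in for QvdDataFrame.from_stream(stream)."""
    # read() starts at the stream's current position.
    return stream.read()


buffer = io.BytesIO()
write_payload(buffer, b"qvd bytes")
print(read_payload(buffer))  # b'' because the position is at the end of the data
buffer.seek(0)               # rewind before reading back
print(read_payload(buffer))  # b'qvd bytes'
```

`buffer.getvalue()` returns the complete contents regardless of the current position, which is convenient when the written bytes are handed on elsewhere.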
Feel free to check it out and comment on it. If there are no changes or objections, I will include the feature in the next minor release.
The requested feature of reading/writing binary streams as an alternative to local disk files is included in the next minor release, v1.1.0.
Thank you so much for this. It works great.
Hi,
I'm watching this library with great interest. I was wondering if it would be possible to change the API to allow supplying QVD files via `bytes` objects or via a supplied file handle.