SBentley / qvd-utils

Read Qlik Sense .qvd files
https://pypi.org/project/qvd/
Apache License 2.0

Implement reading qvd files from Python IO #27

Open msimmoteit-neozo opened 5 months ago

msimmoteit-neozo commented 5 months ago

Hi, I'm a big fan of this library. It has really helped me to work with QVD files. Thank you so much.

Currently this library can only read qvd files given a file name, but I have a use case where the data I want to read is not available to my Python interpreter in that form. I would like to read qvd files from either in-memory bytes or Python file objects.

As a starting point, I implemented this myself. With these changes, qvd files can still be read the current way, by file name:

from qvd import qvd_reader
qvd_reader.read("qvd/test_files/AAPL.qvd")

But they can also be read from a file object opened in binary mode:

with open("qvd/test_files/AAPL.qvd", "rb") as fin:
    qvd_reader.read(fin)
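
To illustrate the idea of accepting a file name, in-memory bytes, or a file object, here is a minimal Python sketch that reduces all three to raw bytes so that a single parser could consume them. The helper name `to_bytes` is hypothetical and not part of qvd-utils; it only shows one possible way to dispatch on the input type and is not the code from my changes.

```python
import io
import os


def to_bytes(source):
    """Return the raw bytes behind a path, a bytes-like object, or a binary file.

    Hypothetical helper for illustration only; not part of qvd-utils.
    """
    # In-memory data: copy it into an immutable bytes object.
    if isinstance(source, (bytes, bytearray, memoryview)):
        return bytes(source)
    # File name or path object: open it in binary mode ourselves.
    if isinstance(source, (str, os.PathLike)):
        with open(source, "rb") as fin:
            return fin.read()
    # Anything else is assumed to be a file object opened in binary mode.
    return source.read()


if __name__ == "__main__":
    # The payload is arbitrary demo data, not a valid qvd file.
    payload = b"demo \xf6 payload"
    assert to_bytes(payload) == to_bytes(io.BytesIO(payload))
```

With a helper like this, the path-based and buffer-based entry points could share everything after the bytes have been obtained.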

I also added an error message for a common mistake one could make:

>>> with open("qvd/test_files/AAPL.qvd", "r") as fin:
...     qvd_reader.read(fin)
...
Traceback (most recent call last):
  File ".../qvd-utils/qvd/qvd_reader.py", line 18, in read_to_dict
    unpacked_data = file.read()
                    ^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 5816: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/.../qvd-utils/qvd/qvd_reader.py", line 7, in read
    data_dict = read_to_dict(file)
                ^^^^^^^^^^^^^^^^^^
  File "/.../qvd-utils/qvd/qvd_reader.py", line 20, in read_to_dict
    raise Exception("Supply a raw file access. Use mode \"rb\" instead of mode \"r\"")
Exception: Supply a raw file access. Use mode "rb" instead of mode "r"
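
One caveat with catching the UnicodeDecodeError: it only fires when the file happens to contain bytes that are invalid in the default text encoding. A possible alternative, sketched below under the assumption that the reader receives an ordinary Python file object, is to check for a text-mode handle before reading. The helper name `read_binary` is hypothetical and not part of qvd-utils.

```python
import io


def read_binary(file):
    """Read all bytes from a file object, rejecting text-mode handles up front.

    Hypothetical helper for illustration only; not part of qvd-utils.
    """
    message = 'Supply a raw file access. Use mode "rb" instead of mode "r"'
    # open(..., "r") and io.StringIO both derive from io.TextIOBase.
    if isinstance(file, io.TextIOBase):
        raise ValueError(message)
    data = file.read()
    # Fallback for file-like objects that do not derive from the io classes.
    if not isinstance(data, (bytes, bytearray)):
        raise ValueError(message)
    return bytes(data)
```

This way the mistake is reported regardless of the file's contents, and the traceback contains a single exception instead of a chained pair.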

There is room for improvement here, as there is code duplication between the functions read_qvd and read_qvd_from_buffer. I tried to unify them, but the BufRead of the Cursor over the Vec<u8> and the BufRead on the file behaved differently, so I left it as it is.

Maybe someone else can find a way to make it better?