Open bryancole opened 8 months ago
Cursor.close(), or implicitly when: a) a new statement is executed by the cursor, b) the cursor is used as a context manager, at the end of its code block, c) the transaction ends (commit/rollback), or d) the connection is closed.

There is a BlobReader.is_text() method to determine this. However, this method is valid only for TEXT/BINARY BLOB sub-types. There are other sub-types (including user-defined ones) that could be either text or binary. In such cases you need to know these sub-types in your application and use the proper read method according to the BlobReader.sub_type value.

As for adding a __bytes__ method to BlobReader, I will not do that. BLOB values can be quite big (even gigabytes), and reading the whole value into memory is a perfect way to shoot yourself in the foot. It also defeats the whole idea of BlobReader, which exists to avoid such disasters (there is also a safety threshold, see Cursor.stream_blob_threshold in the documentation). Anyway, bytes() would work only for binary BLOBs, not for text ones. Text BLOBs are returned as Python str, not bytes (see the driver documentation for automatic string conversions and character set handling).

Note that iteration over BlobReader uses readline(), so it works only with text BLOBs. I considered using read() for binary BLOBs, but rejected the idea, as there is no easy way to determine the amount of data to be returned. Of course, I could define a BlobReader attribute (for example fetch_size) with a default of 1 that could be used for that, but it would only clutter the BlobReader instances and does not provide much convenience or code readability over direct use of read() in the application. But I'm open to discussing that.
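The "direct use of read() in the application" preferred above can be sketched with a stand-in stream (io.BytesIO here, since a real BlobReader needs a live database; copy_blob is a hypothetical helper, not driver API):

```python
import io

def copy_blob(reader, dest, chunk_size=65536):
    """Drain a file-like binary reader into dest in fixed-size chunks.

    Reading chunk by chunk keeps memory bounded even for very large
    BLOBs, which is the whole point of a streaming reader.
    """
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        dest.write(chunk)
        total += len(chunk)
    return total

# Stand-in for a binary BlobReader:
blob = io.BytesIO(b"\x00\x01" * 100_000)  # 200,000 bytes, no newlines
out = io.BytesIO()
copied = copy_blob(blob, out, chunk_size=4096)
```

Because the caller passes chunk_size explicitly, no fetch_size attribute on the reader is needed.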
Also note that a BLOB value does not need to be returned as a BlobReader. Normally BLOBs are returned as materialized values (either str or bytes, according to the BLOB sub-type). A BlobReader is returned only when explicitly requested via the Cursor.stream_blobs list, or when the size of the value to be returned exceeds Cursor.stream_blob_threshold (default is 65536 bytes, i.e. 64K). The application needs to be prepared for that. Also note that the stream_blob_threshold value can be configured at the driver level, see the DriverConfig.stream_blob_threshold configuration option.
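The dispatch rule described above can be illustrated with a toy function (this is not the driver's actual code; fetch_blob and its parameters are hypothetical names mimicking the documented behaviour):

```python
import io

STREAM_BLOB_THRESHOLD = 65536  # driver default, 64K

def fetch_blob(name, raw, stream_blobs=(), threshold=STREAM_BLOB_THRESHOLD):
    """Mimic the documented rule: stream when the column is listed in
    stream_blobs or the value exceeds the threshold; otherwise return
    the materialized value."""
    if name in stream_blobs or len(raw) > threshold:
        return io.BytesIO(raw)  # stand-in for a BlobReader
    return raw                  # materialized bytes

small = fetch_blob("IMG", b"x" * 100)                       # materialized
big = fetch_blob("IMG", b"x" * 70_000)                      # over threshold
forced = fetch_blob("IMG", b"x" * 100, stream_blobs=("IMG",))  # explicit
```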
Thanks for the feedback.
Of course you are right: SQLAlchemy closes the DB-API cursor immediately after iterating over it, and before accessing the data in the row object. Sadly, there does not seem to be any way to avoid this behaviour in SQLAlchemy core except by modifying the SQLAlchemy code directly. I will take this question to the SQLAlchemy issue list.
Ultimately, the easiest solution to my problem of reading large-ish BLOBs via SQLAlchemy is to set stream_blob_threshold to a very large value. For BLOBs smaller than stream_blob_threshold, the Cursor.stream_blobs list overrides the default materialised-blob behaviour (as you point out). I think it would be similarly useful to have an override that forces large BLOBs (> stream_blob_threshold) to be materialised. There is a clear use case for materialising large BLOBs: cases where the calling library expects the returned data to be a simple type. E.g. SQLAlchemy doesn't know anything about the existence of BlobReader objects, so how can it know that it must not close the cursor before accessing the returned object?
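Pending such an override in the driver, a caller-side workaround is to collapse any reader into a simple type while the cursor is still open (a sketch with an io stand-in; materialize is a hypothetical helper, not part of the driver):

```python
import io

def materialize(value):
    """Collapse a possible file-like value into str/bytes.

    Passes str/bytes through untouched; anything with a read() method
    is drained immediately, while the cursor/transaction that produced
    it is still alive.
    """
    if isinstance(value, (str, bytes)):
        return value
    if hasattr(value, "read"):
        return value.read()
    return value

# A row as a cursor might return it: an int, a streamed blob, a str.
row = (42, io.BytesIO(b"blob payload"), "plain")
materialized = tuple(materialize(v) for v in row)
```

Applied to every row before the cursor is closed, this gives downstream libraries only simple types.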
I am surprised to discover that it is not possible to pass a Python file object to the bytes constructor to read a binary file into a bytes object. I really expected this to work; I was wrong. Since the BlobReader is expected to behave the same as a file object, I think I agree that adding a __bytes__ method would be adding non-file-like behaviour. However, it is possible to iterate over a binary file object. Each call to __next__() reads up to the next b"\n" byte, so for binary data containing no newlines the entire file is read on the first call, and then iteration ends. By contrast, the BlobReader raises an exception: firebird.driver.types.InterfaceError: Can't read line from binary BLOB. My opinion is that iteration over a binary stream should return chunks. The chunk size could default to the system PAGESIZE, for example. The chunk size is rarely important, except that it should be neither too small nor too large.
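The proposed chunked iteration could look like this wrapper (a sketch of the suggested semantics, not driver code; 4096 stands in for PAGESIZE):

```python
import io

class ChunkedReader:
    """Wrap a binary file-like object so that iteration yields
    fixed-size chunks instead of b'\\n'-delimited lines."""

    def __init__(self, raw, chunk_size=4096):
        self.raw = raw
        self.chunk_size = chunk_size

    def __iter__(self):
        return self

    def __next__(self):
        chunk = self.raw.read(self.chunk_size)
        if not chunk:  # end of stream
            raise StopIteration
        return chunk

data = b"\x00\xff" * 3000  # 6000 bytes of newline-free binary data
chunks = list(ChunkedReader(io.BytesIO(data), chunk_size=4096))
```

With a 4096-byte chunk size, 6000 bytes come back as one full chunk plus one 1904-byte tail.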
I'm trying to use the new v1.10 driver with SQLAlchemy v1.4. I'm trying to load a BLOB type from my database. The problem is that SQLAlchemy iterates over the connection cursor to obtain all rows, and then passes these rows to a "processor" object to do type conversions. The DB cursor returns BlobReader objects for each BLOB column. The BlobReader objects are file-like objects that are read later, in the processor object, after the cursor iteration has finished. This used to work OK with the old FDB driver. The problem with the new driver is that the cursor object closes all BlobReader objects after the initial iteration over the cursor. Thus, any later attempt to read the blob data fails.
I can get things to work if I remove the lines in the Cursor._clear() method (see https://github.com/FirebirdSQL/python3-driver/blob/1aaa016468e7ef3eb72616340c3682a706c84b36/src/firebird/driver/core.py#L3642-L3643 ).
I can't figure out what the "correct" DB API compliant behaviour should be as the DB-API doesn't describe any special handling for BLOBs or BlobReader objects. I don't really see that it should be necessary to close the BlobReaders after iteration over the cursor. It's all still within the context of the transaction.
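The failure mode can be mimicked without a database: a toy cursor that closes every reader it handed out as soon as row iteration finishes, the way the lines in Cursor._clear() do (all names here are stand-ins, not driver API):

```python
import io

class ToyCursor:
    """Mimics a cursor that closes its BlobReaders once the row
    iteration is exhausted."""

    def __init__(self, rows):
        self._rows = rows
        self._readers = [v for row in rows for v in row
                         if hasattr(v, "read")]

    def __iter__(self):
        yield from self._rows
        self._clear()  # runs after the last row is consumed

    def _clear(self):
        for r in self._readers:
            r.close()

cur = ToyCursor([(1, io.BytesIO(b"blob"))])
rows = list(cur)        # iteration completes; readers get closed
reader = rows[0][1]
try:
    reader.read()       # too late: the reader is already closed
    failed = False
except ValueError:      # closed BytesIO raises ValueError
    failed = True
```

This is exactly the SQLAlchemy pattern: consume all rows first, read the file-like values afterwards.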
A further related enhancement would be to add a __bytes__(self) method to the BlobReader object. Since the basic DB-API assumes the database will return simple types, a typical way to handle a BLOB would be to pass the value to the Python bytes constructor. This will try to iterate over the BlobReader using repeated calls to BlobReader.readline(). This also fails. Even if it succeeded, it would be an inefficient way to convert a BLOB to bytes (unless it really is text split into lines). Better to just call BlobReader.read() and get the entire thing as a bytes object. This is easily implemented in a __bytes__() method.

I could submit patches for these but I'm holding off to see what you think.
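The proposed method really is small, as a sketch on a toy file-like wrapper shows (whether it belongs on BlobReader is exactly the question discussed above):

```python
import io

class MaterializableReader:
    """Toy file-like wrapper that additionally supports bytes()."""

    def __init__(self, raw):
        self._raw = raw

    def read(self, size=-1):
        return self._raw.read(size)

    def __bytes__(self):
        # bytes(reader) == reader.read(): one call, the whole value,
        # instead of bytes() iterating line by line.
        return self.read()

reader = MaterializableReader(io.BytesIO(b"entire blob at once"))
as_bytes = bytes(reader)
```

The bytes() constructor invokes __bytes__ when it is defined, bypassing the line-based iteration path entirely.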