dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License

Support streaming for VARBINARY columns #6234

Open chrischu opened 8 years ago

chrischu commented 8 years ago

It would be great if it were possible to represent VARBINARY(MAX) columns with a property of type Stream instead of byte[].

For small files it might not make a huge difference, but for large files it is suboptimal to have to load the whole file into memory before being able to save it to the database.

Currently the only way to work around this problem is to use ADO.NET directly to save the file in the database (e.g. as in this blog post: http://www.syntaxwarriors.com/2013/stream-varbinary-data-to-and-from-mssql-using-csharp/). It would definitely be better if Entity Framework offered a way to achieve something similar without having to involve ADO.NET.
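
For reference, the kind of ADO.NET workaround referred to above looks roughly like the sketch below. The connection string, the Files(Name, Content) table and the file name are made up for illustration; the key point is that SqlClient will stream a parameter whose value is a Stream instead of requiring the whole file as a byte[].

using System;
using System.Data;
using System.IO;
using Microsoft.Data.SqlClient;

// Sketch only: stream a file into a hypothetical Files(Name, Content VARBINARY(MAX)) table.
var connectionString = "..."; // your SQL Server connection string

using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();

using var command = connection.CreateCommand();
command.CommandText = "INSERT INTO Files (Name, Content) VALUES (@name, @content)";
command.Parameters.AddWithValue("@name", "report.pdf");

// Assigning a Stream (size -1 = MAX) lets SqlClient stream the value to the server
// rather than buffering it in memory first.
using var file = File.OpenRead("report.pdf");
command.Parameters.Add("@content", SqlDbType.VarBinary, -1).Value = file;

await command.ExecuteNonQueryAsync();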

divega commented 8 years ago

Triage: we believe that this would be a good thing to enable, so adding to the backlog.

Notes:

roji commented 8 years ago

Note that this makes sense from a PostgreSQL perspective as well - Npgsql supports NpgsqlDataReader.GetStream() on PostgreSQL bytea columns. Note that if this is done, it's probably also relevant to do the same for DbDataReader.GetTextReader() for text columns.
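
For context, both of those APIs already exist on DbDataReader today; a rough sketch of what using them directly looks like with Npgsql (the connection string, table and column names are placeholders):

using System;
using System.Data;
using System.IO;
using Npgsql;

// Sketch only: read a bytea column as a Stream and a text column via a TextReader.
await using var connection = new NpgsqlConnection(connectionString);
await connection.OpenAsync();

await using var command = new NpgsqlCommand("SELECT payload, notes FROM documents", connection);
await using var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess);

while (await reader.ReadAsync())
{
    // Stream the bytea column without materializing a byte[].
    using (var payload = reader.GetStream(0))
    {
        await payload.CopyToAsync(Stream.Null); // replace Stream.Null with a real destination
    }

    // GetTextReader is the equivalent for text columns.
    using var notes = reader.GetTextReader(1);
    Console.WriteLine(await notes.ReadToEndAsync());
}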

bricelam commented 8 years ago

Related: aspnet/Microsoft.Data.Sqlite#18

mloefler commented 6 years ago

Any news on that?

ajcvickers commented 6 years ago

@mloefler This issue is in the Backlog milestone. This means that it is not going to happen for the 2.1 release. We will re-assess the backlog following the 2.1 release and consider this item at that time. However, keep in mind that there are many other high priority features with which it will be competing for resources.

BalassaMarton commented 6 years ago

@divega

Reconnect to the database on demand at a later point.

Are you sure this is a requirement? Would it be easier to implement if it weren't? This would still be a life-saving feature even if the streaming had to happen within the lifecycle of the original DbContext.

da3dsoul commented 3 years ago

It's been a good while. Any news on this? My case is simply image loading, so a byte[] isn't terrible to deal with, but .NET's GC throws a fit at the memory churn from pushing large objects through EF Core.

roji commented 3 years ago

@da3dsoul see https://github.com/dotnet/efcore/issues/6234#issuecomment-376702296.

Adding needs-design because it's not clear how stream mapping would work with updating, if at all.

da3dsoul commented 3 years ago

Personally, I'd be happy with read-only access within the scope of the DbContext. I understand that partially implemented features with unfinalized technical designs create technical debt that makes it hard to maintain compatibility in the future. That's a bad thing to have in a language framework, cough Python. Thanks for the reply.

roji commented 3 years ago

Yeah, read-only is an option, though some way of updating should be available (possibly a byte[] property alongside the Stream, for that purpose only).

da3dsoul commented 3 years ago

Another option is for the output stream to be an implementation of a memory stream that flushes to a simple byte[]. That would keep the door open for a better implementation in the future, and it would feel like less of a hack to use.
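
To make that idea concrete, a purely illustrative sketch (not an existing or proposed EF Core API; all names are made up) could be a MemoryStream subclass that hands its buffered contents back as a byte[] on flush. Note that this still buffers everything in memory - only the surface API changes:

using System;
using System.IO;

// Illustrative only: a write stream whose contents end up in a plain byte[],
// so a byte[]-mapped property could keep backing the column.
public class ByteArrayBackedStream : MemoryStream
{
    private readonly Action<byte[]> _onFlush;

    public ByteArrayBackedStream(Action<byte[]> onFlush) => _onFlush = onFlush;

    public override void Flush()
    {
        base.Flush();
        _onFlush(ToArray()); // hand the buffered bytes back to the caller
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            Flush();
        }
        base.Dispose(disposing);
    }
}

// Usage sketch: EF still sees entity.Content as a byte[].
// await using var source = File.OpenRead("image.png");
// using var target = new ByteArrayBackedStream(bytes => entity.Content = bytes);
// await source.CopyToAsync(target);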

bachratyg commented 2 years ago

I'm not convinced this is generally possible (e.g. materializing a bunch of entities each with a Stream property) without the database supporting some kind of stream handle that can be used to call back to the db. Then there's the problem of the handle lifecycle vs the connection lifecycle.

However, if a forward-only read stream is all that's needed, then this is possible today:

var q = from d in dbContext.Files
        // compose any query you like, add filters, slice and dice the binary content, etc.
        select new { FileName = ..., Content = ... };

await dbContext.Database.OpenConnectionAsync();
using var command = q.CreateDbCommand(); // been waiting for this for ages :D
using var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess); // sequential is important
if (await reader.ReadAsync())
{
    var fileName = reader.GetString(0);
    using var stream = reader.GetStream(1);
    // Read the stream here from start to end
}

With a little extra boilerplate (a custom stream wrapper that handles seeking and disposes the reader/command, omitted for brevity) this can be used to serve e.g. partial file content through a web API without significant buffering.
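
A rough sketch of what such a wrapper might look like (the class name and structure here are made up, not anything EF ships): a forward-only, read-only Stream that owns the command and reader and disposes them when the consumer is done.

using System;
using System.Data.Common;
using System.IO;

// Illustrative only: keeps the underlying DbCommand/DbDataReader alive until
// the consumer disposes the stream.
public sealed class DbColumnStream : Stream
{
    private readonly DbCommand _command;
    private readonly DbDataReader _reader;
    private readonly Stream _inner;

    public DbColumnStream(DbCommand command, DbDataReader reader, int ordinal)
    {
        _command = command;
        _reader = reader;
        _inner = reader.GetStream(ordinal);
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _inner.Length; // may not be supported by every provider in sequential mode
    public override long Position
    {
        get => _inner.Position;
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
            _reader.Dispose();
            _command.Dispose();
        }
        base.Dispose(disposing);
    }
}

Returning such a stream from a controller (e.g. wrapped in a FileStreamResult) lets ASP.NET Core dispose the database resources once the response body has been written.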

bachratyg commented 2 years ago

Just occurred to me: in the code snippet above I assume the shape of the result generated from the query. Experience suggests that when there are no entities (includes), no client-eval in the final projection, and no split-query magic, EF Core matches both the order and the aliases of the result columns to the shape of the result object (this was very much not the case in EF6). Is this intended and reliable behavior, or just an implementation detail that should not be relied on?

roji commented 2 years ago

@bachratyg I wouldn't rely on the ordering - you can use reader.GetOrdinal to get the index for a given name. But I think that if you project out an anonymous type with explicitly-named fields, it should be safe to rely on that.

bachratyg commented 2 years ago

I don't think GetOrdinal makes much of a difference. It may help with the order of metadata columns, but since all data has to be read in sequential order, it's essential that the stream column is the very last one. E.g. for an HTTP file response, the content type/disposition/last-modified/etag/etc. headers must already be set (and therefore read) when the response body starts streaming. Not a nice solution, but custom serialized metadata can be prepended to the stream; that way there's only one column.

roji commented 2 years ago

@bachratyg the point is that it's better not to assume that EF will generate the SQL projection in the same order as the anonymous type projection. For example, the code above has select new { FileName = ..., Content = ... }, but in the DbDataReader, Content may come out first. You do indeed need to consume the reader fields sequentially, but you also need to know which field is which.
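
Putting the two points together, and reusing the reader and the FileName/Content projection from the earlier snippet, one way to write it might be:

// Resolve ordinals by name rather than assuming positions.
var fileNameOrdinal = reader.GetOrdinal("FileName");
var contentOrdinal = reader.GetOrdinal("Content");

// With CommandBehavior.SequentialAccess, columns must be read in ascending ordinal
// order, so this only works when the stream column ends up after the metadata column.
if (contentOrdinal < fileNameOrdinal)
    throw new InvalidOperationException("Expected the Content column to come last.");

var fileName = reader.GetString(fileNameOrdinal);
using var stream = reader.GetStream(contentOrdinal);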

Neo-vortex commented 1 year ago

Ok, it is 2023 and this is still open! 7 years! well done team

roji commented 1 year ago

@Neo-vortex we don't prioritize issues by how long they've been open in our backlog, but generally by how many votes they have gathered; this issue has gathered only 36 votes in the 7 years it has been open, so very few users actually need it. It's also not a trivial feature to implement, which also factors into the decision.