dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.44k stars 10.03k forks source link

Improvements to multipart upload for the streaming scenario #48958

Open davidfowl opened 1 year ago

davidfowl commented 1 year ago

This came from a discussion on https://www.reddit.com/r/dotnet/comments/14dnn5v/comment/jp0iwxk/?utm_source=share&utm_medium=web2x&context=3

When I am forced to deal with large file uploads in ASP.NET, my usual goto is the following packages:

https://github.com/ma1f/uploadstream (abstracting away multipart/form-data yuckiness)

https://github.com/AJMitev/FileTypeChecker (identifying the MIME Type / file Type more accurately via the file header 'magic string' and not trusting the file extension)

The code below is now roughly from memory and my controller would usually look something like this (Swashbuckle or NSwag attribute hints removed). There are some gotchas with Swagger and SwaggerUI for supporting multipart/form-data which I haven't included here since they now be fixed, since it's been a while since I checked.

The first example using Azure Blob Storage:

[HttpPost]
[Route("projects/{projectId:int}/files", Name = "UploadFile")]
[Consumes("multipart/form-data")]
[DisableFormValueModelBinding]
[ValidateMimeMultipartContent]
[RequestSizeLimit(--num-bytes--)]
[RequestFormLimits(MultipartBodyLengthLimit = --num-bytes--)]
public async Task<IActionResult> PostAsync([FromRoute] int projectid)
{
  // see nuget for https://github.com/ma1f/uploadstream
  var model = await this.StreamFiles<MyFileItem>(async file =>
  {
      // never trust the client
      var fileName = file.FileName.GetSafeUniqueUri();

      // wrapper around somthing like this: https://github.com/AJMitev/FileTypeChecker
      var fileType = await ExtractActualFileTypeFromFileHeadersOrTrustTheMimeType(file);

      // stream to blob storage
      var blobClient = _blobContainerClient.GetBlobClient(fileName); 
      var blob = new CloudBlockBlob(blobClient.Uri); 
      await blob.UploadFromStreamAsync(file); 
      await blobClient.SetHttpHeadersAsync(new BlobHttpHeaders { ContentType = fileType }); 
  });

  // do some business logic regarding the file upload
  var result = _someComponent.DoWork(file);

  return Ok(result);
}

And the second using some kind of local disk or locally networked SAN disk:

[HttpPost]
[Route("projects/{projectId:int}/files", Name = "UploadFile")]
[Consumes("multipart/form-data")]
[DisableFormValueModelBinding]
[ValidateMimeMultipartContent]
[RequestSizeLimit(--num-bytes--)]
[RequestFormLimits(MultipartBodyLengthLimit = --num-bytes--)]
public async Task<IActionResult> PostAsync([FromRoute] int projectid)
{
  // see nuget for https://github.com/ma1f/uploadstream
  var model = await this.StreamFiles<MyFileItem>(async file =>
  {
      // never trust the client
      var fileName = file.FileName.GetSafeUniqueUri();

      // wrapper around somthing like this: https://github.com/AJMitev/FileTypeChecker
      var fileType = await ExtractActualFileTypeFromFileHeadersOrTrustTheMimeType(file);

      // save to disk option
      var fileUri = _sanDiskManager.GetPath(fileName);
      await using (var fs = System.IO.File.Create(fileUri, BufferSize))
      {
          await file.OpenReadStream().CopyToAsync(fs);
      }
  });

  // do some business logic regarding the file upload
  var result = _someComponent.DoWork(file);

  return Ok(result);
}

The DisableFormValueModelBinding attribute is custom and turns model binding off to support streaming. I also have another one the restricts the endpoint to multipart/form-data, but I think Consumes does the same.

The code for that:

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public class DisableFormValueModelBindingAttribute : Attribute, IResourceFilter
{
  public void OnResourceExecuting(ResourceExecutingContext context)
  {
      var factories = context.ValueProviderFactories;
      factories.RemoveType<FormValueProviderFactory>();
      factories.RemoveType<FormFileValueProviderFactory>();
      factories.RemoveType<JQueryFormValueProviderFactory>();
  }

  public void OnResourceExecuted(ResourceExecutedContext context)
  {
  }
}

I can't think of anything else off the top of my head, but it's about having a simple and easy to use abstraction on top of multipart forms which I think is missing. I really don't want to care about boundaries unless I really have a specific need to do so. I'd also like to see all of the gotchas listed in the docs, since it took a great deal of time hunting around and trying to figure out how to get it to work. The docs could also better explain the temporary buffer to disk for all uploaded files over a certain size and it's impact.

TL;DR:

A nicer simple interface for dealing with the streamed files and binding to a multipart/form-data model definition.

A built in way to switch between IFormFile for small files that can be buffered and IStreamFile for larger files?

Some built in way to improve the clashes between the stream and the model binders

A built in way to get a real file type, based off the file signature: https://en.wikipedia.org/wiki/List_of_file_signatures

Update: Fixed code blocks

ghost commented 1 year ago

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.