cesanta / mongoose

Embedded Web Server
https://mongoose.ws

Add the ability to force/override mime content type. #2871

Closed Keith-Cancel closed 4 weeks ago

Keith-Cancel commented 1 month ago

I needed a way to specify the MIME type for mg_http_serve_file() other than relying on the file extension. I inspect the magic numbers and headers of files before serving them. For example, a file may have a .jpg extension but really be a PNG according to its header and magic numbers. Originally, I would just set opts.mime_types to something like jpg=image/png in this case. However, not every file has an extension, so this does not always work.
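The workaround described above can be sketched roughly as follows. This is only an illustration: sniff_mime() and make_override() are hypothetical helpers, not part of Mongoose, and only the PNG and JPEG signatures are checked. The resulting string is the kind of value one would assign to opts.mime_types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not a Mongoose API): guess a MIME type from the first
// bytes of a file. Returns NULL when the signature is not recognised.
const char *sniff_mime(const unsigned char *buf, size_t len) {
  static const unsigned char png_sig[8] = {0x89, 'P',  'N',  'G',
                                           0x0D, 0x0A, 0x1A, 0x0A};
  static const unsigned char jpeg_sig[3] = {0xFF, 0xD8, 0xFF};
  if (len >= sizeof(png_sig) && memcmp(buf, png_sig, sizeof(png_sig)) == 0)
    return "image/png";
  if (len >= sizeof(jpeg_sig) && memcmp(buf, jpeg_sig, sizeof(jpeg_sig)) == 0)
    return "image/jpeg";
  return NULL;
}

// Hypothetical helper: build an "ext=type" override entry such as
// "jpg=image/png". Returns 0 on success, -1 if the file has no extension
// or the result does not fit into out.
int make_override(const char *ext, const char *mime, char *out, size_t outlen) {
  int n;
  assert(ext != NULL && mime != NULL && out != NULL);
  if (*ext == '\0') return -1;  // Nothing to put on the left of '='
  n = snprintf(out, outlen, "%s=%s", ext, mime);
  return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

The -1 return for an empty extension is the gap this pull request is about: with no extension there is nothing obvious to put before the = sign.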

This uses \ since it is a common escape character. It indicates that the string is not a list of file extensions and content types, but a single MIME type to use instead of guessing from the file extension. Also, \ is unlikely to be a legal file name character on most filesystems, so it is extremely unlikely to be the start of a file extension, and this should not break the current use case of passing only a list of extensions.

The end result is that, going back to my simple example, one can just set opts.mime_types to \image/png.

I did think about adding a field to struct mg_http_serve_opts. However, after seeing how widely it is used in the code and that it is part of the API, I figured a less invasive change would probably be the best approach. That said, I am all ears for any better suggestion.

I also did submit a CLA at https://cesanta.com/cla.html.

cpq commented 1 month ago

@Keith-Cancel thank you. Are you aware that opts.mime_types accepts a comma-separated list of entries?

Keith-Cancel commented 4 weeks ago

@cpq Yes, I am aware of that, and that functionality still works. This just checks whether the string is what I call an override/force string; if it is not, it proceeds as normal and treats the string as a list of file extensions and MIME types.

As I mentioned, I did this initially for files that have no file extension, since that list is keyed on the file extension. A further benefit of this change is that when the file extension does not match the true file type, I don't have to process the file name to extract the extension and then map the wrong extension. Take a PNG with a jpg file extension: I can simply say "this is a PNG" (via \image/png) without first getting the extension and then formatting that info as jpg=image/png. However, the most important issue is handling files with no file extension. For a PNG file with no extension, the entry would have to be ???=image/png, and there is obviously nothing to put in place of ??? since the file does not have an extension.

If you have better ideas on how you want to handle this, like adding a field to struct mg_http_serve_opts or something else, I could easily do that instead. I also thought about something like *={some_mime_type}. However, that might imply you could do something like *the*={some_mime_type}, which would make things more complicated than a simple string comparison.

cpq commented 4 weeks ago

User overrides are processed before built-in mime types, so they take precedence; therefore, for files with an extension, mg_http_serve_opts should work.

The only issue could be with files with no extension. Mongoose's code finds extensions this way: https://github.com/cesanta/mongoose/blob/44b3d60692b928c406bde040218ec128b33efd41/src/http.c#L527-L530

For files with no extension, the whole file path is regarded as extension. Please try that.
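A minimal sketch of the behaviour described above, assuming the extension is found by scanning backwards from the end of the path for a '.' character. The function name ext_of() and the sample paths are hypothetical; this illustrates the idea and is not a copy of the linked Mongoose source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: return a pointer to the text after the last '.' in
// path. When path contains no '.', the scan runs off the front of the string
// and the whole path is returned, which is the effect described above.
const char *ext_of(const char *path) {
  size_t len, i = 0;
  assert(path != NULL);
  len = strlen(path);
  // Walk back from the end until a '.' is found or the string is exhausted
  while (i < len && path[len - i - 1] != '.') i++;
  return path + (len - i);
}
```

Under this sketch, ext_of("photo.jpg") yields "jpg", while ext_of("uploads/avatar") yields the whole string. That is why an override entry keyed on the full path can match a file with no extension, provided the path itself contains no ',' or '=' characters.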

We may add an asterisk * as a wildcard that matches anything.

Keith-Cancel commented 4 weeks ago

> The only issue could be with files with no extension. Mongoose's code finds extensions this way:
>
> https://github.com/cesanta/mongoose/blob/44b3d60692b928c406bde040218ec128b33efd41/src/http.c#L527-L530
>
> For files with no extension, the whole file path is regarded as extension. Please try that.

That does work; I should have realized that when looking over the code. However, I would say that using the whole file path as an extension is not exactly ideal.

> We may add an asterisk * as a wildcard that matches anything.

I just revised this pull request to do just that instead of using \ as an escape. Now, if a user puts, for instance, *=application/octet-stream, ...{other extensions} in the list and there is no exact file extension match in the list, the MIME type application/octet-stream will be used. As another example, a user with a directory of files they know are all JPEGs could use the string jpg=image/jpeg,*=image/jpeg; if for some reason a file in that directory had no extension, or had the extension jpeg instead of jpg, the wildcard would be used.

It also allows the user to specify something other than text/plain; charset=utf-8 when nothing matches, since that is what guess_content_type() returns if nothing matches in the user-provided list and the default list. As written, the wildcard does take precedence over the default list, which seems fine to me: if nothing matches in the user's list, we probably want to use the wildcard anyway.
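The lookup order described in this comment can be sketched as below. This is an assumption-laden illustration, not the code from the pull request: pick_mime() is a hypothetical name, the two-entry built-in table stands in for Mongoose's default list, and the values in it are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical lookup over a "k1=v1,k2=v2" override list. Order of
// precedence: exact extension match in list, then a "*" entry in list,
// then a built-in table, then the text/plain default named in the thread.
void pick_mime(const char *ext, const char *list, char *out, size_t outlen) {
  // Stand-in for the default list; entries are illustrative assumptions
  static const char *builtin[] = {"png", "image/png", "jpg", "image/jpeg", NULL};
  const char *p = list == NULL ? "" : list;
  const char *val = NULL, *wild = NULL;
  size_t val_len = 0, wild_len = 0, ext_len, i;
  assert(ext != NULL && out != NULL && outlen > 0);
  ext_len = strlen(ext);

  while (*p != '\0') {
    const char *comma = strchr(p, ',');
    size_t entry_len = comma ? (size_t) (comma - p) : strlen(p);
    const char *eq = memchr(p, '=', entry_len);
    if (eq != NULL) {
      size_t klen = (size_t) (eq - p), vlen = entry_len - klen - 1;
      if (klen == ext_len && memcmp(p, ext, klen) == 0) {
        val = eq + 1, val_len = vlen;  // Exact match wins immediately
        break;
      }
      if (klen == 1 && *p == '*' && wild == NULL) {
        wild = eq + 1, wild_len = vlen;  // Remember the first wildcard
      }
    }
    p += entry_len;
    if (*p == ',') p++;  // Skip the separator
  }

  if (val == NULL && wild != NULL) val = wild, val_len = wild_len;
  for (i = 0; val == NULL && builtin[i] != NULL; i += 2) {
    if (strcmp(ext, builtin[i]) == 0) {
      val = builtin[i + 1], val_len = strlen(val);
    }
  }
  if (val == NULL) val = "text/plain; charset=utf-8", val_len = strlen(val);

  if (val_len >= outlen) val_len = outlen - 1;  // Truncate to fit
  memcpy(out, val, val_len);
  out[val_len] = '\0';
}
```

With the list jpg=image/jpeg,*=image/jpeg, an extension of jpeg or an empty extension both fall through to the wildcard. With only *=application/octet-stream in the list, a png extension gets application/octet-stream rather than the built-in entry, which mirrors the precedence noted above.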