davidfowl / BedrockFramework

High performance, low level networking APIs for building custom servers and clients.
MIT License
1.04k stars 153 forks source link

Implement HTTP/1.x IMessageReader and IMessageWriter #41

Open davidfowl opened 4 years ago

davidfowl commented 4 years ago

We should have a well tested blazing fast implementation of the various HTTP/1.x parsers:

Server

Client

Shared

Each of these can be implemented and tested separately (for the most part) and some of them have already been prototyped in the experimental project.

YairHalberstadt commented 4 years ago

some of them have already been prototyped in the experimental project.

I think it would be useful to have a description of what these are lacking before they can be moved into the main project.

How familiar would you have to be with the HTTP spec to work on these?

Decide on the representation for an HTTP request and body on the server side Decide on the representation for an HTTP request and body on the client side.

At the moment you're using System.Net.Http. Is there anything wrong with this?

davidfowl commented 4 years ago

I think it would be useful to have a description of what these are lacking before they can be moved into the main project.

The body parsers that are prototyped should be easy to make progress on without too much fuss. The easy ones are IMessageReader<ReadOnlySequence<byte>> and IMessageWriter<ReadOnlySequence<byte>> (maybe another form of byte as well).

At the moment you're using System.Net.Http. Is there anything wrong with this?

Yes I'm not happy with this, I did this because I was being lazy. I'd much rather have a lower level implementation that either didn't enforce a specific shape (callback based) or a representation offers access to a header at a time (which seems a little inefficient).

I think this will require some experimentation to get it right. I don't have the right answers as yet.

How familiar would you have to be with the HTTP spec to work on these?

These parsers aren't that difficult to write, there are a few implementations in ASP.NET Core (for readers).

It's also worth discussing header parsing separately as it's used in the request parser, the response parser and the multipart parser. It's possible we just need a header representation and not one for the entire request/response.

YairHalberstadt commented 4 years ago

Ok. I'm going to start off real simple here.

Firstly I agree System.Net is not a great representation - it's very heavy, and very opinionated. Also it's quite likely we'll want to keep everything as bytes until as late as possible. For example, it's likely you'll want to use UTF8 rather than strings in some applications.

As such, I think I'll start off with the simplest possible component that will be unarguable: An HttpHeaderReader returning a

ref struct HttpHeader
{
    public HttpHeader(ReadOnlySpan<byte> fieldName, ReadOnlySpan<byte> fieldValue)
    { 
        FieldName = fieldName;
        FieldValue = fieldValue;
    }
    public readonly ReadOnlySpan<byte> FieldName { get; }
    public readonly ReadOnlySpan<byte> FieldValue { get; }
}

In the long term I'm thinking of something like this:

public interface IHttpRequestMessageBuilder
{
     public void AddHeader(HttpHeader header);
     public void AddVersion(Version version);
     ...
}

public HttpRequestMessageReader<TBuilder> : IMessageReader<TBuilder> where TBuilder : IHttpRequestMessageBuilder
{
    public HttpRequestMessageReader(Func<TBuilder> createNewBuilder) => _createNewBuilder = createNewBuilder;
}

I think that would be quite easy to adapt to most representations of HttpRequests.

Long term we'll probably also want to provide helpers/parsers to aid with Well Known Http headers, but I think that's not necessary for the POC.

Agreed?

YairHalberstadt commented 4 years ago

Also, do we want to support streaming http? Because IMessageReader isn't a suitable abstraction for that at the top level - i.e. it requires the entire bytes for the message is loaded into memory at once, which is fine for parsing request headers, but not ideal for parsing the entire http request.

Perhaps we ought to design an API suitable for streaming first, and then implement the non-streaming one on top of that?

davidfowl commented 4 years ago

As such, I think I'll start off with the simplest possible component that will be unarguable: An HttpHeaderReader returning a

That's basically the callback approach though I don't know if it makes sense returning the IHttpRequestMessageBuilder. In this case, the IMessageReader isn't really returning anything (other than the IHttpRequestMessageBuilder that was passed in). Not sure if it needs to be a Func<IHttpRequestMessageBuilder> either. You've designed what we did in Kestrel 😄. Here are the interfaces in ASP.NET Core:

https://github.com/dotnet/aspnetcore/blob/a49e084c1515dfaa16de8d7d4b0c00190587ad87/src/Servers/Kestrel/Core/src/Internal/Http/IHttpParser.cs#L10

https://github.com/dotnet/aspnetcore/blob/a49e084c1515dfaa16de8d7d4b0c00190587ad87/src/Servers/Kestrel/Core/src/Internal/Http/IHttpRequestLineHandler.cs#L8

https://github.com/dotnet/aspnetcore/blob/a49e084c1515dfaa16de8d7d4b0c00190587ad87/src/Shared/Http2/IHttpHeadersHandler.cs#L18

I think it's proven and a solid approach.

The other option would be the have the parser return a single struct (not ref struct) that returns the startline and each header value one at a time. It avoids having to allocate a dictionary, or invert the model to be a callback. Something like:

public readonly struct HttpRequestStartLine
{
  public ReadOnlySpan<byte> Method { get; }
  public ReadOnlySpan<byte> Path { get; }
  public ReadOnlySpan<byte> Version { get; }
}

public readonly struct HttpHeader
{
    public ReadOnlySpan<byte> FieldName { get; }
    public ReadOnlySpan<byte> FieldValue { get; }
}

public class HttpRequestStartLineParser : IMessageReader<HttpRequestStartline> { }
public class HttpHeaderReader: IMessageReader<HttpHeader> { }

For example, it's likely you'll want to use UTF8 rather than strings in some applications.

Yes, this is interesting. @GrabYourPitchForks, this is a real use case for UTF8 string.

Long term we'll probably also want to provide helpers/parsers to aid with Well Known Http headers, but I think that's not necessary for the POC.

Yep! Agreed. Should add that in another issue actually. I forgot about those.

Agreed?

I'd like to understand the merits of the other approach but I'm pretty sold on the callback approach as well.

davidfowl commented 4 years ago

Also, do we want to support streaming http? Because IMessageReader isn't a suitable abstraction for that at the top level - i.e. it requires the entire bytes for the message is loaded into memory at once, which is fine for parsing request headers, but not ideal for parsing the entire http request.

Not sure what you mean here. If you're talking about the body then it works for that too. Your last pull request was basically the model for how that would work.

For example, here's the HttpClient implementation:

Startline + headers https://github.com/davidfowl/BedrockFramework/blob/fe439694c193db13708235ce654f3a51f9b7962b/src/Bedrock.Framework.Experimental/Protocols/HttpClientProtocol.cs#L43-L55

Body:

https://github.com/davidfowl/BedrockFramework/blob/fe439694c193db13708235ce654f3a51f9b7962b/src/Bedrock.Framework.Experimental/Protocols/HttpClientProtocol.cs#L55-L65

Your last PR actually was the last step in making this work well.

YairHalberstadt commented 4 years ago

You've designed what we did in Kestrel 😄. Here are the interfaces in ASP.NET Core:

What is the reason we're not using this, and writing our own instead? I'd just like to get an idea for the aims before I start writing something that isn't fit for purpose...

YairHalberstadt commented 4 years ago
public readonly struct HttpHeader
{
    public ReadOnlySpan<byte> FieldName { get; }
    public ReadOnlySpan<byte> FieldValue { get; }
}

That won't compile

davidfowl commented 4 years ago

That won't compile

Yea I know it wouldn't compile. I wasn't trying to make it compile (the data would be stored as a ReadOnlyMemory in the header and exposed as a span.

public readonly struct HttpHeader
{
    private readonly ReadOnlyMemory<byte> _name;
    private readonly ReadOnlyMemory<byte> _value;

    public HttpHeader(ReadOnlyMemory<byte> name, ReadOnlyMemory<byte> value)
    {
        _name = name;
        _value = value;
    }

    public ReadOnlySpan<byte> Name => _name.Span;
    public ReadOnlySpan<byte> Value => _value.Span;
}

What is the reason we're not using this, and writing our own instead? I'd just like to get an idea for the aims before I start writing something that isn't fit for purpose...

It's internal/pubternal unsupported.

YairHalberstadt commented 4 years ago

It's internal/pubternal unsupported.

Any reason not to copy it and then develop as necessary though?

davidfowl commented 4 years ago

@YairHalberstadt Nope, it's a fine starting point. It needs to be shaped into the IMessageParser format is all.

YairHalberstadt commented 4 years ago

IMessageParser

IMessageReader?

Thanks - that's enough information to get started for now!

YairHalberstadt commented 4 years ago

The kestrel HttpParser doesn't seem to support line folding in headers.

https://github.com/dotnet/aspnetcore/issues/13923

I presume we don't need to either?

Also do we want to guarantee we'll throw on invalid data, or is it sufficient to always accept and correctly parse valid data?

The first will cause significant performance hits compared to the second...

davidfowl commented 4 years ago

I presume we don't need to either?

Start without it.

Also do we want to guarantee we'll throw on invalid data, or is it sufficient to always accept and correctly parse valid data?

We could return a failure as part of the parse result. Actually that might be even better (maybe a ParseResult<T>).

PS: I added a gitter room https://gitter.im/BedrockFramework/BedrockFramework

YairHalberstadt commented 4 years ago

@davidfowl I think the next step with Http1 is to create an Http1HeadersParser which parses all the headers in a message, and can be shared between the Request and the Response parser. If we want to maximize performance, it should use callbacks to avoid allocating a collection for the headers. What I'm not sure about is the best way to do error handling via such a model.

One option is to return a union indicating success, failure, or need more data, separately to using the callback. An alternative is to have an OnFailure method on the callback. Alternatively we could simply use exceptions, seeing as errors should be a very uncommon case in real life usage, and there isn't really a good way to recover from them