Woody88 / purescript-wai

Purescript Web Application Interface. Port of the Haskell WAI library.
MIT License
20 stars 2 forks source link

Make HttpRequest not be bound to Node's HTTP Request #7

Open jvliwanag opened 4 years ago

jvliwanag commented 4 years ago

Hi @Woody88 !

I'm building backends that utilize AWS api gateway and Lambda. Basically that simply involves creating the ff:

newtype ApiGwHandler
  = ApiGwHandler (EffectFn2 ApiGwReq LambdaContext (Promise ApiGwResp))

Note that ApiGwReq, LambdaContext and ApiGwResp are all just records broadly described here:

https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-develop-integrations-lambda.html

I'd love to be be able to use purescript WAI to get a bit more generalization. Current problem is WAI.HttpRequest is defined to newtype node's http request.

If it were a regular record, it would be simple for me to write a simple ApiGwReq -> WAI.HttpRequest and make this usable on AWS lambda.

On the Response side, it's going to be a bit of a shame since from what I can tell, on AWS API GW/Lambda, I need to output a fixed string (possibly base 64 encoded) for the body. So I'll simply evaluate all the Response type to that.

As an aside, we've been deploying on AWS lambda/api gateway while developing locally by simply having an express server execute the EffectFn2 mentioned above. I'd love to replace the latter with purescript-warp. And once I have a wai binding for AWS lambda then we'll transition our backend to that as well. :)

Woody88 commented 4 years ago

Hi @jvliwanag !

Thanks for considering using wai and warp. Actually, HttpRequest is not what you're looking for, maybe https://github.com/Woody88/purescript-wai/blob/master/src/Network/Wai/Types.purs#L30. ? You can implement an instance for your ApiGwReq. We will need a way to somehow have the ability to coerce types that have instances of WaiRequest. This would allow one to go from ApiGwReq -> HttpRequest. I would like to hear more about it when you have time you can just dm me on slack?

jvliwanag commented 4 years ago

My problem is Application and by extension Middleware both use HttpRequest

https://github.com/Woody88/purescript-wai/blob/master/src/Network/Wai/Http.purs#L30

Woody88 commented 4 years ago

But HttpRequest represents a node's http handler. You just want to represent an Application for AWS Lambda. So, we need something like wai-aws-lambda which just creates an instance of waiRequest and creates its own Application type in its own module. The reason, why I have it this way is because I don't want to extra overhead when using node's http or http2. The old way that I had it would have the user copy the data that is already loaded in any of the node's HTTP handlers into a wai Request record type.

jvliwanag commented 4 years ago

Hm, I understand that the goal WaiRequest is to (1) abstract against request types. and (2) reduce the overhead and access the node object instances directly. But my guess neither is currently achieved.

On (1), since Application uses a concrete instance that points to Http.Request, one can't write an Application as defined here that doesn't use Http.Request. A goal of haskell's wai is to provide an abstraction so that one can switch server implementations while retaining the same Application. It needs to be the same type.

On (2) -- and this is where the guessing part comes in -- won't the dictionary passing caused by purescript typeclass information provide more of a performance penalty? Vs just creating a single js object once to copy over the values you need. (again, this is just guesswork).

My proposition is to bring back the old record implementation.

Woody88 commented 4 years ago

Thanks, participating in this!

So, this was the initial goal but the problem is that we still need somehow have access to node's Request type. However, I want to keep the Applicationtype monomorphic without having to introduce type parameters. So, one of my attempts was to introduce a field in the old Request type called nodeRequest :: Maybe Foreign, and this would allow people to unsafeCoerce and get actual request type (not sure if it's right...).

So is there a good solution for the two points below:

As for point (2), I do not think that it should cause more performance penalty but I haven't tried it on a bigger app (so guessing as well). However, I did do some small profiling and the old record implementation cost a lot of memory compared to the current version.

jvliwanag commented 4 years ago

My suggestion is for nodeRequest :: Foreign. and then have a function readHTTPRequest :: Foreign -> Maybe HTTP.Request. This makes it similar to other api's on Foreign and may be implemented by a simple instanceof.

I'd like to also use nodeRequest type myself on lambda but populate it with a different type. In particular, there's some metadata AWS provides on the request that I can put there.

If we plan for nodeRequest to be used that way, I wonder if nodeRequest is still the best name. :)

Not certain though what can be done on the performance front.

Though I prefer regular records to allow for the flexibility of non node.http implementations -- if you're keen on keeping the typeclass, perhaps we can change Application?


type Application = forall req. WaiRequest req => req -> (Response -> Aff Unit) -> Aff Unit
Woody88 commented 4 years ago

Though I prefer regular records to allow for the flexibility of non node.http implementations -- if you're keen on keeping the typeclass, perhaps we can change Application? type Application = forall req. WaiRequest req => req -> (Response -> Aff Unit) -> Aff Unit

I had tried that approach before but the compiler was complaining about the "req" escaping the scope. However, I managed to fix it this time (I was probably too tired at that time).

So, I also prefer the simple record approach but I am also thinking about performance. So, I did some http benchmarking for both the record and typeclass approach. The samples are using the Hello World example that is in the purescript-warp's example folder. We can see that with the record type approach there is an extra overhead because it spends most of its time copying the data into the Request type. As for the typeclass approach, it spends most of its time in the sendResponse which means there isn't any overhead.

Screen Shot 2020-07-18 at 14 19 52 Screen Shot 2020-07-18 at 15 16 41 Screen Shot 2020-07-18 at 14 48 05 Screen Shot 2020-07-18 at 14 53 49

For now, it seems like the typeclass approach is better.

jvliwanag commented 4 years ago

So for a hello world example, this is true. But I wonder if this still holds true when you start calling the Wai.Request functions (i.e. print the path, get the query headers etc) -- then you'd need to bring in the dictionaries/typeclass instance.

There's a chance the typeclass approach might still perform better -- not sure, since creating the request via record has an upfront cost to solve everything while creating the record. Perhaps haskells' wai does not run into this problem by being lazy.

Looking into haskell wai, I've been thinking about why they have derived fields directly written on the record -- such as requestHeaderHost, and requestHeaderUserAgent. My guess is that those are used repeatedly across the lifetime of each request. So instead of reevaluating them (looking for the values on the headers), solve them once and then return that value.

If we follow the regular record route, then I'm thinking we'd benefit from that value reuse. This is opposed to the typeclass approach where you end up reevaluating if you, say, ask for userAgent more than once.

Woody88 commented 4 years ago

This is opposed to the typeclass approach where you end up reevaluating if you, say, ask for userAgent more than once.

Hmm... I don't think that it's quite right. So, we know know that for a typeclass you ever only have one instance for a type (or we will get overlap), so we will always only have one record of WaiRequest. Next, javascript is a strict language and if we are using node http lib all values are already evaluated within the Http.IncomingMessage object.

Now, with WaiRequest class definition I am pretty much wrapping the Http.IncomingMessage values within a thunk. So, unlike the record approach, we do no copy over values from the Http.IncomingMessage object to a new object on every new connection.

https://github.com/Woody88/purescript-wai/blob/a5fead0a5e7132cf3ac454def018b7d619b8c022/src/Network/Wai/Types.purs#L30-L41

Below you can find the warp implementation. There isn't much overhead at all since I am using unsafeCoerce and purescript-node-http uses unsafeCoerce as well. Now, whether function calls are expensive — that I am not sure... https://github.com/Woody88/purescript-warp/blob/43281209e2ea79dcf3b5c84d09545f1ab0e68cb0/src/Network/Warp/Http.purs#L28-L53

Lastly, I did some benchmark and profiling using the HelloWorld example, however, this time I returned values from the WaiRequest, and I was still getting better benchmarks than the record approach.

Anyway, I think that it is time to probably start releasing a version for http-types, wai, and warp. If we do realize that record type has better values, in the long run, I do not think that it will be a huge breaking change.

What do you think?

jvliwanag commented 4 years ago

I don't wish to delay any release of the libraries. Those are really useful as they are. :)

As per reevaluation, consider the current definition of userAgent:

userAgent = fromMaybe mempty <<< Map.lookup (wrap "user-agent") <<< Map.fromFoldable <<< httpHeaders

If we try to debug it and change it to:

userAgent = (trace "foo" <<< const) <<< fromMaybe mempty <<< Map.lookup (wrap "user-agent") <<< Map.fromFoldable <<< httpHeaders

And then create an application:

app :: Application
app req f =
  f $ responseStr ok200 [(hContentType /\ "text/plain")] $ (userAgent req) <> (userAgent req)

You'd notice that the foo trace prints out twice. (Also, as an aside, userAgent probably should be Maybe String)

It can be argued that no one would need userAgent twice anyway -- although again, my bet is that it's the other way around. That userAgent commonly is looked for multiple times on a server that haskell's wai decided to put it on their Request data type. And that evaluating it to a concrete value has lesser overhead. (albeit lazily)

In any case, I think it's still important to solve the part that wai not be bound to the single node's http implementation. Any preference how to go about it?

Woody88 commented 4 years ago

Ah! That is what you meant. Yea, I don't think that we should be worrying about this right now. Maybe it's best to wait until we have more hard evidence.

In any case, I think it's still important to solve the part that wai not be bound to the single node's http implementation. Any preference how to go about it?

If you look at the previous link that I pasted you will see that I moved node's http implementation in warp just like it was before. I just haven't merged in the master branch yet.

@jvliwanag Thanks for thinking through this with me, much appreciated!

jvliwanag commented 4 years ago

Thanks also for starting this out. This is my first exposure even to haskell's wai.

So the thing that I still would like to see a change on are on these definitions:

type Application = HttpRequest -> (Response -> Aff Unit) -> Aff Unit 

type Middleware = Application -> Application

newtype HttpRequest = HttpRequest HTTP.Request

I can't build a wai binding for lambda since I don't have an HTTP.Request. Since you've moved it onto warp though, I'm eager to check out what's left of wai.

As an aside, I also wish to build a wai binding for express. Which does use node http, but has an additional layer of request/responses to it. Now I'm not after that additional layer -- but more of once I have an express binding, I can put up an express server which then runs an application that ties to a wai binding. That express server also utilizes other middleware like webpack. So it's related to, but not exactly -- https://github.com/Woody88/purescript-warp/issues/5 .

Woody88 commented 4 years ago

So the thing that I still would like to see a change on are on these definitions:

Sorry, that it wasn't clear but I have the typeclass approach in another branch, I just did not merge it to the master branch yet. So we will be left with pretty much with this:
https://github.com/Woody88/purescript-wai/blob/a5fead0a5e7132cf3ac454def018b7d619b8c022/src/Network/Wai.purs#L23-L24

Now the Application request isn't tied to any specific library. I think that this should work for you? I would prefer the record type approach but I will wait until I have more evidence like I said in my previous post.

Woody88 commented 4 years ago

Hi @jvliwanag,

I Just want to give you an update about something that I stumbled upon. I am writing a wai-logger which is inspired by node's morgan lib, and I noticed that the typeclass approach is giving me problems when combining middleware. For example, the code below forces me to use parentheses when stacking middleware. If I were to use the apply function ($), the compiler will throw The type variable req has escaped its scope, appearing in the type

main :: Effect Unit
main = do 
    let beforeMainLoop = Console.log $ "Listening on port " <> show defaultSettings.port
    void $ runSettings defaultSettings { beforeMainLoop = beforeMainLoop }
      (loggerMiddleware dev stdout app )

app :: Application 
app req f = do
    f $ responseStr ok200 [(hContentType /\ "text/plain")] "Hello World!"

I would prefer for people to just be able do this:

main :: Effect Unit
main = do 
    let beforeMainLoop = Console.log $ "Listening on port " <> show defaultSettings.port
    void $ runSettings defaultSettings { beforeMainLoop = beforeMainLoop }
      $ loggerMiddleware dev stdout
      $ gzipMiddleware
      $ mainApp 

mainApp :: Application 
mainApp req f = do
    f $ responseStr ok200 [(hContentType /\ "text/plain")] "Hello World!"

If we were to go back to the original version which is the record type approach we would not face this issue.