Nike-Inc / hal

hal provides an AWS Lambda Custom Runtime environment for your Haskell applications.
BSD 3-Clause "New" or "Revised" License

Respecting `_HANDLER` environment variable #69

Closed: endgame closed this issue 3 years ago

endgame commented 4 years ago

Continuation of #66. The AWS docs for custom runtimes say that the _HANDLER environment variable lets the same piece of code be deployed multiple times to handle different requests.

In larger Lambda applications, I think this can be useful to limit the proliferation of executable targets in cabal files. One way to do this could be to provide a Handler GADT that can wrap up functions matching the types of those in AWS.Lambda.Runtime, and a function runHandlers :: [(Text, Handler)] -> IO () or something.

@IamfromSpace :

Utilize `_HANDLER`

Interesting, it's crossed my mind that a user could leverage this, but I could never come up with a real use case. If we had many lambdas, and wanted to build a single executable that could support each, why not simply use a single lambda? In this case, you'll get more sharing and therefore reduced cold starts. The only real advantage I could think of was IAM separation. I suppose if two lambdas can receive the exact same event type that truly couldn't be distinguished, then you'd need some other way to tell them apart and `_HANDLER` could do that. I'm not sure I've really seen a need for that though--it seems like you'd prefer to live in a world where your event is processed solely according to its content.

If you can distinguish the events without needing to branch on _HANDLER then you're right that you'll share your execution environments and cut down on cold starts. Example: once #52 lands, it should be possible to build a little router package on top of it for API Gateway use.

But suppose you have a larger application made out of a bunch of Lambda functions responding to different events: some listening to API Gateway events, some listening to SNS topics, whatever. With the current setup, it's easy to end up feeling like you "need" a separate executable for each lambda. This slows you down because you have to build and upload a bunch of different executables.

I think there's a way you can do this within the existing combinators, including branching off _HANDLER. It might need dependent-map to be completely safe. Supporting _HANDLER might be a good fit here, but going to the Nth degree of type-safety might be above the complexity appetite for this package. I'll probably need to experiment and think some more before I have something I really want to push ahead with.

IamfromSpace commented 4 years ago

One option that already exists (but isn't made obvious anywhere) is to use Alternative to create a sum type of all of your events, then map to the appropriate handler.

Here's a snippet from a project where I did this (I've left the comment, because it's somewhat relevant to the convo!)

data HandlerEvent
  = HttpEvent (ApiGatewayProxyRequest UserId)
  | CronEvent CloudWatchEvent

instance FromJSON HandlerEvent where
  parseJSON v =
    (HttpEvent <$> parseJSON v) <|> (CronEvent <$> parseJSON v)

data HandlerResponse
  = HttpResponse ApiGatewayProxyResponse
  | CronResponse ()

instance ToJSON HandlerResponse where
  toJSON (HttpResponse x) = toJSON x
  toJSON (CronResponse x) = toJSON x

--TODO: This whole strategy might be more trouble than it's worth
--and they should just be two entirely distinct lambdas.
--They should _at least_ be separated into separate modules.
handler :: MonadAWS m => Text -> HandlerEvent -> m HandlerResponse
handler tableName event =
  case event of
    HttpEvent e -> HttpResponse <$> httpHandler tableName e
    CronEvent e -> CronResponse <$> cronHandler tableName e

It's a bit boilerplate-y and there are no examples or recommendations towards a pattern like this, but I do think it's a pretty reasonable approach. Not sure how to improve upon this or make it "first class" rather than some random trick.

And while _HANDLER is very natural to configure (it's always sort of odd adding in "NOT_USED" and such), one can always inject a custom environment variable that switches implementations in this way. Hell, there's no way to prevent access to _HANDLER, so there's nothing stopping a user from reading it directly.

It's an interesting idea though. I'm still not seeing the use case jump out at me, but I'd certainly be interested to pull the thread a bit more.

endgame commented 3 years ago

I think it's probably going to remain a niche tool in the Haskell Lambda toolbox, but it could be handy in some cases. Maybe for smaller projects where you're just starting to split things out, or when you're refactoring code and don't want to commit to standing up new executable targets.

How about this? I feel like it's simple enough to match hal's design philosophy, and it means that hal can cover the _HANDLER Lambda feature without warping everything else around it:

import Control.Monad.IO.Class
import Data.Map (Map)
import qualified Data.Map as M
import Data.Text (Text)
import System.Environment (getEnv)
import qualified Data.Text as T

dispatchHandler :: MonadIO m => Map Text (m ()) -> m ()
dispatchHandler handlers = do
  handler <- fmap T.pack . liftIO $ getEnv "_HANDLER"
  case M.lookup handler handlers of
    Nothing -> error $ "No handler defined for " ++ show handler
    Just h -> h

If you really wanted to force the usual FileName.functionName convention, you could write some TH like:

handlers :: [Name] -> ExpQ
handlers = _

main = $(handlers ['Foo.bar, 'baz, 'quux])

But this feels too magical to fit in with the rest of hal. I like the dispatchHandler function better.

IamfromSpace commented 3 years ago

This does look like something that is pretty small and probably reasonably supportable in the long term. I like that the "_HANDLER" is hidden from the user. And making it m () means that we don't have to worry about which runtime is used for which env var value--mixing and matching isn't an issue. I also agree that the less "magic" approach is better here. I think a magic package that uses hal might make sense to just quick start folks, but definitely, the first is closer to hal's design philosophy.

My outstanding thoughts are:

  1. dispatchHandler as a name doesn't seem to capture what the function does. Just in general, I think a challenge is getting people to connect the dots between the Lambda configuration and this--my hope is that a clearer name might do it.
  2. It would still be nice to get some type safety without too much excess machinery (like dependent-map, as you mention). I wonder if there's some approach based on typeclasses/derive Generic/etc that could eliminate boilerplate safely.
  3. I think we'd want some really clear error messaging here (I imagine this has already crossed your mind--your example makes perfect sense for this discussion).
  4. There's a lingering thought on my mind about this being a static program selector vs the example I had above being a dynamic one. Perhaps there's some abstraction that gets both? A thought about arrowized hal had crossed my mind, but that would really just be experimental.

Not to say we need to work out all things above, just stuff I figure is worth discussing a bit more :)
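
On point 3, one way the error messaging could look is sketched below. This is only an illustration under assumptions, not hal's API: `selectHandler` is a hypothetical helper, the keys are `String` rather than `Text` to keep the imports to base and containers, and the message wording is made up. It distinguishes "`_HANDLER` is unset" from "`_HANDLER` names an unknown handler", and lists the registered names so the Lambda configuration is easier to correct.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.List (intercalate)
import Data.Map (Map)
import qualified Data.Map as M
import System.Environment (lookupEnv)
import System.Exit (die)

-- | Pure lookup step (hypothetical helper), kept separate from IO so the
-- error text can be checked without touching the environment.
selectHandler :: Maybe String -> Map String a -> Either String a
selectHandler Nothing _ =
  Left "_HANDLER is not set; is this running inside the Lambda runtime?"
selectHandler (Just name) handlers =
  case M.lookup name handlers of
    Just h -> Right h
    Nothing ->
      Left $
        "No handler registered for _HANDLER=" ++ show name
          ++ ". Known handlers: "
          ++ intercalate ", " (M.keys handlers)

-- | Variant of the proposed dispatchHandler: reads _HANDLER, then either
-- runs the matching action or exits with the message built above.
dispatchHandler :: MonadIO m => Map String (m ()) -> m ()
dispatchHandler handlers = do
  name <- liftIO (lookupEnv "_HANDLER")
  either (liftIO . die) id (selectHandler name handlers)
```

Using `die` instead of `error` writes the message to stderr and exits with a failure code, which is one possible choice; a real implementation might prefer to report the failure through the runtime API instead.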

endgame commented 3 years ago

And making it m () means that we don't have to worry about which runtime is used for which env var value--mixing and matching isn't an issue.

This isn't quite what I meant, and I think it's clearer if I show more of an example sketch:

main = dispatchHandler $ Map.fromList 
  [ ("Foo", runReaderTLambdaContext (evalStateT (mRuntimeWithContext myHandler) 0)) -- for argument's sake
  , ("Bar",  liftIO $ pureRuntime someOtherHandler)
  ]

There's not really anywhere to stick a forall into the type signature for dispatchHandler because we need to return something that unifies with IO () (so it can be main).

IamfromSpace commented 3 years ago

Yeah, sorry, that example is what I expected, I just said it poorly :) Before the previous comment, I was thinking about having the runners on the result of dispatchHandler, which would be problematic.

Given that we'd expect the user to resolve constraints before passing handlers to dispatchHandler, is there a benefit to using the MonadIO constraint vs expecting IO itself?

I suppose a constraint does allow you to resolve constraints that are common to all handlers at the end.

endgame commented 3 years ago

is there a benefit to use the MonadIO constraint vs expecting IO itself?

Yes. If you use mRuntimeWithContext, you aren't forced down into IO if you want to dispatch on _HANDLER. If we provide dispatchHandler :: Map Text (IO ()) -> IO (), there's an IO () in negative position (in the argument) and you can't recover the more general form without something like unliftio.
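
The point about generality can be sketched as follows. This is an illustration under assumptions rather than hal code: `App`, `countingHandler`, `dispatchHandlerIO`, and `demo` are hypothetical names, keys are `String`, and the handler just bumps a counter instead of running a real runtime. It shows that the `MonadIO` version specialises to the `IO`-only signature for free, while also allowing a shared `StateT` layer to be resolved once, after dispatch.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.State.Strict (StateT, execStateT, modify')
import Data.Map (Map)
import qualified Data.Map as M
import System.Environment (lookupEnv, setEnv)

-- General version: handlers may live in any MonadIO stack.
dispatchHandler :: MonadIO m => Map String (m ()) -> m ()
dispatchHandler handlers = do
  name <- liftIO (lookupEnv "_HANDLER")
  case name >>= (`M.lookup` handlers) of
    Nothing -> liftIO (ioError (userError ("No handler for " ++ show name)))
    Just h -> h

-- The IO-only signature is just a specialisation of the general one.
-- Going the other way would need something like unliftio.
dispatchHandlerIO :: Map String (IO ()) -> IO ()
dispatchHandlerIO = dispatchHandler

-- Hypothetical stack shared by every handler.
type App = StateT Int IO

-- Stand-in for a real handler: increments the shared counter.
countingHandler :: App ()
countingHandler = modify' (+ 1)

-- Local demo only: sets _HANDLER in this process to mimic the Lambda
-- configuration, dispatches, then resolves the StateT layer at the end.
demo :: IO Int
demo = do
  setEnv "_HANDLER" "count"
  execStateT (dispatchHandler (M.fromList [("count", countingHandler)])) 0
```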

endgame commented 3 years ago

I think the best thing here is for programmers to inspect _HANDLER themselves if they want to serve multiple handlers from one deployment package. As I've said elsewhere, the strength of hal is its simplicity.
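
For readers who land here, inspecting `_HANDLER` by hand might look roughly like the sketch below. Everything in it is an assumption for illustration: the `Entry` type, the handler names "http" and "cron", and the `runEntry` bodies are placeholders for wherever you would call your chosen hal runtime.

```haskell
import System.Environment (lookupEnv)
import System.Exit (die)

-- Hypothetical set of entry points served by one deployment package.
data Entry = HttpEntry | CronEntry
  deriving (Eq, Show)

-- Map the configured handler name onto an entry point.
parseEntry :: Maybe String -> Either String Entry
parseEntry (Just "http") = Right HttpEntry
parseEntry (Just "cron") = Right CronEntry
parseEntry other = Left ("Unrecognised _HANDLER: " ++ show other)

-- Placeholders: a real program would start a hal runtime in each branch.
runEntry :: Entry -> IO ()
runEntry HttpEntry = putStrLn "would start the HTTP runtime here"
runEntry CronEntry = putStrLn "would start the cron runtime here"

-- Use this as the body of main in the deployed executable.
lambdaMain :: IO ()
lambdaMain = lookupEnv "_HANDLER" >>= either die runEntry . parseEntry
```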