elixir-lang / elixir

Elixir is a dynamic, functional language for building scalable and maintainable applications
https://elixir-lang.org/
Apache License 2.0
24.53k stars 3.38k forks source link

Could we allow the child_spec shortcut to be more general #7271

Closed pragdave closed 6 years ago

pragdave commented 6 years ago

Currently,

children = [
  Worker
]

is the same as

children = [
  { Worker, [] }
]

and this passes an empty list to start_link.

Because the second element of the tuple is assumed to be the argument, there's no way to pass multiple arguments or no arguments.

In many other cases like this, the argument parameter takes a list, and each element in the list becomes a parameter to the function call. Thus

{ Worker, [] }           # would pass no parameters
{ Worker, [ 99 ]},       # would pass the single parameter 99
{ Worker, [ 99, 100 ]}   # would pass two
{ Worker, [[]] }         # would pass an empty list

I don't see why the default is an empty list, rather than just not passing an argument. And I'd expect this shortcut to follow the semantics of GenServer.start_link, which takes a list of parameters.

So can I propose that

Worker,

is the same as {Worker, []} as it is now, but that the interpretation of this is that start_link is passed no arguments. Similarly,

{ Worker, [ 1, 2 ]}

would call Worker.start_link/2, and so on.

This would make it consistent with GenServer.start_link: easier to explain and remember.

Dave

josevalim commented 6 years ago

This was an explicit design decision. We want to have a single callback, start_link/1, as argument to avoid the argument dance.

After changing some projects and giving a couple trainings, I am absolute sure this is the best way to go about it. Plus changing the meaning of the second element of the tuple would be backwards incompatible. :)

If you want to pass multiple arguments, you can still build a complete child_spec. But the convention we want to push is a simple start_link/1.

pragdave commented 6 years ago

So you'll change GenServer.start_link to be compatible (as the same arguments apply)?

josevalim commented 6 years ago

@pragdave I am not worried about this case because we always wrap GenServer.start_link on your own module as it needs a callback module. So I just go ahead and define start_link/1 in the module as well.

adkron commented 6 years ago

I have a tendency to agree with @pragdave. This looks similar to MFA, but behaves differently. Also, if it did behave like MFA and default to no args, then it would be backward compatible and simpler to adopt.

In working with multiple people who are newer to Elixir the behavior that is current is surprising and confusing. Why do I have a default empty list?

josevalim commented 6 years ago

Eh, saying it is similar to a MFA is a stretch, at a glance there is at least 33% of it missing. :)

Why do I have a default empty list?

Luckily that's a very easy question to answer: most start_link/1 accepts options, so that’s a reasonable default. If you don’t want that, then pass something explicitly.

The previous approach would trip even experienced developers. For example, they would start an Ecto repository and they had to pass options to it, so they would write this:

supervisor(MyApp.Repo, [some: :config])

But that would unfortunately be called as:

MyApp.Repo.start_link({:some, :config})

Which was most definitely not what people wanted!

When you would implement your own servers, the previous variadic args approach would also be very confusing because multiple arguments would be passed to start_link but you cannot pass multiple arguments to init. So mapping how the arguments from worker/1 and supervisor/1 would end-up in init/1 was always confusing. Now it is very clear child_spec/1 -> start_link/1 -> init/1.

Even worse, adding new arguments to the previous start_link approach almost always ended-up with undefined function errors. Pushing towards a keyword list pushes developers to raise better errors semantically:

def start_link(opts) do
  name = opts[:name] || raise ":name is required when starting Foo"
  ...
end

This makes a huge difference, especially because failing to boot a supervised process means your application won't start and this limits how you can further debug the problem.

I am confident the start_link/1 approach is the way to go. Being able to say "start_link/1 is a convention that we use throughout development, testing, etc" is very powerful because it is one less API decision to make. If the question is why an empty list is a default, I would take that any day over the issues we were used to see with the previous specs. Especially because answering this question directly points people towards the current practices.

sasa1977 commented 6 years ago

This would make it consistent with GenServer.start_link: easier to explain and remember.

I assume you refer to the second argument to GenServer.start_link, called args. This argument is passed as the single argument to init/1, and therefore I think that the current child spec approach is consistent with GenServer. There is always exactly one argument taken in init/1, and with child specs there is always exactly one argument taken in start_link/1.

I should add that I was initially a bit skeptical about enforcing start_link/1, but now I think it makes a lot of sense. In most cases, start_link is even not meant to be invoked directly. In fact, at my work, we've ended up marking all of our start_link with @doc false.

Finally, I think that this interface is consistent with child_spec/1, since in many cases, the functions child_spec, start_link, and init will take the same argument. I wonder if use GenServer could capitalize on this and generate the default start_link/1 implementation which would just forward the arg to init/1.

josevalim commented 6 years ago

I wonder if use GenServer could capitalize on this and generate the default start_link/1 implementation which would just forward the arg to init/1.

I considered this but I like to validate the inputs on start_link/1 to avoid nesting errors inside process exits even further. Something like:

def start_link(opts) do
  name = opts[:name] || raise ":name is required when starting Foo"
  inner_arg1 = ...
  inner_arg2 = ...
  GenServer.start_link(__MODULE__, {inner_arg1, inner_arg2}, name: name)
end

I typically explain start_link as the client side of init/1.

sasa1977 commented 6 years ago

Yeah, I'm not so sure myself about the default start_link/1 because in most cases my GenServers are somehow registered based on the argument, but I thought it might be good to consider the idea, if nothing else, then for the adopting/learning purposes.

fishcakez commented 6 years ago

I like to validate the inputs on init/1 because it means I only need to destructure the argument once and don't need to split configuration/initialization/recovery handling when some of it does side effects.

pragdave commented 6 years ago

Remember that the start_link and init live in very different worlds.

start_link is the API side of a genserver. It is public facing, and should behave like any other API: it should make things convenient for the caller.

init is the internal implementation of the server—it is hidden from the client, and the state it manages is opaque.

So I agree that the two shouldn't be automatically created. But I disagree strongly that the API of init should influence the API of start_link: they are apples and oranges.

On Sat, Jan 27, 2018 at 5:35 PM, José Valim notifications@github.com wrote:

I wonder if use GenServer could capitalize on this and generate the default start_link/1 implementation which would just forward the arg to init/1.

I considered this but I like to validate the inputs on start_link/1 to avoid nesting errors inside process exits even further. Something like:

def start_link(opts) do name = opts[:name] || raise ":name is required when starting Foo" inner_arg1 = ... inner_arg2 = ... GenServer.start_link(MODULE, {inner_arg1, inner_arg2}, name: name) end

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/elixir-lang/elixir/issues/7271#issuecomment-361024533, or mute the thread https://github.com/notifications/unsubscribe-auth/AAApmCPnnQEJIEoMGhvsbnjJAdmExj1Zks5tO7KrgaJpZM4RvdhM .

fishcakez commented 6 years ago

because in most cases my GenServers are somehow registered based on the argument

A trouble here is that child_spec can be overridden to change the start_link arity. This can't happen with any of the other default functions as the arity is set. This could mean having a misleading default implementation. I think that means we shouldn't auto generate a start_link/1.

josevalim commented 6 years ago

@pragdave nowadays we have almost no reason to call start_link/1 directly. It is mostly a callback for the supervisor. Even ExUnit allows you to start supervised processes easily.

And that's actually a good thing. Pushing people away from start_link will help stop them from creating processes in ad-hoc fashion and guide them towards putting them under supervision trees.

sasa1977 commented 6 years ago

start_link is the API side of a genserver

For some time now, I'm wondering if this is the case. In most situations, start_link could be considered as a callback used by Supervisor. With the addition of start_supervised, you don't even need to invoke start_link from tests.

michalmuskala commented 6 years ago

The same validation that is done in start_link/1 could be done in child_spec/1. start_link/1 could be easily eliminated in that case and the child spec could directly refer to GenServer.start_link/3.

That's what I started doing recently in couple projects and it works quite well.

pragdave commented 6 years ago

Clones :)

How about this case (from h Agent)

For example, in the Mix tool that ships with Elixir, we need to keep a set of all tasks executed by a given project. Since this set is shared, we can implement it with an agent:

defmodule Mix.TasksServer do use Agent

def startlink() do Agent.start_link(fn -> MapSet.new end, name: MODULE) end

Let me pivot a little and broaden out the discussion.

In the early days, I pushed José to abandon some of the more egregious Erlang patterns and names, and he (rightly) resisted.

But now what's happening is that we're incrementally moving away from the Erlang world: show this stuff to a pure Erlang developer and they'd have no idea what was going on. But, at the same time, because the changes are incremental, we're still carrying the baggage around with us.

I suspect some of this discussion is caused by this impedance mismatch.

I wonder if there's merit in stopping the incrementalism and instead getting to together to thrash out a "what would we go if there was no history" structure. Obviously, it's OTP under the covers, but we come up with new, more modern, and frankly more convenient ways of using it. If nothing else, it might be an interesting exercise.

Dave

josevalim commented 6 years ago

@pragdave excellent example. Just recently I have updated the getting started guide, the books are being updated now, so it will take a while for the patterns to settle. We have rewritten the Supervisor docs but we did not check the other behaviours, I will work on it.

I wonder if there's merit in stopping the incrementalism and instead getting to together to thrash out a "what would we go if there was no history" structure.

The new child specs are exactly this. It is a concept that will only work on Elixir and it does not even work with previous code unless they are changed to follow the new conventions. It is not an intermediate step. From my perspective, trying to stick with MFArgs or lingering to start_link is an attempt to keep with the old behaviour. :)

But maybe there an even further step to take. Maybe it is what @michalmuskala is proposing. But I can't say right now that's it. Maybe when we look back in 5 years it will obvious but now it certainly isn't.

pragdave commented 6 years ago

But why are they called "child specs"? They are no longer. The servers are not children of the supervisors in any meaningful way. They are simply managed by them. The supervision structure and the process structure are separate.

But Erlang calls them child specs, and so now we do too.

So I do think we're dragging Erlang along with us, and I think that is making Elixir confusing: we're changing the abstractions but retaining the vocabulary.

josevalim commented 6 years ago

The goal of a supervisor is to start, restart and shut supervised processes down. I wouldn't describe that as "management".

The starting of processes is one way to see the parent-child relationship and it manifests even in simple code such as:

parent = self()
spawn(fn ->
  send(parent, :hello)
end)

The process that starts another process is often referred as its parent, even outside of the supervision context.

Another way to see this relationship is the hierarchy we get inside supervision trees and manifests itself in other places, such as the supervisor.

So to me there is a very clear parent-child relationship here, both in terms of spawning/starting and in terms of hierarchy.

pragdave commented 6 years ago

I think one of the major reasons people don't use supervisors effectively is that they are used to drawing just one diagram, showing the relationship between processes (objects). They think of this as being the structure of the application, and try to force supervisors into it.

It took me a while to realize this is wrong. The supervision hierarchy is orthogonal to the architecture of processes.

I think that using 'child' here reinforces this incorrect thinking.

On Jan 27, 2018 19:41, "José Valim" notifications@github.com wrote:

The goal of a supervisor is to start, restart and shut supervised processes down. I wouldn't describe that as "management".

The starting of processes is one way to see the parent-child relationship and it manifests even in simple code such as:

parent = self() spawn(fn -> send(parent, :hello) end)

The process that starts another process is often referred as its parent, even outside of the supervision context.

Another way to see this relationship is the hierarchy we get inside supervision trees and manifests itself in other places, such as the supervisor.

So to me there is a very clear parent-child relationship here, both in terms of spawning/starting and in terms of hierarchy.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/elixir-lang/elixir/issues/7271#issuecomment-361031003, or mute the thread https://github.com/notifications/unsubscribe-auth/AAApmByshrMlZUTY7xgHm5XYCL_2VV2Vks5tO9A-gaJpZM4RvdhM .

OvermindDL1 commented 6 years ago

Eh, saying it is similar to a MFA is a stretch, at a glance there is at least 33% of it missing. :)

Just as an aside, the {module, [Args...]} form is used in Erlang for callback modules, it is usually to implement a callback module with multiple functions on it where the args is appended to each call like apply(module, some_function, [42 | Args]) and apply(module, another_function, [1, 2, 3 | Args]) and so forth.

Also, I'm actually fairly for the child spec only taking one argument, but I don't think it should default to a list as that gives the impression of it being an M(F)A, it should rather instead be something like nil (or :undefined). At least in the docs, examples, generators, etc...

michalmuskala commented 6 years ago

The empty list is intended not to represent an "empty collection", but rather as an empty keyword, since most of the time the way you'd use the child spec would be with something like:

{Registry, name: :foo, type: :unique}

With no options, it's an empty list. It's like defaulting the opts argument to an empty list with opts \\ []

OvermindDL1 commented 6 years ago

@michalmuskala That works fine, but in that case the default case should really be {Registry}, not {Registry, []} for clarity reasons (I.E. internally transform a single element tuple into a 2-tuple with the second element being []). Easier to read and understand that way as the [] standing alone REALLY implies to me an M(F)A.

pragdave commented 6 years ago

How about supporting

{Registry, name: :foo, type: :unique, args: [ 1, 2, 3 ]}

where args gets copied into the MFA associated with the start option.

josevalim commented 6 years ago

@pragdave that would conflate the options given to the process with the child_spec behaviour. What if some process start_link expect an :args key? For example, should we support :shutdown there as well? What about processes like Task.Supervisor where :shutdown can be given as option on start_link?

josevalim commented 6 years ago

Just as an aside, the {module, [Args...]} form is used in Erlang for callback modules

Can you provide examples?

The only case I can recall {module(), term()} in Erlang is in the application module callback and the term() there is a single argument and not a list of arguments. Curiously, the default there is also an empty list (proplist).

OvermindDL1 commented 6 years ago

Nothing springs immediately to mind (I'm neck deep in SQL at the moment) but I know I've used it quite excessively in the past.

pragdave commented 6 years ago

@pragdave that would conflate the options given to the process with the child_spec behaviour. What if some process start_link expect an :args key? For example, should we support :shutdown there as well? What about processes like Task.Supervisor where :shutdown can be given as option on start_link?

Actually, it does the reverse, packaging the options passed to the managed server explicitly inside an args list, identifying them clearly. If you want to pass an args or shutdown parameter to the managed process, it's both easy and clear:

{Registry, name: :foo, type: :unique, server_args: [ shutdown: "often", args: [ 1, 2, 3 ]}

Looking at the code again, I think server_args is a better parameter name for this.

OvermindDL1 commented 6 years ago

I see the merit of :server_args, however if you need to go that far then why not just use a standard child entry instead of the shortcut?