elixir-lang / elixir

Elixir is a dynamic, functional language for building scalable and maintainable applications
https://elixir-lang.org/
Apache License 2.0
24.61k stars 3.38k forks source link

Format codebase files with the upcoming code formatter #6643

Closed josevalim closed 7 years ago

josevalim commented 7 years ago

Hi everyone!

Elixir master now includes a code formatter that automatically formats your code according to a consistent style.

We want to convert Elixir's codebase to use the formatter on all files. We understand this means a lot of code churn now but, on the positive side, it means we can automatically enforce and/or format future contributions, making it easier for everyone to contribute to Elixir.

We plan to migrate Elixir's source code to use the formatter gradually and we need your help! If you were looking forward to contribute to Elixir, this is a great opportunity to get started.

Please submit only one pull request per day with only one file formatted per PR, otherwise we are unable to give everyone feedback.

Helping with pull requests

You can contribute by picking a random file in Elixir's codebase, running the formatter on it, analysing it for improvements and then submitting a pull request.

The first step is to clone the Elixir repository and compile it:

$ git clone git@github.com:elixir-lang/elixir.git
$ cd elixir
$ make compile

If you have already cloned it, make sure you have the latest and run make clean compile again. You can read more about contributing in our README.

After you have latest Elixir on your machine, you need to pick a random file to format. We have added a script to do exactly that. From your Elixir checkout root:

$ bin/elixir scripts/random_file.exs

The script will tell you how to format a random file that has not been formatted yet. Run the command suggested by the script, check for style improvements and then submit a pull request for the changes on that single file. We will explain what are the possible improvements in the next section, so please look for them carefully. For more information on pull requests, read here.

Checking for style improvements

The formatter is guaranteed to spit out valid Elixir code. However, depending on how the original code was written, it may unecessarily span over multiple lines when formatted. In such cases, you may need to move some code around. Let's see some examples directly from the Elixir codebase.

Example 1: multi-line data structures

The example below uses Elixir old-style of indentation:

:ets.insert(table, {doc_tuple, line, kind,
                    merge_signatures(current_sign, signature, 1),
                    if(is_nil(doc), do: current_doc, else: doc)})

when formatted it looks like this:

:ets.insert(table, {
  doc_tuple,
  line,
  kind,
  merge_signatures(current_sign, signature, 1),
  if(is_nil(doc), do: current_doc, else: doc)
})

However, the code above only spawn multiple lines because we have a tuple full of expressions. If we move those expressions out, we will get better code altogether:

new_signature = merge_signatures(current_sign, signature, 1)
new_doc = if is_nil(doc), do: current_doc, else: doc
:ets.insert(table, {doc_tuple, line, kind, new_signature, new_doc})

After you do your changes, run the formatter again, see if the final code looks as you desire and ship it!

Example 2: multi-line function definitions

Sometimes a function definition may spawn over multiple lines:

def handle_call({:clean_up_process_data, parent_pid, child_pid}, _from,
                %{parent_pids: parent_pids, child_pids: child_pids} = state) do

When formatted, it will now look like:

def handle_call(
      {:clean_up_process_data, parent_pid, child_pid},
      _from,
      %{parent_pids: parent_pids, child_pids: child_pids} = state
    ) do

In some cases, the multi-line definition is necessary but, in this case, we could simply move the extraction of the state fields out of the function definition, since we are matching on any of them for flow-control:

def handle_call({:clean_up_process_data, parent_pid, child_pid}, _from, state) do
  %{parent_pids: parent_pids, child_pids: child_pids} = state

The code is now clearer and the function definition matches only on arguments that must be matched on the function head.

After you do your changes, run the formatter again, see if the final code looks as you desire and ship it!

Example 3: calls with many arguments

The example below is formatted by the formatter:

assert_raise EEx.SyntaxError,
             "nofile:2: unexpected end of string, expected a closing '<% end %>'",
             fn ->
               EEx.compile_string("foo\n<% if true do %>")
             end

This doesn't look ideal because of all the whitespace it leaves to the left of the code. In this case, prefer to extract arguments in variables so that they all fit on one line. This is especially easy in cases like the one above, because fn can be cleanly split at the end, so it's enough to extract the message into a variable (which is a common pattern in the Elixir codebase specifically regarding assert_raise/3):

message = "nofile:2: unexpected end of string, expected a closing '<% end %>'"

assert_raise EEx.SyntaxError, message, fn ->
  EEx.compile_string("foo\n<% if true do %>")
end

After you do your changes, run the formatter again, see if the final code looks as you desire and ship it!

Example 4: line splits in ugly places

The example below is formatted by the formatter:

"Finished in #{format_us(total_us)} seconds (#{format_us(load_us)}s on load, #{
  format_us(run_us)
}s on tests)"

As you can see, the line split happens on #{. This is not ideal, so in cases similar to this, the correct thing to do is to split the string into multiple strings and concatenate them through <>:

"Finished in #{format_us(total_us)} seconds (#{format_us(load_us)}s on load, " <>
  "#{format_us(run_us)}s on tests)"

After you do your changes, run the formatter again, see if the final code looks as you desire and ship it!

Summing up

Let us know if you have any questions and we are looking forward to your contributions!

whatyouhide commented 7 years ago

I performed the steps José suggested and came up with the first PR, https://github.com/elixir-lang/elixir/pull/6652. If you're in doubt you can use it as an example :).

LostKobrakai commented 7 years ago

How about nested function calls to pipes? I've got elixir/lib/port.ex and there are a few line like that after formatting: nillify(:erlang.port_info(port, item)). It could be written as port |> :erlang.port_info(item) |> nillify()

josevalim commented 7 years ago

@LostKobrakai leave those as is especially because many modules in Elixir core cannot use pipes because of bootstrapping issues.

gmile commented 7 years ago

@josevalim running locally, I'm seeing the following code change introduced by formatter:

-    assert_raise RuntimeError, "the given function must return a two-element tuple or :pop, got: 1", fn ->
-      Keyword.get_and_update!([a: 1], :a, fn value -> value end)
-    end
+    assert_raise RuntimeError,
+                 "the given function must return a two-element tuple or :pop, got: 1",
+                 fn ->
+                   Keyword.get_and_update!([a: 1], :a, fn value -> value end)
+                 end

What would be a preferred way to fix this one? If I am to asked, I'd keep the original version as it looks fine?

josevalim commented 7 years ago

@gmile move the message out!

msg = "... long string ..."
assert_raise RuntimeError, msg, fn ->
  ...
end
gmile commented 7 years ago

@josevalim thanks! done in https://github.com/elixir-lang/elixir/pull/6690

josevalim commented 7 years ago

Folks running the formatter multiple times, please don't forget to frequently pull from master. :) Thank you!

LostKobrakai commented 7 years ago

And do not forget to recompile after pulling :D

josevalim commented 7 years ago

Ok folks, this was a bigger success than I expected. :D If you have already sent a PRs today, please hold back until we clear up the backlog. I am afraid we may find bugs in the formatter that will generate rework on later steps. Thanks everyone so far! I will let you know as soon as we are ready to handle more! :heart:

LostKobrakai commented 7 years ago

Also travis seems to run rather hot today :P

josevalim commented 7 years ago

Thanks for everyone who contributed in the last 24 hours! After your contributions, 37% of the codebase is now properly formatted (it was around 8% when we started).

$ elixir scripts/random_file.exs --stat
169 out of 452 files formatted

We welcome more contributions but note we are keeping the restriction of one pull request per day so we can give feedback to everyone. Also please check if your previous PR have been merged or if there is any work pending. :heart:

josevalim commented 7 years ago

Folks, please remember to fetch latest master and run make clean compile before sending contributions so you are sure to run on the latest version of the formatter.

jeroenvisser101 commented 7 years ago

@josevalim you might want to tag this with hacktoberfest to get more help, seems like a very easy chore that helps people get introduced to the Elixir codebase

josevalim commented 7 years ago

Hi folks, we have crossed another another exciting milestone: 70% of the files in the codebase are now formatted (314 out of 452). 🎉

In the last 3 days we had 150 pull requests by 67 people merged, thanks everyone for contributing. :heart:

@jeroenvisser101 we are keeping the hacktoberfest in the back of our minds but for now we almost have more work than we can handle so I don't want to increase the flux of PRs (yet).

glazzari commented 7 years ago

How about raise/2? Should we extract the message out to a variable?

-        raise ArgumentError, "setup/setup_all expect a callback name as an atom or " <>
-                             "a list of callback names, got: #{inspect k}"
+        raise ArgumentError,
+              "setup/setup_all expect a callback name as an atom or " <>
+                "a list of callback names, got: #{inspect(k)}"
josevalim commented 7 years ago

@guilhermelazzari only if extracting to a variable makes both the variable definition and raise fit on the same line. The general principle is that we want to avoid multi-lines, unless there is no way around it (which is the case above).

iJackUA commented 7 years ago

@josevalim do you have kind of official code style guide in written form like https://github.com/christopheradams/elixir_style_guide or https://github.com/rrrene/elixir-style-guide where current Formatter rules are specified? (But I suppose no, as rules are still being updated).

whatyouhide commented 7 years ago

@iJackUA the rules are still being formed but the closest style guide is https://github.com/lexmag/elixir-style-guide which is maintained by core team members :).

josevalim commented 7 years ago

Alright folks, this is it! There are only 15 files left so we are closing this! If you have already started working on something, then please still submit your PR, but don't start working on anything new since with 15 files left the chance of getting a conflict is quite high.

And those that have PRs open, please finalize it. :)

Tomorrow I will send some statistics about the work done. :) Thanks everyone! :heart: :green_heart: :blue_heart: :yellow_heart: :purple_heart:

zeroum commented 7 years ago

Ohhh, 2 minutes late!

josevalim commented 7 years ago

Thanks everyone who contributed! All files in the Elixir codebase have been formatted and now all future contributions expect the files to be formatted, otherwise CI will fail.

Overall we formatted 452 files. We merged 214 Pull requests merged by 84 people with a total of 368 commits pushed to master.

This was fun but hopefully we don't have to do something like this again soon! 😁

:heart: :green_heart: :blue_heart: :yellow_heart: :purple_heart: