erl_parse doesn't parse the arguments for type, spec, opaque and other attributes with line and column information

elbrujohalcon commented 3 years ago

Describe the bug

According to the docs, erl_syntax:attribute_arguments/1 Returns the list of argument subtrees of an attribute node. In all fairness, it does just that, but… if the attribute is module, export, import, file or record, the resulting list contains all the subtrees with proper positions and everything. But in the case of other attributes (like type, spec or export_types), it contains a representation of the abstract syntax tree that represents them, losing a lot of valuable information, like their line numbers.

To Reproduce

Run the following code…

Node = {attribute,1,type,
           {t,{type,1,union,
                    [{atom,1,a},
                     {atom,2,b},
                     {remote_type,3,[{atom,3,c},{atom,3,d},[]]}]},
              []}},
erl_syntax:attribute_arguments(Node).

Note that Node is the result of parsing the following code with erl_parse:parse_form/1…

-type t() :: a
           | b
           | c:d().

The result of the call to erl_syntax:attribute_arguments/1 above will look like…

[#tree{
     type = tuple,
     attr = #attr{pos = 1,ann = [],com = none},
     data =
         [#tree{
              type = atom,
              attr = #attr{pos = 0,ann = [],com = none},
              data = t},
          #tree{
              type = tuple,
              attr = #attr{pos = 0,ann = [],com = none},
              data =
                  [#tree{
                       type = atom,
                       attr = #attr{pos = 0,ann = [],com = none},
                       data = type},
                   #tree{
                       type = integer,
                       attr = #attr{pos = 0,ann = [],com = none},
                       data = 1},
                   #tree{
                       type = atom,
                       attr = #attr{pos = 0,ann = [],com = none},
                       data = union},
                   #tree{
                       type = list,
                       attr = #attr{pos = 0,ann = [],com = none},
                       data =
                           #list{
                               prefix =
                                   [#tree{
                                        type = tuple,
                                        attr = #attr{pos = 0,ann = [],com = none},
                                        data =
                                            [#tree{
                                                 type = atom,
                                                 attr = #attr{pos = 0,ann = [],com = none},
                                                 data = atom},
                                             #tree{
                                                 type = integer,
                                                 attr = #attr{pos = 0,ann = [],com = none},
                                                 data = 1},
                                             #tree{
                                                 type = atom,
                                                 attr = #attr{pos = 0,ann = [],com = none},
                                                 data = a}]},
                                    #tree{
                                        type = tuple,
                                        attr = #attr{pos = 0,ann = [],com = none},
                                        data =
                                            [#tree{
                                                 type = atom,
                                                 attr = #attr{pos = 0,ann = [],com = none},
                                                 data = atom},
                                             #tree{
                                                 type = integer,
                                                 attr = #attr{pos = 0,ann = [],com = none},
                                                 data = 2},
                                             #tree{
                                                 type = atom,
                                                 attr = #attr{pos = 0,ann = [],com = none},
                                                 data = b}]},
                                    #tree{
                                        type = tuple,
                                        attr = #attr{pos = 0,ann = [],com = none},
                                        data =
                                            [#tree{
                                                 type = atom,
                                                 attr = #attr{pos = 0,ann = [],com = none},
                                                 data = remote_type},
                                             #tree{
                                                 type = integer,
                                                 attr = #attr{pos = 0,ann = [],com = none},
                                                 data = 3},
                                             #tree{
                                                 type = list,
                                                 attr = #attr{pos = 0,ann = [],com = none},
                                                 data =
                                                     #list{
                                                         prefix =
                                                             [#tree{
                                                                  type = tuple,
                                                                  attr = #attr{pos = 0,ann = [],com = none},
                                                                  data =
                                                                      [#tree{
                                                                           type = atom,
                                                                           attr = #attr{pos = 0,ann = [],com = none},
                                                                           data = atom},
                                                                       #tree{
                                                                           type = integer,
                                                                           attr = #attr{pos = 0,ann = [],com = none},
                                                                           data = 3},
                                                                       #tree{
                                                                           type = atom,
                                                                           attr = #attr{pos = 0,ann = [],com = none},
                                                                           data = c}]},
                                                              #tree{
                                                                  type = tuple,
                                                                  attr = #attr{pos = 0,ann = [],com = none},
                                                                  data =
                                                                      [#tree{
                                                                           type = atom,
                                                                           attr = #attr{pos = 0,ann = [],com = none},
                                                                           data = atom},
                                                                       #tree{
                                                                           type = integer,
                                                                           attr = #attr{pos = 0,ann = [],com = none},
                                                                           data = 3},
                                                                       #tree{
                                                                           type = atom,
                                                                           attr = #attr{pos = 0,ann = [],com = none},
                                                                           data = d}]},
                                                              #tree{
                                                                  type = nil,
                                                                  attr = #attr{pos = 0,ann = [],com = none},
                                                                  data = []}],
                                                         suffix = none}}]}],
                               suffix = none}}]},
          #tree{
              type = nil,
              attr = #attr{pos = 0,ann = [],com = none},
              data = []}]}]

Expected behavior

Something along the lines of…

 [#tree{
      type = atom,
      attr = #attr{pos = 1,ann = [],com = none},
      data = t},
  #tree{type = type_union,
        attr = #attr{pos = 1,ann = [],com = none},
        data = [#tree{type = atom,
                      attr = #attr{pos = 1,ann = [],com = none},
                      data = a},
                #tree{type = atom,
                      attr = #attr{pos = 2,ann = [],com = none},
                      data = b},
                #tree{type = application,
                      attr = #attr{pos = 3,ann = [],com = none},
                      data = #application{operator = #tree{type = module_qualifier,
                                                           attr = #attr{pos = 3,ann = [],com = none},
                                                           data = #module_qualifier{module = #tree{type = atom,
                                                                                                   attr = #attr{pos = 3,ann = [],com = none},
                                                                                                   data = c},
                                                                                    body = #tree{type = atom,
                                                                                                 attr = #attr{pos = 3,ann = [],com = none},
                                                                                                 data = d}}},
                                          arguments = []}}]}]

i.e. a proper AST representing the type declaration, with proper positions and everything. Or at the very least the current result as shown above, but with proper positions (i.e. not 0).

Affected versions

I checked on OTP 22 and 23, but it's clear that this behavior is present since a much older version.

Additional context

Compatibility

I'm very aware that fixing this is not _just adding more clauses to this case statement_. There are other tools around (like erl_prettypr) that already assume that attribute_arguments or erl_syntax:subtrees/1 will abstractize all attribute arguments and they have workarounds in place for this. And there are external tools depending on this, too… like katana_code and rebar3_format (where I found this issue trying to solve NextRoll/rebar3_format#25). But there are also OTP tools that are working poorly because of this (e.g. erl_recomment which will place comments inserted in the middle of type definitions below them because they are not able to tell what's the code line where the type definition or function spec ends). It's likely that just fixing the position value in the current result suffices to make these tools work properly, but it's clear to me that returning a properly formed list of subtrees like the function is returning for other attributes (e.g. export) would be the right solution ™️ .

Bonus Oddity

With erl_syntax working as it is now… This is a totally valid Erlang module…

-module(a_module).

-type t() :: a
           | b
           | c:d().

-type(
    {t2,
     {type,
      19,
      union,
      [{atom, 20, a2}, {atom, 21, b2}, {remote_type, 3, [{atom, 22, c}, {atom, 22, d}, []]}]},
     []}
).

-export [x/1].

-spec x(t()) -> t2().
x(A) -> A.

…completely equivalent to…

-module(a_module).

-type t() :: a | b | c:d().
-type t2() :: a2 | b2 | c:d().

-export([x/1]).

-spec x(t()) -> t2().
x(A) -> A.

…but far more obfuscated. 😎

gomoripeti commented 3 years ago

I totally agree with the issue. One small note: as the type name t is represented as a literal atom in erl_parse:abstract_form any location or other annotation is lost so it cannot have proper location as you describe in expected behaviour. At least not without info from the tokens or modifying the parser. (This is also true for function name in a spec or callback) This is a bummer :(

garazdawi commented 3 years ago

Hello!

This is something that we also want to solve, but it will take some time to come up with a solution that has the right balance between backward compatibility and features.

One of the issues here is that the current parser has always been mainly aimed for consumption by the compiler, so the features in it are geared towards that. It could be that another parser more geared towards other tools could serve better for formatters and language servers.

That being said, we do think that adding more location information into attributes for erl_parse should be possible. It will however not be something that will be done soon.

erlang / otp