erlef / documentation-wg

EEF Documentation Working Group

9 stars 1 forks source link

Translate Erlang/OTP XML into EEP 48 docs chunk #3

Open josevalim opened 5 years ago

josevalim commented 5 years ago

This is a thread to track the progress and any pending tasks that may arise during this effort.

The current idea is that Erlang/OTP XML will be converted to an HTML tree data-structure. This is better than storing HTML as text as we don't have to parse it again.

The tree will have the following format:

Ast :: [Node]
Node :: binary() | {Tag, Attributes, Ast} | {Tag, Ast}
Tag :: binary()
Attributes :: [{binary(), binary()}]

Note: We may remove {Tag, Ast} as it can be represented as {Tag, [], Ast}. The question is if the size will increase considerably or not by doing so.

Note: Currently, EEP 48 requires documentation to be stored as binaries. This means that the HTML tree would have to be converted as term_to_binary and then the whole chunk is converted to term_to_binary again. This is most likely OK, converting a binary with term_to_binary is a fast operation with minimum space increase, but worth noting. The mime type should be something "application/erlang-abstract-format+html".

You can track the current progress here.

TODO

[ ] Merge multipe for the same arity.
[ ] Have a look at specs and types to see if we can do something better with them
[ ] Check what has/can be done to ease lsp integration.
[x] Skip wild HTML elements when rendering
[x] Fix all links to work properly. This will include changes to the source xml as right now we cannot see any difference between a link to a User's Guide and to a Reference Manual.
[x] Remove nested 
[x] Make all attributes binaries
[x] Add pagianation for h/1 and friends, see c:i/0 for details
[x] Add link type checker
[x] Move <section> in the refman xml to only be allowed inside <description>
[x] Add callbacks to the chunks
[x] Fix duplicate function documentation when doing h(ttb,tpl)
[x] Add deprecation attributes to the meta-data for functions
[x] Fix , and . alone on new-line when after a tagged string.
[x] Document the application/erlang+html format and the EEP-48 format.
[x] Expand the shell_docs:validate/1 function to follow a stricter dtd.
[x] Add proper tests for shell_docs:render/* by using the dtd.

Don't do

[ ] Forbid multiple <funcs> tags in the source xml
[ ] Wrap all <li> contents with a  (don't think I will do this as it will change the meaning of things)

wojtekmach commented 5 years ago

I'd like to propose changing:

Attributes :: [{binary(), binary()}]

to:

Attributes :: #{binary() => binary()}

I think this would be more friendly to tooling as we we could easily pattern match on subset of attributes. I believe the order of the attributes shouldn't matter and duplicates shouldn't be allowed (e.g. xmerl does not allow duplicates)

KennethL commented 5 years ago

@wojtekmach yes I agree that a map would be more efficient for tooling here and I don't think it will increase the size of the doc chunk.

marianoguerra commented 5 years ago

I already have a poc for this, I'm on vacations this week but the next one I can put together a status report and next steps.

On Fri, 15 Nov 2019, 16:32 Kenneth Lundin, notifications@github.com wrote:

@wojtekmach https://github.com/wojtekmach yes I agree that a map would be more efficient for tooling here and I don't think it will increase the size of the doc chunk.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/erlef/eef-documentation-wg/issues/3?email_source=notifications&email_token=AAAQW37VS7MGMQIDPJ5WF7LQT2XIHA5CNFSM4JMWEBS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEFS7YQ#issuecomment-554381282, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAQW3YEFI2PK5XH3J6SYF3QT2XIHANCNFSM4JMWEBSQ .

marianoguerra commented 5 years ago

To be more specific: I have a project that reads OTP xml docs and converts it to a tree data structure that then is translated to html, markdown and restructured text

On Fri, 15 Nov 2019, 19:44 Mariano Guerra, mariano@marianoguerra.org wrote:

I already have a poc for this, I'm on vacations this week but the next one I can put together a status report and next steps.

On Fri, 15 Nov 2019, 16:32 Kenneth Lundin, notifications@github.com wrote:

@wojtekmach https://github.com/wojtekmach yes I agree that a map would be more efficient for tooling here and I don't think it will increase the size of the doc chunk.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/erlef/eef-documentation-wg/issues/3?email_source=notifications&email_token=AAAQW37VS7MGMQIDPJ5WF7LQT2XIHA5CNFSM4JMWEBS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEFS7YQ#issuecomment-554381282, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAQW3YEFI2PK5XH3J6SYF3QT2XIHANCNFSM4JMWEBSQ .

KennethL commented 4 years ago

I have been thinking about the format of the doc chunk for Erlang for some time and during that I have also written a program which translate the OTP XML into the doc chunk. As stated before I propose that the doc chunk have a structured term format and not a binary as stated in the EEP 48. Of course it is possible to convert it to a binary with term_to_binary but I don't see the point in not allowing the term as is. The format is anyway tagged with something like "application/erlang-doc-format+html". The tricky part is to decide the exact format and I think it must be an iterative process during the development of the translator and the consumers of the format. I see the consumers as:

a rendering function for producing a text printout in the shell (help M F A)
Exdoc generating html and epub

@josevalim wrote The tree will have the following format:

Ast :: [Node]
Node :: binary() | {Tag, Attributes, Ast} | {Tag, Ast}
Tag :: binary()
Attributes :: [{binary(), binary()}]

But during my work and the comment from @wojtekmach I think I would like to do a minor change to this:

Ast :: [Node]
Node :: binary() | {Tag, Attributes, Ast} | {Tag, Ast}
Tag :: atom()
Attributes :: #{atom() => term()}]

I don't really see a problem with having the Tag and the key for attributes as an atom() vs. a binary() but this is a detail. The more important part is what tags there are. I have worked with the idea of having the well known html names when possible. A complete example will come soon.

My idea is that we extend the OTP build with producing the doc chunks in addition to the current html, man and pdf so that anyone can build OTP with doc chunks (optional). That version should also contain API functions for fetching the doc chunks and presenting them in the shell. I hope we can have this already in OTP 23 (May) at least as an experimental feature as all the formats and API's might not be 100% settled

erszcz commented 4 years ago

@KennethL That's great news!

Are pull requests to EDoc based on the work done so far welcome?

josevalim commented 4 years ago

As stated before I propose that the doc chunk have a structured term format and not a binary as stated in the EEP 48.

Agreed. I will send a PR to update EEP 48. I propose, however, that the format must then end with +erlang, so it is relatively easy for tools to detect they have an Erlang term in there. Is that ok?

I don't really see a problem with having the Tag and the key for attributes as an atom() vs. a binary() but this is a detail.

Thanks for the updates! Our current plan is to change ExDoc to work on HTML ASTs. So for Elixir, ExDoc will convert the Markdown to a HTML tree and then traverse it. For Erlang, we will work directly with the tree stored in the chunk. It may be that the tools (markdown processor and Erlang) do not agree on the format of the tree, so we will need some pre-processing. Furthermore, we want the simplest possible tree while processing, binary() | {Tag, Attributes, Ast}, but in the interest of storage, having a {Tag, Ast} is nice, especially if Attributes is a map instead of a list.

Using a map instead of list for attributes also means we lose ordering. Once again, this is probably not a problem for Erlang, but a Markdown processor may want to keep the user ordering, and therefore they are forced to use a list.

In other words, I think it is fine to go with your proposal. Realistically speaking, it is most likely that ExDoc will have to normalize those trees. Even if they have the same AST type, I wouldn't be surprised if certain representation are sometimes different.

My idea is that we extend the OTP build with producing the doc chunks in addition to the current html, man and pdf so that anyone can build OTP with doc chunks (optional).

Should we make the building of docs chunk enabled by default? I think this would be the simplest to have the docs accessible to everyone.

KennethL commented 4 years ago

@KennethL That's great news!

Are pull requests to EDoc based on the work done so far welcome?

Yes puul request for Edoc are welcome, but try to make them compatible so that current usage is not broken, I assume even the not so nice layout it produces today will have to be kept for a while. But for the output there are support for plugins which can produce whatever format. These plugins don't even need to be part of Edoc, but in this case I think we want it to be part of Edoc.

KennethL commented 4 years ago

As stated before I propose that the doc chunk have a structured term format and not a binary as stated in the EEP 48.

Agreed. I will send a PR to update EEP 48. I propose, however, that the format must then end with +erlang, so it is relatively easy for tools to detect they have an Erlang term in there. Is that ok?

Thinking again about if format (in EEP-48) should be a binary or a structured term, a binary with the term in external format might have its advantages when using online help where the doc for only one function at a time is requested. It would then be less computation to fetch the whole chunk and then convert only the doc for one function with binary_to_term. What do you think?

I don't really see a problem with having the Tag and the key for attributes as an atom() vs. a binary() but this is a detail.

Thanks for the updates! Our current plan is to change ExDoc to work on HTML ASTs. So for Elixir, ExDoc will convert the Markdown to a HTML tree and then traverse it. For Erlang, we will work directly with the tree stored in the chunk. It may be that the tools (markdown processor and Erlang) do not agree on the format of the tree, so we will need some pre-processing. Furthermore, we want the simplest possible tree while processing, binary() | {Tag, Attributes, Ast}, but in the interest of storage, having a {Tag, Ast} is nice, especially if Attributes is a map instead of a list.

Using a map instead of list for attributes also means we lose ordering. Once again, this is probably not a problem for Erlang, but a Markdown processor may want to keep the user ordering, and therefore they are forced to use a list.

Having a map for the attributes instead of a list is not super important for me. If the order sometimes is important we can keep it as a list.

In other words, I think it is fine to go with your proposal. Realistically speaking, it is most likely that ExDoc will have to normalize those trees. Even if they have the same AST type, I wouldn't be surprised if certain representation are sometimes different.

My idea is that we extend the OTP build with producing the doc chunks in addition to the current html, man and pdf so that anyone can build OTP with doc chunks (optional).

Should we make the building of docs chunk enabled by default? I think this would be the simplest to have the docs accessible to everyone.

Yes I think building docs shunks could be default, but we have to decide if the chunk goes into the .beam file or if it goes into a separate file per module. If it goes into the .beam file I think we have to integrate is more with the ´erlc´ command and the compiler or of course it could be an extra separate pass where docs chunks are added to the .beam files. I am not sure what I think yet, we have to discuss it more in the OTP team (will probably happen this week). Do you have any input on that?

wojtekmach commented 4 years ago

Having a map for the attributes instead of a list is not super important for me. If the order sometimes is important we can keep it as a list.

I was the one proposing the attributes map but I agree that sticking with a list might be better after all.

Yes I think building docs chunks could be default, but we have to decide if the chunk goes into the .beam file or if it goes into a separate file per module.

My understanding is for modules that are part of erts there's no beam file so in order to eventually do: erl> h(erlang)., erl> h(atomics). etc we'd need the chunk somewhere else, e.g. in doc/chunks/erlang.chunk per EEP 48.

Perhaps OTP stores erts docs in .chunk but other apps docs in .beam?

If it goes into the .beam file I think we have to integrate is more with the ´erlc´

I think this is the most appealing option: when working in rebar3/mix/etc projects when using dependencies they would be compiled with docs with no extra work (maybe just a compiler flag to enable chunks generation) and thus the documentation would be immediately available. Otherwise, if it's part of edoc, then we'd need to run edoc for all deps (which could of course be automated somehow).

josevalim commented 4 years ago

Thinking again about if format (in EEP-48) should be a binary or a structured term, a binary with the term in external format might have its advantages when using online help where the doc for only one function at a time is requested. It would then be less computation to fetch the whole chunk and then convert only the doc for one function with binary_to_term. What do you think?

I think both options are ok. Having it as a binary is definitely more consistent and it doesn't require changing EEP-48. If it has actual performance benefits in some situations, then even better!

If it goes into the .beam file I think we have to integrate is more with the ´erlc´

My initial question is which docs would be integrated into erlc. We could integrate edoc, but it would require extra-passes, and we are not sure we want to stick with edoc anyway. erl_docgen is mostly specific to Erlang/OTP so maybe it is not worth integrating it either. So maybe it is best to stick with separate .chunk files for now, due to the reasons given by Wojtek, and we can worry about tightly integrating with erlc in the future?

erszcz commented 4 years ago

maybe it is best to stick with separate .chunk files for now, due to the reasons given by Wojtek, and we can worry about tightly integrating with erlc in the future?

I was also going to suggest that a separate pass might make sense when the feature is still experimental, given the pass is different for OTP and non-OTP modules. For building project releases that include OTP libs some care has to be taken not to unintentionally skip the separate .chunk files.

garazdawi commented 4 years ago

Hello,

For the last week or so I've been working on making this a reality. You can track my progress here: https://github.com/garazdawi/otp/tree/lukas/kernel/code-chunk-lookup

So far I've implemented help for all functions and types in all of Erlang/OTP.

Some notes:

The documentation is stored in .chunk files for all modules.
It is generated when the documentation is built, not when the modules are compiled. So to build you need to do make && make docs
There is a function called code:get_doc/1 that fetches the documentation from either a beam chunk or the .chunk file.

Here is an example of what is in the chunk:

22> rp(shell_docs:get_doc(erlang,timestamp,0)).
[{{function,timestamp,0},
  0,
  [<<"timestamp/0">>],
  #{<<"en">> =>
        [{p,[],[<<"Current Erlang System time.">>]},
         {type,[],[{v,[{name,"timestamp"}],[]}]},
         {p,[],
            [<<"Returns current ">>,<<"Erlang system time">>,
             <<" on the format ">>,
             {c,[],[<<"{MegaSecs, Secs, MicroSecs}">>]},
             <<". This format is the same as ">>,<<>>,
             {c,[],[<<"os:timestamp/0">>]},
             <<" and the deprecated ">>,<<>>,
             {c,[],[<<"erlang:now/0">>]},
             <<" use. The reason for the existence of ">>,
             {c,[],[<<"erlang:timestamp()">>]},
             <<" is purely to simplify use for existing code that assumes this time stamp format. Current Erlang system time can more efficiently be retrieved in the time unit of your choice using ">>,
             <<>>,
             {c,[],[<<"erlang:system_time/1">>]},
             <<".">>]},
         {p,[],
            [<<"The ">>,
             {c,[],[<<"erlang:timestamp()">>]},
             <<" BIF is equivalent to:">>]},
         {code,
             [{type,"none"}],
             [<<"timestamp() ->\n    ErlangSystemTime = erlang:system_time(microsecond),\n    MegaSecs = ErlangSystemTime div 1000000000000,\n    Secs = ErlangSystemTime div 1000000 - MegaSecs*1000000,\n    MicroSecs = ErlangSystemTime rem 1000000,\n    {MegaSecs, Secs, MicroSecs}.">>]},
         {p,[],
            [<<"It, however, uses a native implementation that does not build garbage on the heap and with slightly better performance.">>]},
         {note,[],
             [<<>>,
              {p,[],
                 [<<"This time is ">>,
                  {em,[],[<<"not">>]},
                  <<" a monotonically increasing time in the general case. For more information, see the documentation of ">>,
                  <<"time warp modes">>,<<" in the User's Guide.">>]},
              <<>>]}]},
  #{spec =>
        [{attribute,1549,spec,
             {{erlang,timestamp,0},
              [{type,1549,bounded_fun,
                   [{type,1549,'fun',
                        [{type,1549,product,[]},{var,1549,'Timestamp'}]},
                    [{type,1550,constraint,
                         [{atom,1550,is_subtype},
                          [{var,1550,'Timestamp'},
                           {user_type,1550,timestamp,[]}]]}]]}]}}]}}]

And here it is when rendered:

20> h(erlang,timestamp,0).

-spec erlang:timestamp() -> Timestamp when Timestamp :: timestamp().

Current Erlang System time.

Types:
  -type timestamp() ::
          {MegaSecs :: non_neg_integer(),
           Secs :: non_neg_integer(),
           MicroSecs :: non_neg_integer()}.

Returns current Erlang system time on the format {MegaSecs, Secs, 
MicroSecs}. This format is the same as os:timestamp/0 and the 
deprecated erlang:now/0 use. The reason for the existence of 
erlang:timestamp() is purely to simplify use for existing code that 
assumes this time stamp format. Current Erlang system time can more 
efficiently be retrieved in the time unit of your choice using 
erlang:system_time/1.

The erlang:timestamp() BIF is equivalent to:

  timestamp() ->
      ErlangSystemTime = erlang:system_time(microsecond),
      MegaSecs = ErlangSystemTime div 1000000000000,
      Secs = ErlangSystemTime div 1000000 - MegaSecs*1000000,
      MicroSecs = ErlangSystemTime rem 1000000,
      {MegaSecs, Secs, MicroSecs}.

It, however, uses a native implementation that does not build garbage 
on the heap and with slightly better performance.

Note:
  This time is not a monotonically increasing time in the general 
  case. For more information, see the documentation of time warp modes 
  in the User's Guide.

Remaining issues:

The rendering is not perfect yet, most significantly there are whitespaces that should not be there that are emitted.
At the moment there are duplicate entries in the docs list for some functions. For instance erlang:system_info/1 has 8 entries, one for each <func> tag in the source xml. I don't have any good idea about how to store this type of documentation in the EEP-48 chunks. The best I can some up with is to modify {{Kind,Name,Arity},Anno,Sig,Doc,Meta} to {{Kind,Name,Arity},[{Anno,Sig,Doc,Meta}].
Since is not implemented because I don't know what to do about functions like this where separate clauses have different sinces. Same for deprecations...
Links and cross-references are not really done as the shell rendered does not need them.
Some of the tags need to be converted to better/other names. You can view the current names here: https://github.com/garazdawi/otp/blob/ae7c40923550d4021e31c70e6fa0e32f9b3bd64b/lib/stdlib/src/shell_docs.erl#L35. Specifically v and d, but maybe also note, warning and others.
Add a version to the mimetype so that we know what we are working with when rendering.

josevalim commented 4 years ago

@garazdawi fantastic!

The best I can some up with is to modify {{Kind,Name,Arity},Anno,Sig,Doc,Meta} to {{Kind,Name,Arity},[{Anno,Sig,Doc,Meta}].

I agree. We may need to bump the version of the format.

Some of the tags need to be converted to better/other names

You probably need to decide if you are going to emit HTML or a HTML-like tree. Have you had any thoughts about this?

garazdawi commented 4 years ago

You probably need to decide if you are going to emit HTML or a HTML-like tree. Have you had any thoughts about this?

I think it will have to be HTML-like. Some of the things I do now could be translated. Not sure how easy it would be to do with the <type> tag though... maybe it is worth an attempt to do and the just add some extra attributes to the HTML tags for the renderer to use.

What I'm trying to avoid is for different renderers to have to do very complex parsing loops in order to figure out what to print.

josevalim commented 4 years ago

@garazdawi is that the tag that processes the spec metadata? So it seems it can be added anywhere in the text? Would you a lot if it always appears at the top, before any of the paragraphs? I assume there was a reason why it was originally done as so.

garazdawi commented 4 years ago

@garazdawi is that the tag that processes the spec metadata?

No. It is the tag that either described what a type the function uses looks like (e.g. https://github.com/erlang/otp/blob/master/lib/asn1/doc/src/asn1ct.xml#L78-L89) or the name of the type if it is part of the code (e.g. https://github.com/erlang/otp/blob/master/lib/stdlib/doc/src/calendar.xml#L135-L138).

The spec metadata is rendered instead of the signature when present. I decided to not put the rendered version of the spec in the signature as the spec format is more flexible and easier to create links etc from.

garazdawi commented 4 years ago

Would you a lot if it always appears at the top, before any of the paragraphs? I assume there was a reason why it was originally done as so.

It always appears at the top at the moment. Or rather just after the <fsummary>, which is translated to the first  in each function. Thinking some more about it maybe I could translate the types into an <ul> at the top.

josevalim commented 4 years ago

The spec metadata is rendered instead of the signature when present. I decided to not put the rendered version of the spec in the signature as the spec format is more flexible and easier to create links etc from.

And to clarify, the spec follows the same Erlang AST format, right? We should probably document this in EEP 48 too.

Thinking some more about it maybe I could translate the types into an
at the top.

Sounds like a good call to me. <ul> with each entry using the proper code quotation.

garazdawi commented 4 years ago

And to clarify, the spec follows the same Erlang AST format, right?

Yes.

erszcz commented 4 years ago

The best I can some up with is to modify {{Kind,Name,Arity},Anno,Sig,Doc,Meta} to {{Kind,Name,Arity},[{Anno,Sig,Doc,Meta}].

I agree. We may need to bump the version of the format.

Just an idea looking at the problem from a different angle - since functions in Erlang are actually uniquely identified by their M:F/A this should not be a frequent problem. Maybe the few places where this occurs could be rewritten to fit into the current {{Kind,Name,Arity},Anno,Sig,Doc,Meta} format? I'm not sure if that's feasible, so please treat this just as food for thought.

garazdawi commented 4 years ago

There are 58 places (unless my grep skills fail me) in the docs where this is feature is used. So it could be done. However, in my opinion, the (HTML) docs become easier to navigate by using the method and I don't really see an alternative notation that would work as well.

I'll leave that part to rest for a while and focus on other parts of the doc chunks first and then come back to it. Maybe we'll have an epiphany somewhere along the way.

garazdawi commented 4 years ago

Pushed an update where types are like this:

{ul,[{class,"types"}],[{li,[{name,"timestamp"}],[]}]},

for types that are derived from the metadata. And like this:

{ul,[{class,"types"}],
     [{li,[{class,"type"}],[<<"MatchDesc = [MatchNum]">>]},
      {li,[{class,"type"}],
          [<<"MatchNum = {matched, node(), integer()} | {matched, node(), 0, RPCError}">>]},
      {li,[{class,"type"}],[<<"RPCError = term()">>]}]},

for inline type desciptions.

Also I changed note,warning,doi,dont to be a p with a class. e.g.

{p,[{class,"note"}],
    [<<>>,
     {p,[],
         [<<"This time is ">>,
          {em,[],[<<"not">>]},
          <<" a monotonically increasing time in the general case. For more information, see the documentation of ">>,
          <<"time warp modes">>,<<" in the User's Guide.">>]},
     <<>>]}]}

That leaves type_desc as the only non-html tag.

garazdawi commented 4 years ago

I plan to open a PR for my changes tomorrow. There are still some issues that are unresolved, but they may have to be fixed later as OTP 23-rc1 is set to be released soon and I want this to be part of that.

josevalim commented 4 years ago

Fantastic! If there is anything I can do or if you want to jump on a call to discuss the unresolved issues, please let me know!

garazdawi commented 4 years ago

The main unresolved issue is still what to do with the multiple function definitions. I'm starting to lean towards actually just re-writing the docs to not use that feature any more...

One thing that is left is to use a second renderer from the chunk sources. Right now I've only tested with my own shell_docs renderer, but as far as I understand there is a vision to also use LSP and ExDoc to generate docs from these chunks. It would be great to know if there is anything that needs to be adapted in the format in order to make that easier.

josevalim commented 4 years ago

The main unresolved issue is still what to do with the multiple function definitions. I'm starting to lean towards actually just re-writing the docs to not use that feature any more...

My current thinking is that EEP 48 will become more robust if we change it to support multiple entries. For Erlang and Elixir it boils down to an implementation detail but I can imagine a language in the future that may want to document each of them individually, especially statically typed languages. In the worst case scenario, if we don't want to use the feature, it just ends up being a one item list. :)

But if you think it is best to save this fight for later when the need arises, then that's ok by me too!

Once the PR is available, we will start looking into giving it a try on ExDoc and give you some feedback.

KennethL commented 4 years ago

As discussed earlier we should also have a plugin for edoc to produce the same format as we now are producing from the OTP xml. Is there anyone working on that part?

I would like to have it as a PR to include in OTP and as a pure extension, leaving the old functionality of edoc compatible (as much as possible).

Kenneth

On Tue, Feb 18, 2020, 19:38 José Valim notifications@github.com wrote:

The main unresolved issue is still what to do with the multiple function definitions. I'm starting to lean towards actually just re-writing the docs to not use that feature any more...

My current thinking is that EEP 48 will become more robust if we change it to support multiple entries. For Erlang and Elixir it boils down to an implementation detail but I can imagine a language in the future that may want to document each of them individually, especially statically typed language. In the worst case scenario, if we don't want to use the feature, it just ends up being a one item list. :)

But if you think it is best to save this fight for later when the need arises, then that's ok by me too!

Once the PR is available, we will start looking into giving it a try on ExDoc and give you some feedback.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/erlef/eef-documentation-wg/issues/3?email_source=notifications&email_token=AABFWSHI2MO6NQXJOVOYFPLRDQTL7A5CNFSM4JMWEBS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMDI7ZY#issuecomment-587632615, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABFWSFZ2T6RRYDWJXS4A3LRDQTL7ANCNFSM4JMWEBSQ .

josevalim commented 4 years ago

@KennethL, @erszcz is interested in doing that and we even have some funds reserved for it. I think the main blocker was the internal representation choice for the Erlang/OTP XML and now that is mostly settled, we can resume the Edoc work. So once this is in, I see two different venues progressing at the same time:

ExDoc integration with Erlang
EDoc integration with EEP 48

I will schedule a working group meeting because if we are going to request funds, we need to convene and agree on the terms.

erszcz commented 4 years ago

@KennethL Indeed, as @josevalim says I'm working on the EDoc part of things. My last update is here and, while not as fully fledged as the OTP counterpart by Lukas, it is functional already. I'm keeping in mind the that we want to preserve the current functionality as well.

josevalim commented 4 years ago

Hi @garazdawi and @KennethL,

I have added support for showing Erlang docs from Elixir. Here is some feedback and some bugs that I have found the in the process:

When there is a note/warning/etc, there is a text. My understanding is that you cannot have a paragraph inside a paragraph. Maybe the outer p should be a div?
Attribute values: the attribute values are charlists (Erlang strings) today. For example, the p above would be represented as {p, [{class, "warning"}], [{p, [], [<<"text">>]}]}, but given everything else is a binary, maybe attribute values should be binaries too?
Today, both gen_server:call/2 and gen_server:call/3 have an entry in the chunk for them, and the documentation is precisely the same. This leads to an issue where tools like ExDoc will end-up showing this documentation twice. This issue even happens in Erlang's h(gen_server, call). Elixir solves this by adding a defaults metadata entry, where defaults > 0. So for example, we document GenServer.call/3 and it has default of 1, so we know it can handle arities /2 and /3. defaults doesn't make much sense for Erlang though, so we should probably find a way to standardize this. A simple option may be to store extra_arities which is a list of extra arities. In this case, it would be extra_arities: [2]. Thoughts?
Today specs behave a bit awkward. They are wrapped in <ul class="spec"><li> but they don't behave like list items. I know this is something we already mentioned but I am bringing it up for completeness. My suggestion would be to wrap them on something like <pre><code class="specs"> or similar and already have them pre-formated.
Today <li> can either have text directly in them or it can have paragraphs with all sorts of things in it. This is totally fine and according to the HTML standard. However you may consider on always wrapping it in a paragraph if you think it will make life easier for users.

This is it! I am really excited about this and the possibilities it opens up!

garazdawi commented 4 years ago

My understanding is that you cannot have a paragraph inside a paragraph.

You can have 's within 's. I didn't put the text within a  because it made the rendering for shell_docs harder. But if it makes it harder for everyone else I can fix it in shell_docs.

Attribute values: the attribute values are charlists (Erlang strings) today.

I did that out of old habit and because I like my strings that way :) Will change it.

Today, both gen_server:call/2 and gen_server:call/3 have an entry in the chunk for them, and the documentation is precisely the same.

Yes, I'm aware of this. My idea at the moment was to mark different function signatures with equal docs to be duplicate entries somehow in the metadata.

Today specs behave a bit awkward.

I assume that you mean types?

My suggestion would be to wrap them on something like
 or similar and already have them pre-formated.


I don't want the generator to render any spec or types as that takes away information about the types that I think would be useful to renderers.

However you may consider on always wrapping it in a paragraph if you think it will make life easier for users.

I did this, but it caused problems with whitespace rendering in the early stages of the prototype. I can check if that is still the case.


            
            
                josevalim
                commented
                 4 years ago            
            
                
You can have <p>'s within <p>'s. 

My understanding is that p is paragraph content and nesting is not allowed. You can try it here by choosing "text field" in the select button on the left and then adding nested <p>s. You can also see the relevant part in the spec. So if we were to simply output nested <p> in ExDoc, the final markup would be invalid. Replacing the outer p by a div or a single p are both fine by me, so whatever makes it easy in the shell_docs case.

Yes, I'm aware of this. My idea at the moment was to mark different function signatures with equal docs to be duplicate entries somehow in the metadata.

So we have two options:


Repeat the docs and use metadata to exclude. This makes the chunk larger. When doing h(gen_server, call) and tools like ExDoc now need to filter out duplicate content based on the metadata. Looking up by h(gen_server, call, 2) is simple though.


Annotate that one function represents other arities, skipping the doc for other arities. The chunk is smaller. h(gen_server, call) just works, h(gen_server, call, 2) though is slightly more complicated.


We have 2 in Elixir. What I have implemented for 2 is that when looking up for gen_server:call/2, we first lookup for gen_server:call, and then get the first entry with equal arity or with the represented extra_arities/default in the metadata.
Once we pick an approach, let's document it in the EEP 48 too. :)

I assume that you mean types?

Yes, I did. It makes sense to keep that formatting then, please ignore my suggestion. :)

I did this, but it caused problems with whitespace rendering in the early stages of the prototype.

Perfect, please ignore my suggestion here as well then.

FWIW, here is the implementation in Elixir: https://github.com/elixir-lang/elixir/pull/9843/files#diff-3be1f4efa35778ff14e22d1f57edad59R124 - I rolled our own traversal instead of using "shell_docs" so we use the same coloring and representation as in Elixir docs.
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                
My understanding is that p is paragraph content and nesting is not allowed.

I misunderstood you. I thought that you were referring to the content in the erlang doc chunks, not in HTML in general.

You can also see the relevant part in the spec.

I did not know that this was not allowed... I should be able to move most of it to divs without too much trouble.
            
        
            
            
                wojtekmach
                commented
                 4 years ago            
            
                Not sure if this was brought up before, I wonder what would be a good place to document the application/erlang+html format (basically what we have at the beginning of the issue + subsequent revisions). My proposal would be to include it in EEP 48 before OTP 23.0 is out and I'd be happy to help with that.
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                I added the start of some documentation in erl_docgen's User's Guide. My plan was to finish that work by rc2. If you want to help you are more than welcome. I'll be away for about a week so will not be able to help you much until then.
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                I've updated the top post in this issue with my current TODO list which I hope I will get done before OTP 23-rc2.
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                You can track the current progress here.
@erszcz I've fixed some of the problems that you mentioned with how recon was rendered.
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                I seem to remember having read somewhere a suggestion of how links should be represented in erlang+html, but can't find it now.
My current implementation has it like this:
<seemfa marker="stdlib:lists#foldl/3">lists:foldl/3</seemfa>
  becomes
{a,[{href,"stdlib:lists#foldl/3"},
    {rel,"https://erlang.org/doc/link/seemfa"}],
   [<<"lists:foldl/3">>]}]}
If anyone has any better suggestions I'm all ears.
            
        
            
            
                josevalim
                commented
                 4 years ago            
            
                :+1:! What about links to callbacks and types?
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                There is seetype for types, and currently seemfa for callbacks.
There is also seeguide, seefile, seecom, seecref, seeerl.
            
        
            
            
                wojtekmach
                commented
                 4 years ago            
            
                
and currently seemfa for callbacks.

is this something like "stdlib:gen_server#Module:handle_call/3"?
is there one for modules?
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                
is this something like "stdlib:gen_server#Module:handle_call/3"?

yes.

is there one for modules?

<seeerl marker="stdlib:lists">lists(3)</seeerl>
  becomes
{a,[{href,"stdlib:lists"},
    {rel,"https://erlang.org/doc/link/seeerl"}],
   [<<"lists(3)">>]}]}
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                An open question is whether I should generate relative or absolute links in the chunk. i.e. should #foldl/3 be allowed, or should it always be expanded to stdlib:lists#foldl/3. Any thoughts?
            
        
            
            
                josevalim
                commented
                 4 years ago            
            
                @garazdawi the amount of work to expand a relative link to a full link or to get a potential relative link from a full link is roughly the same, so I would generate full links to simplify the format.
            
        
            
            
                wojtekmach
                commented
                 4 years ago            
            
                I noticed that e.g. http://erlang.org/doc/man/c.html#c-2 refers to compile:option/0 type using this URL http://erlang.org/doc/man/compile.html#type-option, but the type is not in the chunk docs for that module:
> {ok, Ch} = code:get_doc(compile), [element(1, Doc) || Doc <- element(7, Ch)].
[{function,env_compiler_options,0},
 {function,file,1},
 {function,file,2},
 {function,forms,1},
 {function,forms,2},
 {function,format_error,1},
 {function,output_generated,1},
 {function,noenv_file,2},
 {function,noenv_forms,2},
 {function,noenv_output_generated,1}]
it is in however in chunk metadata:
> {ok, Ch} = code:get_doc(compile), maps:keys(maps:get(types, element(6, Ch))).
[{abstract_code,0},
 {bin_ret,0},
 {comp_ret,0},
 {err_info,0},
 {err_ret,0},
 {err_warn_info,0},
 {errors,0},
 {forms,0},
 {mod_ret,0},
 {option,0},
 {warnings,0}]
should it be in the chunks docs with e.g. Doc = none, Doc = hidden, etc?
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                No. Even if it is named "type-" it is not actually a documented type, it is just a link to a "<a name>" marker.
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                hmm, on second thought... maybe it should be... need to dig more.
            
        
            
            
                garazdawi
                commented
                 4 years ago            
            
                So, type links are apparently a mess... I will try to make something better...
            
        
            
            
                essen
                commented
                 4 years ago            
            
                
I seem to remember having read somewhere a suggestion of how links should be represented in erlang+html, but can't find it now.
My current implementation has it like this:
<seemfa marker="stdlib:lists#foldl/3">lists:foldl/3</seemfa>
  becomes
{a,[{href,"stdlib:lists#foldl/3"},
    {rel,"https://erlang.org/doc/link/seemfa"}],
   [<<"lists:foldl/3">>]}]}
If anyone has any better suggestions I'm all ears.

Probably https://github.com/erlang/otp/pull/2545#issuecomment-589181505
Your links look good now!
            
        
    
    
                    Next
            

    
        
            ©  Githubissues.
            Githubissues is a development platform for aggregating issues.