erlef / documentation-wg

EEF Documentation Working Group
Generate ExDoc documentation for Erlang projects #4

Open erszcz opened 5 years ago

erszcz commented 5 years ago

This thread documents the progress on generating ExDoc documentation for Erlang projects.

Current status

erszcz/edoc@0c81cea1b659ad9fd00993d25c9de9293c3651b0 can be used to generate chunks compatible with erlang/otp@dd3b0157d2b20adacd98fea64b3af443b6639b9a as follows:

git clone https://github.com/ferd/recon
cd recon
cat >> rebar.config <<END

{plugins,
 [
  {rebar3_edoc_chunks, {git, "https://github.com/erszcz/edoc.git", {branch, "master"}}}
 ]}.

{provider_hooks,
 [
  {post, [{compile, {edoc_chunks, compile}}]}
 ]}.
END
rebar3 compile
ls _build/default/lib/recon/doc/chunks/

ToDo

2020-04-01 update:

erszcz commented 5 years ago

Code blocks and inline code formatting are back.

gomoripeti commented 5 years ago

I tried docs_chunks with the wm-erlang branch of ex_doc (without rebar3) on an Erlang app. Everything worked fine with a minor modification: I had to add config.source_dir to the code path, otherwise ExDoc.Retriever.docs_from_files couldn't load the module (although it has the full path to the beam).

erszcz commented 5 years ago

@gomoripeti Thanks for the feedback! The EDoc fork and docs_chunks are now separate codebases. Since the new formatter required some changes in (now called) edoc_chunks, could you give the rebar plugin (and therefore edoc_chunks) a go to see if it still behaves properly on your app?

Moreover, as far as I understand @wojtekmach is not really interested in supporting docs_chunks in the long term, while the EDoc fork has a chance of eventually landing in OTP.

wojtekmach commented 5 years ago

Yeah, I consider docs_chunks a workaround and while I'm happy to accept bug fixes etc, I don't plan to extend it beyond what it is so far. I'm also happy to keep it around to test some ideas out. Long term, I'm looking forward to having chunks generation upstream in OTP, in edoc or erlc.

erszcz commented 5 years ago

For the record:

ExDoc docs are generated properly for docsh, but are not for edoc itself.

ExDoc expects a -type attribute to be present in Dbgi if a @type EDoc tag of the same name is present in Docs. This causes it to crash on edoc.erl:

** (MatchError) no match of right hand side value: nil
    (ex_doc) lib/ex_doc/retriever.ex:416: ExDoc.Retriever.get_type/3
    (ex_doc) lib/ex_doc/retriever.ex:406: anonymous fn/4 in ExDoc.Retriever.get_types/2
    (elixir) lib/enum.ex:1948: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ex_doc) lib/ex_doc/retriever.ex:405: ExDoc.Retriever.get_types/2
    (ex_doc) lib/ex_doc/retriever.ex:134: ExDoc.Retriever.do_generate_node/3
    (ex_doc) lib/ex_doc/retriever.ex:120: ExDoc.Retriever.generate_node/3
    (elixir) lib/enum.ex:2994: Enum.flat_map_list/2
    (ex_doc) lib/ex_doc/retriever.ex:43: ExDoc.Retriever.docs_from_modules/2

wojtekmach commented 5 years ago

I vaguely remember reading somewhere (but don't quote me on this) that writing typespecs is preferred over writing edoc typespec tags. If that's correct, I'd focus on that scenario and maybe even warn when edoc typespec is found. And if that's not correct, I'd be curious to learn about that use case. In any case, I made a note of this in my internal ExDoc list.

KennethL commented 5 years ago

I think we have to decide what solutions and what formats we are going for before we join our forces and implement the things we agreed on.

@josevalim and I have had several discussions after summer about what problem to solve and how. @wojtekmach showed a very nice proof of concept about what at least I think we should go for. Below is my summary of that:

We want to use the doc chunk format specified in EEP-48 as the basis for a standardized way of making documentation available in the interactive shell and, via the Language Server Protocol, for presentation in an editor or IDE.

As doc chunks are also the input to ExDoc for generating nice HTML and EPUB documentation, we want to make ExDoc support Erlang documentation as well (and possibly other languages).

For Elixir we already have the doc chunks, the use from the shell and the ExDoc parts in place and now we want to make the same available for Erlang.

Since Edoc is currently the only tool for documenting Erlang APIs that is, in practice, available and used for Erlang components outside OTP, it would be good to add the possibility to generate doc chunks from Edoc. Note that Edoc is open to adding backends as plugins, so it should be possible to do this without changing the core parts of Edoc (for this reason at least). I also think Edoc should remain compatible in what input it takes. So we want an Edoc-markup-to-doc-chunk tool, probably based on Edoc.

Since almost everything in OTP is documented via the OTP XML format (with tooling in the erl_docgen application), we also want an OTP XML to doc chunk translator. The docs_chunks tool by @wojtekmach showed that this is a way forward. I have now taken @wojtekmach's work as inspiration to do a similar translation, but with a different implementation, which I think should be part of the erl_docgen application in OTP.

The goal is to have a make target for building OTP which can produce the doc chunks for all OTP modules (with public APIs).

For this to happen we have discussed the format of a doc chunk. There is probably no point in generating Markdown, since that format is tricky to parse and will lose information compared with what we have in the original source. Instead we think it would be good to have an Erlang term format along the lines of:

{Tag, Attributes, Content}
Content = [binary()|{Tag, Attributes, Content}]

The same format should be used when generated from OTP XML and from Edoc.
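As a rough illustration of the proposed term format, here is a minimal sketch that spells the shape out as Erlang types and checks whether a term conforms to it. The module name `chunk_doc_sketch`, the type names, and the assumption that attribute values are binaries are all hypothetical and not part of the proposal.

```erlang
-module(chunk_doc_sketch).
-export([is_content/1]).

%% Hypothetical type names for the proposed shape:
%%   {Tag, Attributes, Content}
%%   Content = [binary() | {Tag, Attributes, Content}]
-type tag() :: atom().
-type attributes() :: [{atom(), binary()}].   % assumption: binary values
-type node_or_text() :: binary() | {tag(), attributes(), content()}.
-type content() :: [node_or_text()].
-export_type([content/0]).

%% Returns true if Term is a (proper) list matching the Content shape.
-spec is_content(term()) -> boolean().
is_content(Term) when is_list(Term) ->
    lists:all(fun is_node/1, Term);
is_content(_) ->
    false.

%% A node is either text or a three-element tuple with valid children.
is_node(Bin) when is_binary(Bin) ->
    true;
is_node({Tag, Attrs, Content}) when is_atom(Tag), is_list(Attrs) ->
    lists:all(fun is_attr/1, Attrs) andalso is_content(Content);
is_node(_) ->
    false.

is_attr({Name, Value}) -> is_atom(Name) andalso is_binary(Value);
is_attr(_) -> false.
```

A checker like this could be run on both the OTP XML and the Edoc output to confirm that the two generators agree on the shape.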

ExDoc can be extended to support this type of format.

-type docs_v1() :: #docs_v1{anno :: erl_anno:anno(),
                            beam_language :: beam_language(),
                            format :: mime_type(),
                            module_doc :: doc(),
                            metadata :: metadata(),
                            docs :: [docs_v1_entry()]}.
%% The Docs v1 chunk according to EEP 48.

-type docs_v1_entry() :: #docs_v1_entry{kind_name_arity :: {atom(), atom(), arity()},
                                        anno :: erl_anno:anno(),
                                        signature :: signature(),
                                        doc :: doc(),
                                        metadata :: metadata()}.

It is the exact contents of the metadata and of the doc part, for both module and function/type, that are of most interest. Also of interest is the representation of the -type and -spec parts, as they carry important information both for the documentation and for, for example, completion in the shell or via the LSP.

It is this format that is important to settle first. I will soon present more details about this, that we can discuss.

erszcz commented 5 years ago

@wojtekmach

I vaguely remember reading somewhere (but don't quote me on this) that writing typespecs is preferred over writing edoc typespec tags.

Indeed, it's in the official EDoc documentation:

Note that although the syntax described in the following can still be used for specifying functions we recommend that Erlang specifications as described in Types and Function Specification should be added to the source code instead.

I left my previous comment here as a conclusion of the evening's research and also as an explanation of why the chunks can't be generated for EDoc itself yet.

I'd focus on that scenario and maybe even warn when edoc typespec is found.

The EDoc typespec, at least for now, is the only way to add a textual description to a type definition. In a fully fledged case we would have:

%% @type example_t(). An example type doc.
-type example_t() :: any().

I think a warning is appropriate when:

KennethL commented 5 years ago

Yes, the recommendation is to use -type and -spec, since these are the ones you need for, among other tools, Dialyzer. Also note that Edoc can use these just as well as the @type, ... It would actually be good if Edoc could warn for the use of @type so that we encourage the use of -type/-spec.

/Kenneth

On Tue, Nov 19, 2019 at 10:21 AM Wojtek Mach notifications@github.com wrote:

I vaguely remember reading somewhere (but don't quote me on this) that writing typespecs is preferred over writing edoc typespec tags. If that's correct, I'd focus on that scenario and maybe even warn when edoc typespec is found. And if that's not correct, I'd be curious to learn about that use case. I added this to my internal ExDoc notes.

wojtekmach commented 5 years ago

@erszcz we can document types with edoc like this.

-type foo() :: atom().
%% Docs for foo.

-type bar() :: atom().
%% Docs for bar.

Note, we can't use @doc (or @since etc) here.

erszcz commented 5 years ago

@wojtekmach Indeed, I was not sure doc comments with spec/type attributes are already supported. Having checked that, I see it's described in EDoc docs, but not mentioned in Types and Function Specifications.

erszcz commented 5 years ago

@KennethL

It would actually be good if Edoc could warn for the use of @type so that we encourage the use of -type/-spec.

Good point, I'll add that in my fork.

erszcz commented 4 years ago

Here's an update on what https://github.com/erszcz/edoc/tree/extract-layouts-wip currently produces:

{docs_v1,0,erlang,<<"text/markdown">>,
         [{p,[<<"EDoc - the Erlang program documentation generator.">>]},
          <<"\n \n  ">>,
          <<"This module provides the main user interface to EDoc.\n  ">>,
          {ul,[<<"\n    ">>,
               {li,[{a,[{href,<<"overview-summary.html">>}],
                       [<<"EDoc User Manual">>]}]},
               <<"\n    ">>,
               {li,[{a,[{href,<<"overview-summary.html#Running_EDoc">>}],
                       [<<"Running EDoc">>]}]},
               <<"\n  ">>]}],
         #{},
         [{{type,edoc_module,0},
           0,
           [<<"edoc_module/0">>],
           [<<"  The EDoc documentation data for a module,\n  expressed as an XML document in ">>,
            {a,[{href,<<"http://www.erlang.org/edoc/doc/xmerl/doc/index.html">>},
                {target,<<"_top">>}],
               [<<"XMerL">>]},
            <<" format. See\n  the file ">>,
            {a,[{href,<<"edoc.dtd">>}],[{code,[<<"edoc.dtd">>]}]},
            <<" for details.">>],
           #{}},
          {{type,filename,0},0,[<<"filename/0">>],[],#{}},
          {{type,proplist,0},0,[<<"proplist/0">>],[],#{}},
          {{type,comment,0},0,[<<"comment/0">>],[],#{}},
          {{type,syntaxTree,0},0,[<<"syntaxTree/0">>],[],#{}},
          {{function,file,1},
           0,
           [<<"file/1">>],
           #{<<"en">> =>
                 <<"Equivalent to [file(Name, [])](`file/2`).">>},
           #{}},
          {{function,file,2},
           0,
           [<<"file/2">>],
           [{p,[<<"Reads a source code file and outputs formatted documentation to  \na corresponding file.">>]},
            <<"\n \n  ">>,<<"Options:\n  ">>,
            {dl,[<<"\n   ">>,
                 {dt,[{code,[<<"{dir, ">>,
                             {a,[{href,<<"#type-filename">>}],[<<"filename()">>]},
                             <<"}">>]},
                      <<"\n   ">>]},
                 <<"\n   ">>,
                 {dd,[<<"Specifies the output directory for the created file. (By\n       default, the output is written to the directory of the source\n       file.)\n   ">>]},
                 <<"\n   ">>,
                 {dt,[{code,[<<"{source_suffix, string()}">>]},<<"\n   ">>]},
                 <<"\n   ">>,
                 {dd,[<<"Specifies the expected suffix of the input file. The default\n       value is ">>,
                      {code,[<<"\".erl\"">>]},
                      <<".\n   ">>]},
                 <<"\n   ">>,
                 {dt,[{code,[<<"{file_suffix, string()}">>]},<<"\n   ">>]},
                 <<"\n   ">>,
                 {dd,[<<"Specifies the suffix for the created file. The default value is\n       ">>,
                      {code,[<<"\".html\"">>]},
                      <<".\n   ">>]},
                 <<"\n  ">>]},
            <<"\n \n  ">>,
            {p,[<<"See ">>,
                {a,[{href,<<"#get_doc-2">>}],[{code,[<<"get_doc/2">>]}]},
                <<" and ">>,
                {a,[{href,<<"#layout-2">>}],[{code,[<<"layout/2">>]}]},
                <<" for further  \noptions.">>]},
            <<"\n \n  ">>,
            <<"For running EDoc from a Makefile or similar, see\n  ">>,
            {a,[{href,<<"edoc_run.html#file-1">>}],
               [{code,[<<"edoc_run:file/1">>]}]},
            <<".\n ">>],
           #{}},
          {{function,files,1},0,[<<"files/1">>],[],#{}},
          {{function,files,2},
           0,
           [<<"files/2">>],
           #{<<"en">> =>
                 <<"Equivalent to [run([], Files, Options)](`run/3`).">>},
           #{}},
          {{function,application,1},
           0,
           [<<"application/1">>],
           #{<<"en">> =>
                 <<"Equivalent to [application(Application, [])](`application/2`).">>},
           #{}},
          {{function,application,2},
           0,
           [<<"application/2">>],
           [<<"Run EDoc on an application in its default app-directory. See\n  ">>,
            {a,[{href,<<"#application-3">>}],
               [{code,[<<"application/3">>]}]},
            <<" for details.">>],
           #{}},
         ...
     ]}.

While for ExDoc and web browsers the whitespace present in the descriptions might not make much difference, it may for the shell viewer. Depending on how simple or smart it is going to be, some whitespace cleanup and possibly promotion of freestanding text to <p> elements might be necessary.
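The cleanup mentioned here could look roughly like the following sketch, which drops whitespace-only binaries from a top-level content list and wraps the remaining freestanding text in a p element. The module name `chunk_ws_sketch` is hypothetical, and the three-element `{p, [], Content}` form is an assumption taken from the term format proposed earlier in the thread (the dump above uses two-element tuples for some tags).

```erlang
-module(chunk_ws_sketch).
-export([tidy/1]).

%% Drops whitespace-only binaries from a top-level content list and wraps
%% the remaining freestanding text in a p element. Elements (tuples) pass
%% through untouched; nested content is not visited in this sketch.
tidy(Content) when is_list(Content) ->
    lists:filtermap(fun tidy_item/1, Content).

tidy_item(Bin) when is_binary(Bin) ->
    %% string:trim/1 strips leading and trailing whitespace (OTP 20+).
    case string:trim(Bin) of
        <<>> -> false;
        Text -> {true, {p, [], [Text]}}
    end;
tidy_item(Element) when is_tuple(Element) ->
    {true, Element}.
```

Whether this belongs in the chunk generator or in the renderer is exactly the open question in this comment; the sketch does not take a side.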

@garazdawi, what do you think about it? I think the text layout generated by docsh looks decent and I don't mind it being reused, but I'm obviously biased :)

garazdawi commented 4 years ago

Depending on how simple or smart it is going to be some whitespace cleanup and possibly promotion of freestanding text to <p> elements might be necessary.

For the OTP docs I've chosen to do the whitespace trimming before rendering the doc chunks as I would like to keep the renderer as simple as possible. However, if the renderer is supposed to work with multiple input sources I might as well make it work on the doc chunk html format and thus it can be used both by the parser and the renderer.

As for the format I've tried to mimic the man page format of the Erlang/OTP man pages. I think that it will have to do for now as it is not something that has to be backward compatible and we should be able to change it in the future.

garazdawi commented 4 years ago

Do you have any example module/library that this works on? I tried adding it to recon but that failed with this crash:

===> Uncaught error: {badmatch,false}
===> Stack trace to the error location:
[{edoc_chunks,xpath_to_chunk_format,3,
              [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/edoc_chunks.erl"},
               {line,170}]},
 {edoc_chunks,edoc_to_chunk,2,
              [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/edoc_chunks.erl"},
               {line,80}]},
 {rebar3_edoc_chunks,process_file,3,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,84}]},
 {rebar3_edoc_chunks,'-process_app/2-lc$^0/1-0-',3,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,77}]},
 {rebar3_edoc_chunks,process_app,2,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,77}]},
 {rebar3_edoc_chunks,'-do/1-lc$^0/1-0-',2,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,59}]},
 {rebar3_edoc_chunks,do,1,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,59}]},
 {rebar_core,do,2,
             [{file,"/home/eluklar/git/rebar3/src/rebar_core.erl"},
              {line,154}]}]

erszcz commented 4 years ago

@garazdawi For now I've been testing on EDoc itself:

09:59:26 erszcz @ x5 : ~/work/erszcz/edoc (extract-layouts-wip)
$ r3 shell
===> Verifying dependencies...
===> Compiling edoc
Erlang/OTP 21 [erts-10.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V10.1  (abort with ^G)
1> edoc:files(["src/edoc.erl"], [{doclet, edoc_doclet_chunks}, {dir, "doctest"},
1>                   {layout, edoc_layout_chunk_htmltree}]).
ok
2> {ok, BChunk} = file:read_file("doctest/chunks/edoc.chunk").
{ok,<<131,104,7,100,0,7,100,111,99,115,95,118,49,97,0,
      100,0,6,101,114,108,97,110,103,109,0,0,...>>}
3> Chunk = binary_to_term(BChunk).
{docs_v1,0,erlang,<<"text/markdown">>,...}

It's definitely still a WIP though.

erszcz commented 4 years ago

I've cleaned up the new chunk layouts and adjusted the Rebar3 plugin a little bit. Here's what it looks like with Recon now (as of extract-layouts @ 13df5b1):

15:35:31 erszcz @ x5 : ~/work/erszcz/recon (master *)
$ cat rebar.config
{profiles, [
    {test, [
        {erl_opts, [nowarn_export_all, {d, 'TEST'}]}
    ]}
]}.

{plugins,
 [
  {rebar3_edoc_chunks, {git, "https://github.com/erszcz/edoc.git", {branch, "extract-layouts"}}}
 ]}.

{provider_hooks,
 [
  {post, [{compile, {edoc_chunks, compile}}]}
 ]}.
15:36:55 erszcz @ x5 : ~/work/erszcz/recon (master *)
$ r3 shell
===> Verifying dependencies...
===> Compiling recon
Erlang/OTP 21 [erts-10.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V10.1  (abort with ^G)
1> f(ReadChunk).
ok
2> ReadChunk = fun (File) ->
2>                     {ok, BChunk} = file:read_file(File),
2>                     Chunk = binary_to_term(BChunk)
2>             end.
#Fun<erl_eval.6.128620087>
3> rp(ReadChunk("_build/default/lib/recon/doc/chunks/recon.chunk")).
{docs_v1,0,erlang,<<"text/markdown">>,
         [{p,[<<"Recon, as a module, provides access to the high-level functionality   \ncontained in the Recon application.">>]},
          <<"\n  \n   ">>,
          {p,[<<"It has functions in five main categories:">>]},
          <<"\n  \n   ">>,
          {dl,[<<"\n       ">>,
               {dt,[<<"1. State information">>]},
               <<"\n       ">>,
               {dd,[<<"Process information is everything that has to do with the\n           general state of the node. Functions such as ">>,
                    {a,[{href,<<"#info-1">>}],[{code,[<<"info/1">>]}]},
                    <<"\n           and ">>,
                    {a,[{href,<<"#info-3">>}],[{code,[<<"info/3">>]}]},
                    <<" are wrappers to provide more details than\n           ">>,
                    {code,[<<"erlang:process_info/1">>]},
                    <<", while providing it in a production-safe\n           manner. They have equivalents to ">>,
                    {code,[<<"erlang:process_info/2">>]},
                    <<" in\n           the functions ">>,
                    {a,[{href,<<"#info-2">>}],[{code,[<<"info/2">>]}]},
                    <<" and ">>,
                    {a,[{href,<<"#info-4">>}],[{code,[<<"info/4">>]}]},
                    <<", respectively.">>]},
...

josevalim commented 4 years ago

That looks awesome. We will have to do something about the links though. {a,[{href,<<"#info-4">>}] won't work for ExDoc. We will need to formalize a way to write said annotations.

KennethL commented 4 years ago

Regarding links we have information (in the OTP xml and I think we can know it from Edoc as well) about when the link is of type:

We have today translated these links to {a,[{href,SomeBinary}],Content}, but the plan is to define a notation in the chunk format where we can express the link types above. This is needed because the SomeBinary in the href attribute is not ready to use for generating HTML (or other formats supporting links): the file in which the real target of the link is found depends on the format being produced. The real value of href must be computed by, for example, ExDoc, but knowing the type of link should be enough. Of course there are also other "normal" links where the href attribute has the usual HTML meaning, and these can stay as they are.

I was thinking of introducing extra attribute(s) to the a tag {a,[{linktype,<<"f">>},{href,<<"info,4">>}] where the href then can be translated to info-4 by the html renderer (Exdoc) depending on how the target destination is named.

I don't think we can expect the chunk format to correspond directly to HTML in all aspects; the HTML tags were chosen because they are recognized, but for the links we cannot expect a 100% correspondence. Loic mentioned the rel attribute, but I don't think its meaning in HTML matches what we need here.
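To make the link proposal concrete, here is a minimal sketch of how a renderer might turn a typed function link into an HTML anchor. The `linktype` attribute and the `<<"info,4">>` target come from the proposal above; the module name `chunk_link_sketch`, the `#name-arity` anchor scheme (borrowed from the hrefs in the earlier dump), and the handling of only the `<<"f">>` link type are assumptions.

```erlang
-module(chunk_link_sketch).
-export([resolve/1]).

%% Rewrites a proposed function link such as
%%   {a, [{linktype, <<"f">>}, {href, <<"info,4">>}], Content}
%% into a plain HTML anchor. Any other node is returned unchanged.
resolve({a, Attrs, Content} = Node) ->
    case {lists:keyfind(linktype, 1, Attrs), lists:keyfind(href, 1, Attrs)} of
        {{linktype, <<"f">>}, {href, Target}} ->
            %% Target is assumed to be a binary of the form <<"Name,Arity">>.
            case binary:split(Target, <<",">>) of
                [Name, Arity] ->
                    Href = <<"#", Name/binary, "-", Arity/binary>>,
                    {a, [{href, Href}], Content};
                _ ->
                    Node
            end;
        _ ->
            Node
    end;
resolve(Node) ->
    Node.
```

A different output format would swap only the line that builds `Href`, which is the reason for keeping the link type and the target separate in the chunk.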

As I understood it you have a notion of module, function etc. in Elixir. Comments?

/Kenneth

On Sun, Feb 23, 2020 at 7:05 PM José Valim notifications@github.com wrote:

That looks awesome. We will have to do something about the links though. {a,[{href,<<"#info-4">>}] won't work for ExDoc. We will need to formalize a way to write said annotations.

josevalim commented 4 years ago

I agree with everything. We don't need to be fully compatible with HTML and the formats don't need to be compatible between languages either.

My only suggestion is to not add a href to said links. That will make it easier for us to detect if something is an actual link or an internal reference. Another option is just to use a random tag, such as <erlang-ref> or <ref>.

In any case, regarding the format, here is what Elixir uses:

erszcz commented 4 years ago

I've worked a bit on compat with OTP 23-devel shell_docs - https://github.com/erszcz/edoc/commits/extract-layouts now produces compatible chunks for all Recon modules apart from recon_trace.

@garazdawi One of the problems is that shell_docs throws unhandled for an h3 because of the guard on Pos:

render_element({h3,_,Content},State,Pos,_Ind,D) when Pos =< 2 ->

However, recon_trace does not seem to abuse the EDoc syntax:

%%% == Tracing Erlang Code ==

The other problem is the set of allowed tags - tt does not seem to be handled by shell_docs, while EDoc currently lets all tags from https://developer.mozilla.org/en-US/docs/Web/HTML/Element pass through, including the deprecated ones. I think I'll just make sure only tags defined with shell_docs ALL_ELEMENTS macro are output.

garazdawi commented 4 years ago

@garazdawi One of the problems is that shell_docs throws unhandled for an h3 because of the guard on Pos:

render_element({h3,_,Content},State,Pos,_Ind,D) when Pos =< 2 ->

However, recon_trace does not seem to abuse the EDoc syntax:

%%% == Tracing Erlang Code ==

I've sent a PR to your edoc changes and with it I can render recon_trace, but without it I cannot even get the generation to work.

The other problem is the set of allowed tags - tt does not seem to be handled by shell_docs, while EDoc currently lets all tags from https://developer.mozilla.org/en-US/docs/Web/HTML/Element pass through, including the deprecated ones. I think I'll just make sure only tags defined with shell_docs ALL_ELEMENTS macro are output.

We cannot support all of HTML in the renderer. So either we should ignore unknown tags when rendering, or ignore them when creating the chunks.

josevalim commented 4 years ago

We cannot support all of HTML in the renderer. So either we should ignore unknown tags when rendering, or ignore them when creating the chunks.

Yeah, we will have the same issue in Elixir's shell rendering. :(

However, we cannot ignore these tags when creating the chunk, because those tags are likely useful when generating actual HTML documentation.

We could ignore them in the renderer, but maybe the documentation will then appear incomplete. Maybe an alternative is to render them as HTML tags in the shell too? It would look weird in the shell but that's the best option given they were also written as HTML in the actual docs. I think that's what Elixir does for HTML in the markdown.

KennethL commented 4 years ago

I think the usage of html markup in the doc comments processed by edoc is mostly a workaround because of shortcomings of the markup. Therefore I think that html markup in the source (to be processed by Edoc) should be discouraged. Create a recommended and supported markup for the cases we think need support, and forbid general usage of html (or explain that it will be omitted when rendering in the shell).

/Kenneth

On Mon, Mar 9, 2020 at 12:02 PM José Valim notifications@github.com wrote:

We cannot support all of HTML in the renderer. So either we should ignore unknown tags when rendering, or ignore them when creating the chunks.

Yeah, we will have the same issue in Elixir's shell rendering. :(

However, we cannot ignore these tags when creating the chunk, because those tags are likely useful when generating actual HTML documentation.

We could ignore them in the renderer, but maybe the documentation will then appear incomplete. Maybe an alternative is to render them as HTML tags in the shell too? It would look weird in the shell but that's the best option given they were also written as HTML in the actual docs. I think that's what Elixir does for HTML in the markdown.

erszcz commented 4 years ago

@garazdawi

I've sent a PR to your edoc changes and with it I can render recon_trace, but without it I cannot even get the generation to work.

Thanks for the PR. Indeed, I experimentally added the shell_docs:normalize/1 call there to clean up non-meaningful whitespace, but apparently I did not do it correctly.

I wanted to check if artefacts like this leading space before "This module..." could be fixed by normalizing:

4> h(recon_rec).

        recon_rec

   This module handles formatting records for known record types. Record definitions are
  imported from modules by user. Definitions are distinguished by record name and its

But it doesn't seem to help :| The real problem is using the @doc tag in recon_rec comment:

%%% @doc
%% This module handles formatting maps.

The text ought to be on the same line, but it's on the next one.

BTW, I've cleaned up and merged the extract-layouts branch to master - it works fine (at least on Recon) in case anyone's curious to try.

@josevalim @KennethL

My approach so far was to let through only a certain set of tags (originally, the set from MDN, I later switched to the set Lukas uses in the renderer) and for all the other tags to just extract the text and disregard the tag name and attributes. For some tags (the aforementioned links/refs/<a href='...'>, but also tt) there could be some builtin translations. What do you think about this approach?

josevalim commented 4 years ago

@erszcz I don't think we should remove or rewrite the HTML when writing the chunk because when users compare the result from edoc_html with ex_doc in the future, ex_doc would seemingly discard information/markup and they probably won't be happy with that. :)

So if the plan is for edoc to also use "application/erlang+html", we need to change both Erlang/Elixir shell renderers to deal with markup they don't know upfront - even if it is just by discarding it or by rendering it as HTML.

I agree with @KennethL that ideally they wouldn't have to write any HTML in Edoc. But until then, we have to do what we have to do. :)

garazdawi commented 4 years ago

@garazdawi

I've sent a PR to your edoc changes and with it I can render recon_trace, but without it I cannot even get the generation to work.

Thanks for the PR. Indeed, I experimentally added the shell_docs:normalize/1 call there to clean up non-meaningful whitespace, but apparently I did not do it correctly.

I wanted to check if artefacts like this leading space before "This module..." could be fixed by normalizing:

4> h(recon_rec).

        recon_rec

   This module handles formatting records for known record types. Record definitions are
  imported from modules by user. Definitions are distinguished by record name and its

But it doesn't seem to help :| The real problem is using the @doc tag in recon_rec comment:

%%% @doc
%% This module handles formatting maps.

The text ought to be on the same line, but it's on the next one.

The normalizer should eliminate that space, I'll see if I can find out what is going on.

erszcz commented 4 years ago

So if the plan is for edoc to also use "application/erlang+html", we need to change both Erlang/Elixir shell renderers to deal with markup they don't know upfront - even if it is just by discarding it or by rendering it as HTML.

It seems to me, then, that for EDoc as is we have to accept HTML, yet discourage its use. I propose the aforementioned Mozilla Developer Network set of tags as the first approximation of accepted tags (but there are tags like canvas, which obviously do not make sense).

The renderers, OTOH, might use the approach of only extracting text and discarding unknown tag metainfo. That way, it would be possible to display raw chunk content with code:get_doc() or shell_docs:get_doc() so as not to miss any metadata, while h(...) would display something nicely formatted, but possibly not 100% accurate.
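The renderer-side approach could be sketched as below: elements with a known tag are kept, and any other element is replaced by its children, so the text survives while the tag and attributes are dropped. The module name `chunk_render_sketch` and the caller-supplied list of known tags are hypothetical, and the sketch assumes every element is a three-element `{Tag, Attributes, Content}` tuple.

```erlang
-module(chunk_render_sketch).
-export([strip_unknown/2]).

%% Keeps elements whose tag is in Known. Any other element is replaced by
%% its (recursively processed) children, so the text is preserved while
%% the unknown tag name and its attributes are discarded.
strip_unknown(Content, Known) when is_list(Content) ->
    lists:flatmap(fun(Node) -> strip_node(Node, Known) end, Content).

strip_node(Bin, _Known) when is_binary(Bin) ->
    [Bin];
strip_node({Tag, Attrs, Children}, Known) ->
    Kept = strip_unknown(Children, Known),
    case lists:member(Tag, Known) of
        true -> [{Tag, Attrs, Kept}];
        false -> Kept
    end.
```

Because the chunk itself is left untouched, an HTML generator can still see the original tags; only the shell view loses them.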

erszcz commented 4 years ago

I've just updated the top post with the current status and todos.

marianoguerra commented 4 years ago

One minute after writing an email about edoc I see this thread. I would like to know what the plan is for this edoc fork in relation to OTP/edoc: do incremental improvements to OTP/edoc make sense, or is this the way forward that will eventually be merged into OTP?

josevalim commented 4 years ago

The goal is to make a contribution to OTP that converts EDoc to chunks.

erszcz commented 4 years ago

@marianoguerra I'm aiming to prepare a PR to OTP, either for OTP 23 rc2 or rc3 - as soon as the TODOs are taken care of. Kenneth and the OTP team are fine with these changes - for example, please see https://github.com/erlef/eef-documentation-wg/issues/3#issuecomment-576161427.

erszcz commented 4 years ago

I'm keeping the todo list in the top post more or less up to date, but here's a recap of the latest changes:

I've encountered some issues on the way, though:

Ufff, well, that's a wall of text, but it added up over the last few days.

josevalim commented 4 years ago

@wojtekmach, @josevalim when testing interop in order to avoid (bad signature) for types in ExDoc generated HTML

Good catch. I have posted a comment on the PR. If you want to submit a PR to ExDoc, it will be welcome!

wojtekmach commented 4 years ago

Hi @erszcz, I'll continue working on the Erlang support on https://github.com/elixir-lang/ex_doc/tree/wm-erlang which I've just rebased with latest master. If you'd like to send patches to that branch that would be appreciated.

garazdawi commented 4 years ago

@garazdawi what do you think? I can make this change part of the edoc PR if that's ok.

Yes, make it part of the PR. Thanks.

KennethL commented 4 years ago

I have a hard time understanding why the hidden and private properties in the Edoc markup should result in anything at all in the doc chunk

@private and @hidden information used to only be passed to EDoc doclets/layouts for the module, not for entries. I've changed that and now all entries, even not exported ones, are stored in the chunks. This means shell_docs prints proper warnings when accessing hidden | none fields, but it also means chunks are bigger in size. On the other hand, even with no human-readable docs, the chunk entries might still contain metadata useful for tools. I think the best would be to have exported and @private entries in the chunks (with proper metadata) and @hidden left out.

I don't think exported or private entries from the Edoc markup should result in anything at all in the chunks. Possibly it could be an option to Edoc to output private functions to the docchunk, but to just let all exported functions become entries in the docchunk is a bad idea in my opinion. Just because a function is exported does not mean that it is part of any public API.

/Kenneth

On Sun, Apr 5, 2020 at 1:07 PM Radek Szymczyszyn notifications@github.com wrote:

I'm keeping the todo list in the top post more or less up to date, but here's a recap of the latest changes:

  • bin/edoc.escript is now available - it's a CLI interface to edoc:application/2 and edoc:files/2
  • since I needed some material for dogfooding, all @spec and @type tags are now rewritten to -spec and -type attributes; types are also moved from header files to corresponding modules to utilise module names as namespaces
  • @deprecated and @since are now exported to chunks
  • EDoc @private and @hidden are translated respectively to EEP-48 hidden and none
  • the chunks are normalized with shell_docs:normalize/1
  • EDoc can now generate chunks for itself 🎉 All of them pass shell_docs:validate/1.

I've encountered some issues on the way, though:

  • EEP-48 states that ModuleDoc and Doc fields can be #{DocLanguage := DocValue} | none | hidden. shell_docs:validate/1 wouldn't have accepted non-map variants, so I had applied erszcz/otp@1e72aef (https://github.com/erszcz/otp/commit/1e72aefc965d909b141a92bf8771e0d175ba29b1) to make it accept none | hidden. However, maybe its intention is to actually check if docs are present - @garazdawi what do you think? I can make this change part of the edoc PR if that's ok. It's funny that validate would only crash on hidden as, apparently, maps:map(..., none) returns #{} - none is a valid map iterator :)

  • @deprecated accepts any EDoc comment, therefore nested HTML tags. EEP-48, however, expects this field to be a flat binary() - this requires discarding useful information. ExDoc uses this field for tooltips, where a flat binary is fine, but also for a disclaimer under entry definition, where the discarded tags would be useful (links, etc). Currently, EDoc discards any subtags and flattens @deprecated to a binary(), as it's the simplest working solution.

  • @private and @hidden information used to only be passed to EDoc doclets/layouts for the module, not for entries. I've changed that and now all entries, even not exported ones, are stored in the chunks. This means shell_docs prints proper warnings when accessing hidden | none fields, but it also means chunks are bigger in size. On the other hand, even with no human-readable docs, the chunk entries might still contain metadata useful for tools. I think the best would be to have exported and @private entries in the chunks (with proper metadata) and @hidden left out.

  • @wojtekmach, @josevalim when testing interop in order to avoid (bad signature) for types in ExDoc generated HTML, I've used erszcz/ex_doc@5981077 (https://github.com/erszcz/ex_doc/commit/5981077d008ce55b583ebb24ce46d9840d9441ce) - a fix on top of wm-erlang. As of EDoc erszcz/edoc@967d475 (https://github.com/erszcz/edoc/commit/967d475914ea54218422f4acc0242ca4dce8e457) this branch doesn't work anymore. I'll try to figure out the exact reason, but I'm including the error below so that it doesn't escape my memory:

    $ /Users/erszcz/work/elixir-lang/ex_doc/ex_doc edoc "0.11" _build/default/lib/edoc/ebin --main edoc
    ** (MatchError) no match of right hand side value: nil
    (ex_doc 0.21.2) lib/ex_doc/formatter/html.ex:74: anonymous fn/2 in ExDoc.Formatter.HTML.autolink_and_render/4
    (elixir 1.10.2) lib/enum.ex:2111: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ex_doc 0.21.2) lib/ex_doc/formatter/html.ex:73: anonymous fn/2 in ExDoc.Formatter.HTML.autolink_and_render/4
    (elixir 1.10.2) lib/enum.ex:2111: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ex_doc 0.21.2) lib/ex_doc/formatter/html.ex:71: ExDoc.Formatter.HTML.autolink_and_render/4
    (ex_doc 0.21.2) lib/ex_doc/formatter/html.ex:21: ExDoc.Formatter.HTML.run/2
    (elixir 1.10.2) lib/kernel/cli.ex:124: anonymous fn/3 in Kernel.CLI.exec_fun/2

    wm-erlang-3, on the other hand, does not recognise types provided by OTP apps, yet crashes on something else:

    $ /Users/erszcz/work/elixir-lang/ex_doc/ex_doc edoc "0.11" _build/default/lib/edoc/ebin --main edoc
    warning: documentation references t::erl_syntax.syntaxTree/0 but it doesn't exist or isn't public (parsing t:edoc.syntaxTree/0 docs)
    warning: documentation references t::edoc.module/0 but it doesn't exist or isn't public (parsing edoc_extract.get_module_info/2 docs)
    warning: documentation references t::erl_syntax.syntaxTree/0 but it doesn't exist or isn't public (parsing edoc_extract.get_module_info/2 docs)
    warning: documentation references t::erl_syntax.forms/0 but it doesn't exist or isn't public (parsing edoc_extract.header/4 docs)
    warning: documentation references t::erl_syntax.forms/0 but it doesn't exist or isn't public (parsing edoc_extract.header/5 docs)
    warning: documentation references t::erl_syntax.syntaxTree/0 but it doesn't exist or isn't public (parsing edoc_extract.preprocess_forms/1 docs)
    warning: documentation references t::erl_syntax.forms/0 but it doesn't exist or isn't public (parsing edoc_extract.source/4 docs)
    warning: documentation references t::erl_syntax.forms/0 but it doesn't exist or isn't public (parsing edoc_extract.source/5 docs)
    warning: documentation references t::erl_syntax.syntaxTree/0 but it doesn't exist or isn't public (parsing t:edoc_specs.syntaxTree/0 docs)
    ** (TokenMissingError) nofile:1: missing terminator: end (for "do" starting at line 1)
    (elixir 1.10.2) lib/code.ex:645: Code.format_string!/2
    (ex_doc 0.21.3) lib/ex_doc/autolink.ex:331: ExDoc.Autolink.typespec/2
    (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (ex_doc 0.21.3) lib/ex_doc/formatter/html.ex:91: anonymous fn/5 in ExDoc.Formatter.HTML.render_all/4
    (elixir 1.10.2) lib/enum.ex:2111: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ex_doc 0.21.3) lib/ex_doc/formatter/html.ex:88: anonymous fn/4 in ExDoc.Formatter.HTML.render_all/4
    (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2

Ufff, well, that's a wall of text, but it added up over the last few days.


josevalim commented 4 years ago

I believe the reason we have both none and hidden is because there is a distinction between:

  1. is this meant to be public but it was not documented
  2. or this was not meant to be public at all

The issue is that it depends on what you consider to be the default. Is a function made publicly available only if it is documented? Or are functions publicly available unless you say they are private/hidden?

They both have pros and cons. Assuming that functions are public by default means that, even if authors don't write proper docs, the generated documentation will have something. On the other hand, this means you can accidentally make public a function that was meant to be private.

If Edoc says that everything is private by default, then I agree they are probably not necessary in the chunk. There may be places in Elixir that won't handle this assumption well but this is an Elixir problem to fix. :)
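To make the two defaults concrete, here is a small sketch that decides whether an entry would show up in generated documentation based on its EEP-48 Doc field. The module doc_policy and the policy atoms public_by_default and private_by_default are hypothetical names chosen for illustration; neither EDoc nor ExDoc is claimed to implement it this way:

```erlang
-module(doc_policy).
-export([listed/2]).

%% Doc is the EEP-48 Doc field of an entry: a map of translations,
%% `none` (nothing was written) or `hidden` (explicitly not public).
-spec listed(public_by_default | private_by_default,
             #{binary() => term()} | none | hidden) -> boolean().
listed(_Policy, hidden) -> false;               % never listed
listed(public_by_default, none) -> true;        % undocumented but shown
listed(private_by_default, none) -> false;      % undocumented, so omitted
listed(_Policy, Doc) when is_map(Doc) -> true.  % documented, always shown
```

The only case where the two policies disagree is none, which is exactly where the accidental-exposure risk described above comes from.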

garazdawi commented 4 years ago

functions [are] publicly available unless you say they are private/hidden?

This is the way that I have done it with the Erlang/OTP docs. All modules have a .chunk file, even if they are not public; for those modules, the module_doc in that file is hidden and the docs list is empty.

https://github.com/erlang/otp/blob/master/lib/erl_docgen/src/docgen_xml_to_chunk.erl#L35-L36

If code:get_doc/1 tries to look up a module that does not have an EEP-48 chunk then it will generate one on the fly from the AST.
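A quick way to inspect this from the shell, sketched under the assumption that code:get_doc/1 returns {ok, Docs} with Docs being the seven-element docs_v1 tuple from EEP-48. The module doc_status and its functions are hypothetical helpers:

```erlang
-module(doc_status).
-export([module_summary/1, summary/1]).

%% Look up the chunk for Mod and summarise it as
%% {ModuleDocStatus, NumberOfEntries}.
module_summary(Mod) when is_atom(Mod) ->
    case code:get_doc(Mod) of
        {ok, Docs} -> summary(Docs);
        {error, _Reason} = Error -> Error
    end.

%% Pattern match on the docs_v1 tuple directly, so no header file is needed.
summary({docs_v1, _Anno, _Lang, _Format, ModuleDoc, _Meta, Entries}) ->
    {status(ModuleDoc), length(Entries)}.

status(hidden) -> hidden;
status(none) -> none;
status(Doc) when is_map(Doc) -> documented.
```

For a non-public module shipped as described above, the summary would be expected to be {hidden, 0}.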

erszcz commented 4 years ago

Status update

TL;DR:

TODO:


Here comes the long version. When we call edoc:files/2 or one of the other entry points, the EDoc app does the following:

  1. A doclet is invoked - the doclet understands the structure of Erlang projects / OTP apps and also encodes the final structure of the to-be-generated docs. It finds the source files and calls edoc:get_doc/3 on each of them.
  2. edoc:get_doc/3 calls edoc_extract:source/3 which parses the source file to AST forms and then converts the forms to a flat list of EDoc #entry records. These entries correspond to source level entities which are further processed later: comments, functions, specs, types, records. Callbacks are dropped from further processing at this stage. Another pass over the entries is done to unify @spec/@type and -spec/-type representation for later stages. The entries are passed to edoc_data:module/4.
  3. edoc_data:module/4 processes the list of entries and builds an expanded XMERL representation of a module documentation. Basic callback information (M:behaviour_info/1) is also appended directly to this representation. Some information from the #entry{} records is dropped at this stage, while some is translated into a somewhat bloated XMERL representation. There's an attempt to document this representation with DTD specs inlined in comments in this file. This returns to edoc:get_doc/3.
  4. edoc:get_doc/3 returns to the doclet. The doclet runs the layout plugin with the input being the expanded XMERL format.
  5. In case of output to chunks, edoc_layout_chunks extracts the relevant information from the XMERL and outputs it as a #docs_v1{} record.
  6. The doclet converts the record to a binary and writes it to doc/chunks/.
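The last step can be sketched as follows. This is an illustration only, not the fork's actual doclet code: the module chunk_io, the function names, and the Mod.chunk file naming are assumptions, and the record is handled as the plain docs_v1 tuple from EEP-48 to keep the example free of header files:

```erlang
-module(chunk_io).
-export([write_chunk/3, read_chunk/2]).

%% Serialise a docs_v1 term and write it to Dir/Mod.chunk.
%% Returns the path of the written file.
write_chunk(Dir, Mod, Docs) when is_atom(Mod), element(1, Docs) =:= docs_v1 ->
    File = chunk_file(Dir, Mod),
    ok = filelib:ensure_dir(File),            % creates Dir if it is missing
    ok = file:write_file(File, term_to_binary(Docs, [compressed])),
    File.

%% Read a chunk back into a term, e.g. to inspect what was generated.
read_chunk(Dir, Mod) when is_atom(Mod) ->
    {ok, Bin} = file:read_file(chunk_file(Dir, Mod)),
    binary_to_term(Bin).

chunk_file(Dir, Mod) ->
    filename:join(Dir, atom_to_list(Mod) ++ ".chunk").
```

Calling chunk_io:write_chunk("doc/chunks", my_mod, Docs) followed by chunk_io:read_chunk("doc/chunks", my_mod) should give back the same term, which makes it easy to compare chunks produced by different EDoc revisions.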

Issues:

  1. No full callback info in the EDoc #entry{} records.
  2. No convenient way to pass complete specs or callback definitions through the XMERL layer. There's existing code to convert from #entry{} -> XMERL, but it doesn't make sense to do it just to convert back in the layout.
  3. I'm not sure if the XMERL representation is of any other use than to interface with the layouts. I haven't found any other code using it. @garazdawi? @KennethL?

erszcz commented 4 years ago

The latest changes are available at https://github.com/erszcz/edoc/tree/wip

garazdawi commented 4 years ago

I'm not sure if the XMERL representation is of any other use than to interface with the layouts. I haven't found any other code using it.

I don't know either. Maybe @richcarl knows something?

erszcz commented 4 years ago

Today's update.

Done:

The latest commit as of writing this is https://github.com/erszcz/edoc/tree/20a80c37a56cfac6c7709fb69255ac1b0f9c4c3f.

Next steps:

Preview:

20> h(edoc_layout_chunks, module).

  -spec module(edoc:xmerl_module(), proplists:proplist()) -> binary().

  Convert EDoc module documentation to an EEP-48 style doc chunk.
ok
21> ht(edoc_doclet).
   edoc_doclet

These types are documented in this module:

  -type doclet_toc() ::
            #doclet_toc{paths :: [string()], indir :: string()}.

  -type doclet_gen() ::
            #doclet_gen{sources :: [string()],
                        app :: no_app() | atom(),
                        modules :: [module()]}.

  -type no_app() :: [].

  -type context() ::
            #doclet_context{dir :: string(),
                            env :: edoc:env(),
                            opts :: [term()]}.

  -type command() :: doclet_gen() | doclet_toc().
ok
22> hcb(edoc_doclet).
   edoc_doclet

These callbacks are documented in this module:

  run/2
ok
23>

erszcz commented 4 years ago

Preview of yesterday's result (https://github.com/erszcz/edoc/tree/callbacks):

1> hcb(edoc_layout).
   edoc_layout

These callbacks are documented in this module:

  -callback module(edoc:xmerl_module(), _) -> binary().
ok
2> hcb(edoc_layout, module).

  -callback module(edoc:xmerl_module(), _) -> binary().

  Layout entrypoint.
ok

Callback signatures and comments are now stored in doc chunks. The syntax of callback comments is the same as for type comments, i.e. the comment follows the attribute:

-type command() :: doclet_gen()
                 | doclet_toc().
%% All doclet commands.

-type doclet_toc() :: #doclet_toc{paths :: [string()],
                                  indir :: string()}.
%% Doclet command.

-callback run(command(), context()) -> ok.
%% Doclet entrypoint.

It's a bit less flexible than function/spec comments where the order of forms doesn't matter:

%% @doc Before spec.
-spec f() -> ok.
f() -> ok.

-spec g() -> ok.
%% @doc After spec.
g() -> ok.

This stems from the fact that both spec attributes and function comments are attached to function definitions (and processed together with them), while callback/type comments are attached to the actual attributes and it's necessary to define which comments an attribute "owns" - the preceding or the following ones.
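The "following comment belongs to the attribute" rule can be modelled with a toy function. This is a simplified sketch, not EDoc's implementation: the input is assumed to be a flat, source-ordered list of {attribute, Name} and {comment, Text} tuples, and only the single comment directly after an attribute is treated as owned by it:

```erlang
-module(comment_owner).
-export([attach/1]).

%% Pair each attribute with the comment that directly follows it.
%% Comments not preceded by an attribute are dropped in this model.
-spec attach([{attribute, term()} | {comment, binary()}]) ->
          [{term(), binary() | none}].
attach([{attribute, Name}, {comment, Text} | Rest]) ->
    [{Name, Text} | attach(Rest)];
attach([{attribute, Name} | Rest]) ->
    [{Name, none} | attach(Rest)];
attach([{comment, _Unowned} | Rest]) ->
    attach(Rest);
attach([]) ->
    [].
```

With the input [{comment, <<"Stray.">>}, {attribute, command}, {comment, <<"All doclet commands.">>}, {attribute, run}] the result is [{command, <<"All doclet commands.">>}, {run, none}]: a comment placed before an attribute is not picked up, which is the loss of flexibility mentioned above.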

richcarl commented 4 years ago

I'm not sure if the XMERL representation is of any other use than to interface with the layouts. I haven't found any other code using it.

I don't know either. Maybe @richcarl knows something?

When I wrote it, being able to export as XML and apply XSLT seemed to be a good thing (so you didn't have to do the layout in plain Erlang if you didn't want to), but either nobody understood that it could be done, or nobody wanted to use XSLT. :-)

richcarl commented 4 years ago

It also allowed there to be an actual DTD describing the intermediate format, which is another good thing.

josevalim commented 4 years ago

Since we are standardizing on chunks, I wonder if it makes sense to remove those parts from edoc then, if they are really not being used (unless they are used internally?).

erszcz commented 4 years ago

@richcarl Thanks for your input!

@josevalim They are used by the default / original doclet and layout, and @KennethL underlined that for the time being we should leave it functional, even if extending EDoc with chunk support.

I'll leave the current implementation as is, i.e. the chunk output converting some information directly from the internal EDoc #entry{} records, until the PR to OTP is made, and wait for comments then.

The last remaining thing before I make the PR is links - this is the TODO in focus now.

KennethL commented 4 years ago

Since I would like edoc (with the support for chunks) to be part of OTP I want to be very careful with removing functionality right now. The support for chunks should just be another backend accepting the same input as before. It would then be easy and attractive for current users to just generate chunks, and HTML via ExDoc, instead.

I think the intermediate format is used in OTP and might be used elsewhere as well.

We can remove stuff later when the new functionality is established.

/Kenneth

On Mon, May 25, 2020 at 2:54 PM José Valim notifications@github.com wrote:

Since we are standardizing on chunks, I wonder if it makes sense to remove those parts from edoc then, if they are really not being used (unless they are used internally?).


josevalim commented 4 years ago

Makes total sense, especially if it is used today. No need for removals. Thanks @erszcz and @KennethL!