Closed domluna closed 4 years ago
Don't we inherently throw away some information when going from source to AST? So you'd need AST nodes that retain much more information, to be able to re-create the details of the original formatting?
Unlike `Markdown.plain`, the intention for `CommonMark.markdown` is for it to be reasonably round-trippable, since it's used in the notebook output for markdown cells and so does need to handle escapes correctly where it can. Lack of backslash escaping is a bug here.
As to source-identical output in `markdown`, that's not a goal. It'll always just aim for a canonical form of things, such as always using ATX headings whether the source wrote ATX or setext headings. There is some sense of source position in the `.sourcepos` field of `Node`, but that's used more to handle certain parsing cases than for a CST-style tree.
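To make the "canonical form" point concrete, here is a small sketch that feeds the same heading through the parser in both styles. It only uses calls that appear elsewhere in this thread (`Parser()`, calling the parser on a string, and `markdown(io, ast)`); the sample strings are made up for illustration, and the exact output may differ between CommonMark.jl versions.

```julia
using CommonMark

p = Parser()

# The same heading written two ways: setext (underlined) and ATX (`#` prefix).
setext = "Title\n=====\n"
atx = "# Title\n"

# Per the comment above, both are expected to be written back in ATX form,
# so the original heading style is not recoverable from the output.
markdown(stdout, p(setext))
markdown(stdout, p(atx))
```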
With regard to conditional formatting of code blocks: I'm pretty sure that could be done using a small extension that intercepts the right `CodeBlock` nodes after parsing. You won't be able to avoid parsing the rest of the syntax, though: skipping it would likely interfere with indented code blocks in weird ways and not be round-trippable. If you do want `JuliaFormatter` to be more aggressive in what it formats and also handle the markdown rather than just the code around it, then this would be an option.
> With regards to conditional formatting of code blocks: I'm pretty sure that could be done using a small extension that intercepts the right CodeBlock nodes after parsing. You won't be able to avoid parsing the rest of the syntax though that would likely interfere with indented code blocks in weird ways and not be roundtripable. If you do want JuliaFormatter to be more aggressive in what it formats and also handle the markdown rather than just the code around it then this would be an option.
This could be a solid option. I know extra trailing newlines are removed from the markdown (at least at the end of the file). Are there more invasive changes that occur?
> Are there more invasive changes that occur?
If there are any significant formatting changes that aren't liked then we can just adjust `markdown` to give nicer results. I'm not set yet on a particular style for how the markdown should be formatted. Simpler is better though. I'm pretty sure there's currently a lot of trailing whitespace that gets left behind when writing to `markdown`, and translation of HTML entities possibly still needs to be done.
> This could be a solid option.
A very simple POC that can use any package that has a `String -> String` formatting method:
julia> using CommonMark, DocumentFormat, JuliaFormatter
julia> struct FmtRule
           λ::Function
       end;
julia> CommonMark.block_modifier(rule::FmtRule) = CommonMark.Rule(1) do parser, block
           if block.t isa CommonMark.CodeBlock && block.t.info == "julia"
               block.literal = rule.λ(block.literal)
           end
       end;
julia> p_1 = enable!(Parser(), FmtRule(JuliaFormatter.format_text));
julia> p_2 = enable!(Parser(), FmtRule(DocumentFormat.format));
julia> text =
"""
```julia
struct Foo{A, B}
a::A
b::B
end
```
```
not formatted
```
""";
julia> markdown(stdout, p_1(text))
```julia
struct Foo{A,B}
    a::A
    b::B
end
```
```
not formatted
```
julia> markdown(stdout, p_2(text))
```julia
struct Foo{A,B}
    a::A
    b::B
end
```
```
not formatted
```
Those internal `CommonMark` methods and structs aren't settled yet, but the plan is to have a public API for 3rd-party extensions like these by the time 1.0 is released.
@MichaelHatherly I'm a bit confused by the code above. Can you explain a bit what the different parts do? Or are there docs somewhere?
> Or are there docs somewhere?
Not much in the way of internal docs at the moment.
julia> struct FmtRule
           λ::Function
       end;
defines a new "rule" that can then be enabled with `enable!` on a particular `Parser` instance with
p_1 = enable!(Parser(), FmtRule(JuliaFormatter.format_text));
It stores a reference to a formatting function that will be used to format code blocks.
julia> CommonMark.block_modifier(rule::FmtRule) = CommonMark.Rule(1) do parser, block
           if block.t isa CommonMark.CodeBlock && block.t.info == "julia"
               block.literal = rule.λ(block.literal)
           end
       end;
defines an "action" associated with `FmtRule` that modifies block-level elements in a parsed AST, with a priority of `1`. Actions are run in order of priority (low to high). This particular "action" formats the `.literal` content of `CodeBlock`s when their `.info` is `julia`.
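As a rough illustration of the "low to high" ordering only, here is a toy model in plain Julia. `ToyAction` and `run_actions` are hypothetical names invented for this sketch; they are not CommonMark types and say nothing about how the package stores or dispatches its rules internally.

```julia
# Toy model only: `ToyAction` and `run_actions` are hypothetical names,
# not part of CommonMark.
struct ToyAction
    priority::Int
    f::Function
end

# Apply every action to `x`, lowest priority number first.
function run_actions(actions::Vector{ToyAction}, x)
    for action in sort(actions; by = a -> a.priority)
        x = action.f(x)
    end
    return x
end

actions = [
    ToyAction(2, s -> s * "_b"),  # listed first, but runs second
    ToyAction(1, s -> s * "_a"),  # lower priority number, so runs first
]

result = run_actions(actions, "x")
```

Here `result` is `"x_a_b"`: the priority `1` action ran before the priority `2` action, regardless of the order in which they were listed.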
That's about it for this particular one. There are a number of others in `src/extensions/` that are a reasonably gentle introduction to how the package works. `block_modifier` (used above), `block_rule`, `inline_modifier`, and `inline_rule` are the building blocks on which all the parsing machinery is built.
Hope that's helpful, let me know if you need any other clarification.
Could there be an extension that doesn't alter the input at all? This is required to properly format docstrings for https://github.com/domluna/JuliaFormatter.jl/pull/231. The use case is using the markdown package to find Julia code in the docstrings, format it, and then print the output. Aside from what is formatted nothing else should be altered.
Here's an example
Currently, using `Markdown` or `CommonMark` with `RawContentRule`, the output is not the same as the input.
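For comparison, here is a minimal sketch of the "nothing else should be altered" requirement that sidesteps markdown parsing entirely. It is plain Julia, not a CommonMark extension: `format_julia_fences` is a hypothetical helper that scans line by line, applies a `String -> String` formatter only to the bodies of fences opened with exactly three backticks plus `julia`, and copies every other line through untouched. It assumes LF line endings and does not handle indented code blocks, `jldoctest` fences, or nested fences.

```julia
# Hypothetical helper: apply `fmt` only to the bodies of julia-tagged fences,
# copying every other line through exactly as written.
function format_julia_fences(fmt::Function, text::AbstractString)
    out = IOBuffer()
    code = String[]      # lines collected inside a julia fence
    in_julia = false
    for line in eachline(IOBuffer(text); keep = true)
        stripped = strip(line)
        if !in_julia && stripped == "```julia"
            in_julia = true
            write(out, line)
        elseif in_julia && stripped == "```"
            if !isempty(code)
                # Format the body without its final newline, then restore it.
                body = String(chomp(join(code)))
                write(out, chomp(fmt(body)), "\n")
                empty!(code)
            end
            in_julia = false
            write(out, line)
        elseif in_julia
            push!(code, line)
        else
            write(out, line)
        end
    end
    # An unterminated fence is copied through unformatted.
    write(out, join(code))
    return String(take!(out))
end

# Stand-in formatter for the demo; a real one (e.g. the
# `JuliaFormatter.format_text` used earlier) would be passed instead.
toy = s -> replace(s, ", " => ",")

doc = "Some  text\n\n```julia\nFoo{A, B}\n```\n\n```\nBar{A, B}\n```\n"
result = format_julia_fences(toy, doc)
```

With the toy formatter, only `Foo{A, B}` inside the `julia` fence becomes `Foo{A,B}`; the double space in the prose and the untagged fence are left exactly as written. The trade-off is that this knows nothing about markdown structure, which is the part a parser-based extension would get right.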