Add highlevel documentation of internal structure of Fantomas

danyx23 commented 6 years ago

It would be great if one of the people who have already contributed a few fixes to Fantomas could add an overview document of how Fantomas works internally so that it would be faster for others to get up to speed on the codebase and be able to contribute fixes. I would like to start fixing bugs but it is hard to understand the dense code of Fantomas (I don't mean this as a criticism, just an observation as an outsider).

Initially it would just be helpful to have a brief high-level outline of the steps and how the data structures interact (a lot seems to come from the F# compiler service, but comments and maybe whitespace information seem to be processed independently?).

A very helpful next step would be practical guidance on how to debug a concrete issue. Writing a failing unit test is of course a great start, but what debug helpers exist in the code base? It seems that FakeHelpers.fs is a good place for this, but it is used rather ad hoc?

Finally, it would be a really great help if an experienced maintainer documented, in prose and as part of the docs, the process behind an already completed and merged bug fix.

I would gladly volunteer to write such documentation but my knowledge of the code base is too low atm. If anyone feels like giving me pointers I would take on the job of writing such docs.

nojaf commented 6 years ago

Hi, great suggestion! Maybe the format function can be a good starting point.

let format config ({ Source = sourceCode; FileName =  filePath } as formatContext) =
    async {
        let! ast = parse formatContext
        return formatWith ast (Path.GetFileNameWithoutExtension filePath) (Some sourceCode) config
    }

Two things happen. First, the input (F# source code) is parsed into an AST using the FSharp Compiler Services.

The config contains the settings the user can provide.

The next step is to format the AST back into source code; this starts in formatWith.

let formatWith ast moduleName input config =
    // Use '\n' as the new line delimiter consistently
    // It would be easier for F# parser
    let sourceCode = defaultArg input String.Empty
    let normalizedSourceCode = String.normalizeNewLine sourceCode
    let formattedSourceCode =
        Context.create config normalizedSourceCode 
        |> genParsedInput { ASTContext.Default with TopLevelModuleName = moduleName } ast
        |> dump
        |> if config.StrictMode then id else integrateComments normalizedSourceCode

    // Sometimes F# parser gives a partial AST for incorrect input
    if input.IsSome && String.IsNullOrWhiteSpace normalizedSourceCode <> String.IsNullOrWhiteSpace formattedSourceCode then
        raise <| FormatException "Incomplete code fragment which is most likely due to parsing errors or the use of F# constructs newer than supported."
    else formattedSourceCode

Context.create config normalizedSourceCode creates a Context, which has a Writer property. The Writer will eventually contain the formatted source code.

genParsedInput then maps AST fragments back to source code, though not directly (at this point somebody else will have a better explanation). It uses a form of function composition in which the Context is passed along and the pieces get composed. Mostly it is reasonably clear what is happening.

Take genExpr, for example. It pattern matches on the constructs from the FCS AST module.

Like:

    | Lambda(e, sps) -> 
        !- "fun " +> col sepSpace sps (genSimplePats astContext) +> sepArrow +> preserveBreakNln astContext e

You sorta see where this is going: we write the fun keyword, combine it with the next piece using +>, and so on.

After genParsedInput has completed, dump is called, which asks the inner writer to produce the output as a string.
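The create, generate, dump flow described above can be pictured with a toy version. This is only a sketch: the Context type, create, write, and dump below are hypothetical stand-ins invented for illustration, not the real Fantomas definitions, which carry a writer, the config, comments, and more.

```fsharp
// Hypothetical stand-in for the real Context: it only records written text.
type Context = { Chunks: string list } // most recent chunk first

// Plays the role of Context.create: start with nothing written.
let create () : Context = { Chunks = [] }

// A tiny "generator": takes a Context and returns one with more text in it.
let write (text: string) (ctx: Context) : Context =
    { ctx with Chunks = text :: ctx.Chunks }

// Plays the role of dump: turn the accumulated chunks into one string.
let dump (ctx: Context) : string =
    ctx.Chunks |> List.rev |> String.concat ""

let formatted =
    create ()
    |> write "let x = "
    |> write "42"
    |> dump

printfn "%s" formatted // prints: let x = 42
```

The point is only the shape of the pipeline: every step takes a Context and returns a Context, and the text is read out once at the very end.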

Lastly, Fantomas tries to restore comments. This is a bit tricky because the AST does not contain enough information to reconstruct the exact comment; things like indentation get lost. That is why Fantomas stores the comments inside the Context type as well. I'm not entirely sure how this part works, something with tokens and stuff.

So not the best explanation, but hey it's something.

danyx23 commented 6 years ago

Excellent start, thank you! I'll see if I can dig in a bit this weekend and maybe start drafting a document from your description that can then be filled in by contributors who understand individual parts better. Are you ok with that @nojaf?

nojaf commented 6 years ago

Sure, maybe @AnthonyLloyd can tell something about the Dbg module.

AnthonyLloyd commented 6 years ago

Hi, the Dbg module is the start of an improved version of printfn-style debugging. Step-through debugging can be hard in F# for code such as Fantomas. By adding Dbg statements anywhere in the code and rerunning a test, you can analyse what is happening. You can actually write some very useful Dbg functions to debug between pipes, count calls, run Dbg code in other parts of the codebase, etc. Dbg statements cause a compiler error in Release, which ensures none are left behind after debugging the code.

Example use:

Dbg.write "Here is e: %A" e

Dbg.iff (fun () -> s="|>") (fun () -> Dbg.write "es for s=|>: %A" es)

Dbg.fun2 (Dbg.write "a fun: %A -> %A -> %A") genExpr astContext e

|> List.collect genTree |> Dbg.tee (Dbg.write "in pipe: %A")
|> List.rev

|> Seq.groupBy (fun at -> at.Range.StartLine) |> Dbg.teeSeq (Dbg.seq (Dbg.write "%A")) // seqs get numbered by Dbg.write
|> Seq.map snd

Dbg.iff (fun () -> Dbg.count "loader" = 10) (Dbg.write "loader called 10 times")

Dbg.addFun "cache status" (fun () ->
    Dbg.write "rows count %i" rows.Count
    Dbg.write "columns count %i" columns.Count
)
// miles away
Dbg.runFun "cache status"
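For readers who have not seen this style before, helpers like tee and count could look roughly like the following. This is a hypothetical sketch in a module named DbgSketch, written for illustration only; it is not the actual Dbg module, and it leaves out the Release-build compiler error.

```fsharp
module DbgSketch =
    open System.Collections.Generic

    // Named counters, kept for the lifetime of the process.
    let private counts = Dictionary<string, int>()

    // Run a side effect on a value and pass the value through unchanged,
    // so the call can sit in the middle of a pipeline.
    let tee (f: 'a -> unit) (x: 'a) : 'a =
        f x
        x

    // Increment a named counter and return its new value.
    let count (name: string) : int =
        let n =
            match counts.TryGetValue name with
            | true, v -> v + 1
            | _ -> 1
        counts.[name] <- n
        n

let result =
    [ 3; 1; 2 ]
    |> List.sort
    |> DbgSketch.tee (printfn "after sort: %A") // logs the intermediate list
    |> List.rev
```

Because tee returns its input untouched, adding or removing it does not change what the surrounding pipeline computes.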

I'd like to add some Fantomas-specific Dbg functions, e.g. to pretty print a readable AST, but at this point I can't see how you can do that without running the Fantomas code out of process somehow.

Just created a live server memory investigation tool at work along these lines.

jindraivanek commented 6 years ago

genParsedInput and friends use functions of type Context -> Context as values. It's all about function composition here. +> is basically just >>, only specialized for Context -> Context functions. You can look at definitions of all operators here.
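To make the composition idea concrete, here is a minimal sketch with a toy Context and toy versions of +>, !- and --. The type and the operator bodies are assumptions made up for this example; the real operators also deal with line breaks and indentation.

```fsharp
// Hypothetical, simplified Context: just the text written so far.
type Context = { Output: string }

// Compose two Context -> Context functions left to right (the same as >>).
let (+>) (f: Context -> Context) (g: Context -> Context) : Context -> Context =
    fun ctx -> g (f ctx)

// Start a chain by writing a piece of text.
let (!-) (text: string) : Context -> Context =
    fun ctx -> { ctx with Output = ctx.Output + text }

// Append a piece of text to an existing chain.
let (--) (f: Context -> Context) (text: string) : Context -> Context =
    f +> (!- text)

let sepArrow : Context -> Context = !- " -> "

// Nothing is written while this is built; it is only a composed function.
let genLambda (arg: string) (body: string) : Context -> Context =
    !- "fun " -- arg +> sepArrow -- body

// Text appears once the composed function is applied to a Context.
let lambdaText = (genLambda "x" "x + 1" { Output = "" }).Output
printfn "%s" lambdaText // prints: fun x -> x + 1
```

The generators therefore read like a description of the output, while the actual writing is deferred until a Context is supplied.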

Most of the time you don't need to understand the details of this: just use +> as a pipe, and ++, +-, -- to write out text (or !+, !- if it is the first thing in the chain). But sometimes the Context itself is needed, and then we must write the function by hand, like this:

and genEnumCase astContext (EnumCase(ats, px, s, c)) =
    genPreXmlDoc px 
    +> ifElse astContext.HasVerticalBar sepBar sepNone 
    +> genOnelinerAttributes astContext ats 
    +> (fun ctx -> (if ctx.Config.StrictMode then !- s -- " = " else !- "") ctx) +> genConst c

Commonly used helper functions:

ifElse - shortcut for if .. then .. else ...
opt - for working with an option: do something with the value, or do nothing.
col - for working with collections: apply a function to each item and a separator function between items.
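A rough idea of how these three helpers might be written is sketched below. Everything here is a simplified assumption for illustration (toy Context, toy operators, hypothetical genSignature); the argument order of col follows the col sepSpace sps (...) call shown earlier in the thread.

```fsharp
// Hypothetical, simplified Context and operators, redefined so the sketch stands alone.
type Context = { Output: string }

let (+>) (f: Context -> Context) (g: Context -> Context) : Context -> Context =
    fun ctx -> g (f ctx)

let (!-) (text: string) : Context -> Context =
    fun ctx -> { ctx with Output = ctx.Output + text }

let sepNone : Context -> Context = id
let sepComma : Context -> Context = !- ", "

// ifElse: pick one of two writers based on a condition.
let ifElse (cond: bool) (f: Context -> Context) (g: Context -> Context) : Context -> Context =
    if cond then f else g

// opt: if a value is present, write it and then the separator; otherwise do nothing.
let opt (sep: Context -> Context) (value: 'a option) (f: 'a -> Context -> Context) : Context -> Context =
    match value with
    | Some x -> f x +> sep
    | None -> id

// col: apply f to each item, running sep between consecutive items.
let col (sep: Context -> Context) (items: 'a list) (f: 'a -> Context -> Context) : Context -> Context =
    fun ctx ->
        items
        |> List.mapi (fun i item -> if i = 0 then f item else sep +> f item)
        |> List.fold (fun acc writer -> writer acc) ctx

// Hypothetical generator that uses all three helpers.
let genSignature (hasBar: bool) (args: string list) (ret: string option) : Context -> Context =
    ifElse hasBar (!- "| ") sepNone
    +> !- "("
    +> col sepComma args (!-)
    +> !- ")"
    +> opt sepNone ret (fun r -> !- " : " +> !- r)

let withReturn = (genSignature true [ "a"; "b" ] (Some "int") { Output = "" }).Output
let withoutReturn = (genSignature false [ "a"; "b" ] None { Output = "" }).Output
printfn "%s" withReturn    // prints: | (a, b) : int
printfn "%s" withoutReturn // prints: (a, b)
```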

nojaf commented 5 years ago

@danyx23 do you still want to pursue this?

danyx23 commented 5 years ago

@nojaf Yes, but I have had a lot of other stuff going on lately. If you are bothered by the open issue please close it; I have the material I need for a first draft and will send it when I get around to it.

nojaf commented 5 years ago

Hi @danyx23 , thanks for your reply. Great to hear you still want to do this. I'll keep the issue open until the first draft.

Maybe another useful thing to mention is that the ast-viewer is super helpful to understand what is happening.

nojaf commented 4 years ago

Hey @danyx23 , the first videos cover pretty much the main parts of Fantomas. Would this suffice to close the issue?

danyx23 commented 4 years ago

@nojaf yes these are very good and much better than text because they show the workflow with the AST viewer etc. Great work, thanks a lot! I think I'll take this as a trigger to use more again and try to tackle some issues myself :)

fsprojects / fantomas

Add highlevel documentation of internal structure of Fantomas #255