dotnet / interactive

.NET Interactive combines the power of .NET with many other languages to create notebooks, REPLs, and embedded coding experiences. Share code, explore data, write, and learn across your apps in ways you couldn't before.
MIT License
2.9k stars 389 forks source link

Evaluation leads to a default formatter registration, then user-defined formatters are ignored #692

Closed dsyme closed 4 years ago

dsyme commented 4 years ago

Evaluation implicitly leads to a default formatter registration which then leads to user-defined formatters being ignored. THis is very confusing.

  1. Restart kernel

  2. Evaluate DayOfWeek.Monday

  3. Register a formatter for IComparable

  4. Evaluate DayOfWeek.Monday again

Expected: The formatter for IComparable is invoked

Actual: the same formatting as step 1 is used

Compare with

  1. Restart kernel

  2. Register a formatter for IComparable

  3. Evaluate DayOfWeek.Monday again

In this case, the formatter for IComparable is invoked. The fact that I've previously formatted a DayOfWeek value should be irrelevant.

The formatter architecture is very sniffy, it seems full of hidden state like this. Somehow evaluation causes default formatters to be populated and then those take precedence of user-specified formatters.

There should be a simple declarative spec like

I took a look at the implementation and TBH I saw some code fragments that indicate these type-indexed tables get automatically populated by defaults which I assume that leads to this kind of problem.

dsyme commented 4 years ago

I'm working to clarify

  1. what the intended specification is for which formatter gets called when (given a set of user-specified formatters)

  2. what the default formatters are

  3. what the implementation bugs are with respect to the intended specification

As context, what I'm trying to do is set up the default formatting of F# values to be acceptable for example F# programming.

My overall feeling about the implementation is that there seems to be a lot of state (created formatters in _genericFormatters and perhaps elsewhere) that depends on the set of user-specified formatters and the defaults and can be recreated from these - that is the table of formatters represents a "on-demand compiled form" derived from the user specifications

I question whether these stateful tables are necessary given they probably need to be flushed when the set of user-specified formatters or defaults like "include internal properties on objects" change (like in the examples above).

To be helpful, I'm going to try to write a min-spec of what I think the intended behaviour is (unless there is one already available?)

dsyme commented 4 years ago

Another bug: For formatting a simple value that has no formattable members (e.g. take type X = A | B then format A) then we end up calling

                return new HtmlFormatter<T>((value, writer) => writer.Write(value));

However this is just using ToString() on the vaue which is a string not HTML. For example:

type X = 
    | A 
    | B 
    override x.ToString() = "<b>Bold</b>"

gives a situation where plaintext ToString() results get used as HTML:

image

I'll keep working through this and develop an intended specification, a list of issues and a set of fixes and tests

dsyme commented 4 years ago

Mini-spec

Formatting in .NET Interactive

The key user settings are:

When display(object, ?mimeType) is called, the the preferred mime type based on user settings and defaults. Strings default to plain text, most other things to HTML.

Next, format the object at type "ACTUAL" (obj.GetType())

Plain text formatting

For text/plain, the formatter for a type is chosen by:

  1. Try to get a user-registered (Formatter.Register) formatter

    • If T "can't be instantiated" then the formatter applies for all subtypes (RegisterLazilyForConcreteTypesOf).

    • If T "can be instantiated" then the formatter only applies for exact type matches

      COMMENT: this is a very strange part of the spec - this means registering a printer for an abstract base class, interface or generic non-abstract base class has different consequences to a non-generic non-abstract base class. This also explains why registering an "obj" printer doesn't do anything (it only selects type obj)

    • Some types have pre-registered formatters, notably Newtonsoft.Json.Linq.JArray, Newtonsoft.Json.Linq.JObject

    • If there is no user-registered formatter then look for a default formatter. Default registered plaintext formatters are

      • ExpandoObject
      • PocketView (writes encoded HTML)
      • KeyValuePair<string, object>
      • ReadOnlyMemory
      • TimeSpan
      • System.Type
      • System.RuntimeType (typeof(Type).GetType())
      • DateTime
      • DateTimeOffset
    • If that fails, look for special formatters for ReadOnlyMemory and TextSpan

  2. Special case for TypeIsAnonymous, TypeIsException,TypeIsValueTuple, "things not subtypes ofIEnumerable" for which the plaintext printerCreateForAllMembers` is used. This does the following (which are not necessarily appropriate for F# values)

    • it uses SingleLinePlainTextFormatter

    • For IsScalar types (Boolean, Byte, SByte, Int16, UInt16, Int32, UInt32, Int64, UInt64, IntPtr, UIntPtr, Char, Double, Single, decimal, Guid, string, DateTime, DateTimeOffset, TimeSpan and Nullable<_> versions of these), write using "Write"

    • Special formatting for Tuple and ValueTuple

    • Special formatting for Exceptions to "filter out internal values from the Data dictionary" and make the stack trace nicer

    • Special formatting for Enum values

    • Special formatting for subtypes of IEnumerable

    • Otheriwse show selected public proeprties in a display using <TYPE> PROP=<VALUE>, PROP=<VALUE>

      • Selected members are instance fields and properties that do not have DebuggerBrowsableAttribute with DebuggerBrowsableState.Never

      • For anonymous objects, suppress the display of the runtime type

      • Note that nested types use the awkward name for , so

      type C() = member x.P = 3
      C()

      gives

      { FSI_0009+C: P: 3 }

HTML formatting

TBD

dsyme commented 4 years ago

Notes to self:

Debugging

To debug, attach to dotnet-interactive.exe and set a breakpoint at Microsoft.DotNet.Interactive.Kernel.display and call it:

open Microsoft.DotNet.Interactive.Kernel

display(value, "text/plain")

code notes

        public static Task<DisplayedValue> DisplayAsync(
            this KernelInvocationContext context,
            object value,
            string mimeType = null)
        {
            DisplayedValue result = Display(context, value, mimeType);
            return Task.FromResult(result);
        }
jonsequitur commented 4 years ago

COMMENT: this is a very strange part of the spec - this means registering a printer for an abstract base class, interface or generic non-abstract base class has different consequences to a non-generic non-abstract base class. This also explains why registering an "obj" printer doesn't do anything (it only selects type obj)

This point of differentiation is fairly recent, based on user requests. Originally, the formatters were only useful for types that can be instantiated. These should ideally be differentiated more clearly to show the intention behind "registering" a formatter for interfaces, abstract types, and open generics, i.e. that it's an on-demand fallback to be used when no formatter has yet been registered. The aggressively specific name RegisterLazilyForConcreteTypesOf is the the start of a more explicit differentiation of these two use cases.

jonsequitur commented 4 years ago

gives a situation where plaintext ToString() results get used as HTML:

This is a result of the fact that when nesting formatters, we currently fall back to text/plain but preferably we would fall back to some form of HTML fragment. The text/html formatters, which follow Jupyter's Python conventions for laying out tables, aren't appropriate for nesting.

dsyme commented 4 years ago

@jonsequitur Cool thanks for the details.

Perhaps we can talk about this 1:1 on Monday, but I think I'd recommend this simple spec for choosing relevant formatters (or at least put it forward as a strawman).


Strawman spec for choosing a formatter (for mimeType and object actual type A):

  1. If no mimeType is specified, determine one

    • Choose the most-specific user-registered mime type preference relevant to A

    • If none are relevant, then choose a default mime type.

  2. Next, determine a formatter

    • Choose the most-specific user-registered formatter relevant to A

    • If none are relevant, then choose a default formatter (using the same rules with the built-in defaults).

Here "most specific" is in terms of the class and interface hierarchy. In the event of an exact tie in ordering or some other conflict, more recently registered formatters and mimeTypes are preferred. Type-instantiations of generic types are preferred to generic formatters when their GenericTypeDefinition are the same

The default sets of formatters for a mime type always include a formatter for object.


Examples: