dsyme closed this issue 4 years ago
I'm working to clarify:
what the intended specification is for which formatter gets called when (given a set of user-specified formatters)
what the default formatters are
what the implementation bugs are with respect to the intended specification
As context, what I'm trying to do is set up the default formatting of F# values so that it is acceptable for F# programming, for example.
My overall feeling about the implementation is that there is a lot of state (created formatters in _genericFormatters and perhaps elsewhere) that depends on the set of user-specified formatters and the defaults, and that could be recreated from them. That is, the table of formatters is an "on-demand compiled form" derived from the user specifications.
I question whether these stateful tables are necessary, given that they probably need to be flushed whenever the set of user-specified formatters or defaults such as "include internal properties on objects" change (like in the examples above).
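To make the "derived form" idea concrete, here is a minimal sketch in Python. It is not the dotnet-interactive API: FormatterRegistry, register and format are hypothetical names. The user registrations are the only source of truth, and the per-type table is a cache that is cleared whenever a registration changes, so it can never disagree with them. For brevity the sketch resolves by "most recently registered matching type wins".

```python
from typing import Callable, Dict, List, Tuple

Formatter = Callable[[object], str]


class FormatterRegistry:
    """Hypothetical model: registrations are the source of truth and the
    per-type table is only a disposable cache derived from them."""

    def __init__(self) -> None:
        self._user: List[Tuple[type, Formatter]] = []  # in registration order
        self._cache: Dict[type, Formatter] = {}        # derived, disposable

    def register(self, t: type, f: Formatter) -> None:
        self._user.append((t, f))
        self._cache.clear()  # any change to the inputs invalidates the derived table

    def _resolve(self, actual: type) -> Formatter:
        # Most recently registered matching type wins; otherwise fall back to str().
        for t, f in reversed(self._user):
            if issubclass(actual, t):
                return f
        return str

    def format(self, value: object) -> str:
        actual = type(value)
        if actual not in self._cache:
            self._cache[actual] = self._resolve(actual)
        return self._cache[actual](value)


reg = FormatterRegistry()
print(reg.format(3))                      # 3
reg.register(int, lambda v: f"int:{v}")
print(reg.format(3))                      # int:3 (the earlier cache entry was flushed)
```

With this shape, any default setting that affects formatting (such as whether to include internal properties) would also clear the cache when it changes.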
To be helpful, I'm going to try to write a mini-spec of what I think the intended behaviour is (unless there is one already available?)
Another bug: when formatting a simple value that has no formattable members (e.g. take type X = A | B and then format A), we end up calling
return new HtmlFormatter<T>((value, writer) => writer.Write(value));
However, this just uses ToString() on the value, which produces a plain string, not HTML. For example:
type X =
    | A
    | B
    override x.ToString() = "<b>Bold</b>"
gives a situation where plaintext ToString() results get used as HTML:
I'll keep working through this and develop an intended specification, a list of issues and a set of fixes and tests
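One possible fix, assuming the fallback should treat ToString() output as plain text, is to HTML-encode it before writing it into an HTML context. Here is a minimal Python sketch of the idea. html.escape stands in for whatever encoder the C# code would use, and X and html_fallback are hypothetical names.

```python
import html


class X:
    """Stand-in for the F# union above: its ToString() returns markup."""

    def __str__(self) -> str:
        return "<b>Bold</b>"


def html_fallback(value: object) -> str:
    # Treat ToString()-style output as plain text and encode it, so it is
    # displayed literally instead of being interpreted as HTML.
    return html.escape(str(value))


print(html_fallback(X()))  # &lt;b&gt;Bold&lt;/b&gt;
```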
Mini-spec
The key user settings are:
Formatter.SetPreferredMimeTypeFor(type, mimeType)
Formatter.Register(formatter, mimeType) and Formatter<T>.Register(formatter, mimeType)
When display(object, ?mimeType) is called, first determine the preferred mime type (if none is given) based on user settings and defaults. Strings default to plain text, most other things to HTML.
Next, format the object at its "ACTUAL" type (obj.GetType()):
if it is a string, write the string as-is regardless of mime type (so strings displayed as HTML are interpreted as HTML)
check for nulls --> Formatter.NullString
For text/plain, the formatter for a type is chosen as follows:
Try to get a user-registered formatter (Formatter.Register).
If T "can't be instantiated", then the formatter applies to all subtypes (RegisterLazilyForConcreteTypesOf).
If T "can be instantiated", then the formatter only applies to exact type matches.
COMMENT: this is a very strange part of the spec. It means that registering a printer for an abstract base class, an interface or a generic non-abstract base class has different consequences from registering one for a non-generic non-abstract base class. This also explains why registering an "obj" printer doesn't do anything (it only selects type obj).
Some types have pre-registered formatters, notably Newtonsoft.Json.Linq.JArray and Newtonsoft.Json.Linq.JObject.
If there is no user-registered formatter, look for a default formatter. Default registered plaintext formatters are typeof(Type).GetType()
If that fails, look for special formatters for ReadOnlyMemory and TextSpan.
Special case for TypeIsAnonymous, TypeIsException, TypeIsValueTuple and "things not subtypes of IEnumerable", for which the plaintext printer CreateForAllMembers is used. This does the following (which is not necessarily appropriate for F# values):
It uses SingleLinePlainTextFormatter.
For IsScalar types (Boolean, Byte, SByte, Int16, UInt16, Int32, UInt32, Int64, UInt64, IntPtr, UIntPtr, Char, Double, Single, decimal, Guid, string, DateTime, DateTimeOffset, TimeSpan and Nullable<_> versions of these), write using "Write".
Special formatting for Tuple and ValueTuple
Special formatting for Exceptions to "filter out internal values from the Data dictionary" and make the stack trace nicer
Special formatting for Enum values
Special formatting for subtypes of IEnumerable
Otherwise, show selected public properties in a display of the form <TYPE> PROP=<VALUE>, PROP=<VALUE>
Selected members are instance fields and properties that do not have DebuggerBrowsableAttribute with DebuggerBrowsableState.Never
For anonymous objects, suppress the display of the runtime type
Note that nested types use an awkward name. For example,
type C() = member x.P = 3
C()
gives
{ FSI_0009+C: P: 3 }
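The rules above can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the real implementation: NULL_STRING, preferred_mime and user_formatters are hypothetical stand-ins for Formatter.NullString, SetPreferredMimeTypeFor and Formatter.Register; Python abstract classes stand in for "can't be instantiated" types (interfaces and open generics are not modelled); the lookup follows the text/plain rules whatever mime type is chosen; and the most recent user registration is tried first.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Tuple

Formatter = Callable[[object], str]

NULL_STRING = "<null>"                              # stand-in for Formatter.NullString
preferred_mime: Dict[type, str] = {}                # stand-in for SetPreferredMimeTypeFor
user_formatters: List[Tuple[type, Formatter]] = []  # stand-in for Formatter.Register


def applies(registered: type, actual: type) -> bool:
    if inspect.isabstract(registered):
        # "Can't be instantiated": applies lazily to every subtype.
        return issubclass(actual, registered)
    # "Can be instantiated": exact match only, so `object` only matches object().
    return actual is registered


def member_display(value: object) -> str:
    # Rough stand-in for CreateForAllMembers: "<TYPE> PROP=<VALUE>, PROP=<VALUE>".
    if isinstance(value, (bool, int, float)):
        return str(value)  # scalars are written directly
    members = getattr(value, "__dict__", {})
    props = ", ".join(f"{k}={v!r}" for k, v in members.items())
    return f"{type(value).__name__} {props}".rstrip()


def display(value: object, mime: Optional[str] = None) -> Tuple[str, str]:
    if mime is None:
        # User preference for the type or a base type, else the built-in default.
        mime = next((preferred_mime[t] for t in type(value).__mro__ if t in preferred_mime),
                    "text/plain" if isinstance(value, str) else "text/html")
    if isinstance(value, str):
        return mime, value  # written as-is, even when the mime type is HTML
    if value is None:
        return mime, NULL_STRING
    actual = type(value)  # format at the ACTUAL type
    for registered, fmt in reversed(user_formatters):
        if applies(registered, actual):
            return mime, fmt(value)
    return mime, member_display(value)


class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y


user_formatters.append((Shape, lambda s: f"shape with area {s.area()}"))
user_formatters.append((object, lambda v: "object formatter"))

print(display(Square(2), "text/plain"))  # ('text/plain', 'shape with area 4')
print(display(Point(1, 2), "text/plain"))  # ('text/plain', 'Point x=1, y=2')
```

The Shape/object registrations reproduce the oddity noted in the COMMENT: the abstract Shape registration reaches Square, while the object registration is only used for a bare object().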
TBD
Notes to self:
To debug, attach to dotnet-interactive.exe, set a breakpoint at Microsoft.DotNet.Interactive.Kernel.display and call it:
open Microsoft.DotNet.Interactive.Kernel
display(value, "text/plain")
public static Task<DisplayedValue> DisplayAsync(
    this KernelInvocationContext context,
    object value,
    string mimeType = null)
{
    DisplayedValue result = Display(context, value, mimeType);
    return Task.FromResult(result);
}
PlainTextFormatter<_>.CreateForMembers is never called except in tests
Down the OO rabbit hole: FormatterSetBase.AddFormatterFactory ???
In the following, the TypeIsAnonymous, TypeIsException and TypeIsValueTuple checks look redundant: none of these categories of types is even IEnumerable, so the fourth condition always holds for them.
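if (Formatter<T>.TypeIsAnonymous ||
    Formatter<T>.TypeIsException ||
    Formatter<T>.TypeIsValueTuple ||
    !typeof(IEnumerable).IsAssignableFrom(typeof(T)))
{
    // The first three checks appear to be subsumed by the last one, since
    // anonymous types, exceptions and value tuples are not IEnumerable.
    return CreateForAllMembers(includeInternals);
}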
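The redundancy claim is a small boolean identity. A quick Python check (the function names are hypothetical) enumerates all combinations of the four conditions that respect the stated assumption, and also lists the combinations where the simplification would change behaviour if that assumption were dropped, for example for a custom exception type that also implemented IEnumerable.

```python
from itertools import product


def original(anon: bool, exc: bool, vtuple: bool, enumerable: bool) -> bool:
    return anon or exc or vtuple or not enumerable


def simplified(anon: bool, exc: bool, vtuple: bool, enumerable: bool) -> bool:
    return not enumerable


def consistent(anon: bool, exc: bool, vtuple: bool, enumerable: bool) -> bool:
    # Assumption from the text: none of these categories implements IEnumerable.
    return not ((anon or exc or vtuple) and enumerable)


def equivalent_under_assumption() -> bool:
    cases = [c for c in product([False, True], repeat=4) if consistent(*c)]
    return all(original(*c) == simplified(*c) for c in cases)


def counterexamples() -> list:
    # Combinations where the two conditions differ once the assumption is dropped.
    return [c for c in product([False, True], repeat=4) if original(*c) != simplified(*c)]


print(equivalent_under_assumption())  # True
```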
PlainTextFormatter<>.CreateForAllMembers is misnamed, since it actually selects members using GetMembersToFormat.
COMMENT: this is a very strange part of the spec - this means registering a printer for an abstract base class, interface or generic non-abstract base class has different consequences to a non-generic non-abstract base class. This also explains why registering an "obj" printer doesn't do anything (it only selects type obj)
This point of differentiation is fairly recent, based on user requests. Originally, the formatters were only useful for types that can be instantiated. These should ideally be differentiated more clearly to show the intention behind "registering" a formatter for interfaces, abstract types, and open generics, i.e. that it's an on-demand fallback to be used when no formatter has yet been registered. The aggressively specific name RegisterLazilyForConcreteTypesOf is the start of a more explicit differentiation of these two use cases.
gives a situation where plaintext ToString() results get used as HTML:
This is a result of the fact that when nesting formatters, we currently fall back to text/plain, whereas we would prefer to fall back to some form of HTML fragment. The text/html formatters, which follow Jupyter's Python conventions for laying out tables, aren't appropriate for nesting.
@jonsequitur Cool, thanks for the details.
Perhaps we can talk about this 1:1 on Monday, but I think I'd recommend this simple spec for choosing relevant formatters (or at least put it forward as a strawman).
Strawman spec for choosing a formatter (for mimeType and object actual type A):
If no mimeType is specified, determine one
Choose the most-specific user-registered mime type preference relevant to A
If none are relevant, then choose a default mime type.
Next, determine a formatter
Choose the most-specific user-registered formatter relevant to A
If none are relevant, then choose a default formatter (using the same rules with the built-in defaults).
Here "most specific" is in terms of the class and interface hierarchy. In the event of an exact tie in ordering or some other conflict, more recently registered formatters and mimeTypes are preferred. Type instantiations of generic types are preferred to generic formatters when their GenericTypeDefinitions are the same.
The default set of formatters for a mime type always includes a formatter for object.
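Here is a minimal sketch of the strawman resolution in Python, assuming Python classes and ABCs can stand in for .NET classes and interfaces (most_specific, format_value and the example types are hypothetical). It keeps the relevant registrations that are not dominated by a strictly more specific one and breaks any remaining ties by recency. The rule preferring instantiations such as List<int> over List<> is not modelled. Running it once over the user registrations and then over the defaults gives the two-stage behaviour described above.

```python
from typing import Callable, List, Optional, Tuple

Formatter = Callable[[object], str]
Registration = Tuple[type, Formatter]


def most_specific(registrations: List[Registration], actual: type) -> Optional[Formatter]:
    # Relevant = registered for `actual` or one of its base classes/interfaces.
    relevant = [(i, t, f) for i, (t, f) in enumerate(registrations) if issubclass(actual, t)]
    # Drop any candidate for which a strictly more specific candidate exists.
    best = [(i, t, f) for i, t, f in relevant
            if not any(u is not t and issubclass(u, t) for _, u, _ in relevant)]
    if not best:
        return None
    # Exact ties or incomparable candidates: the most recent registration wins.
    return max(best, key=lambda r: r[0])[2]


def format_value(value: object, user: List[Registration], defaults: List[Registration]) -> str:
    f = most_specific(user, type(value)) or most_specific(defaults, type(value))
    if f is None:
        raise LookupError("the defaults must include a formatter for object")
    return f(value)


class Animal: ...
class Dog(Animal): ...


defaults: List[Registration] = [(object, str)]
user: List[Registration] = [(Animal, lambda v: "animal"), (object, lambda v: "any object")]

print(format_value(Dog(), user, defaults))  # animal
print(format_value(3, user, defaults))      # any object
```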
Examples:
If the user registers a formatter for type A it is used for objects of type A (unless more formatters for type A are specified)
If the user registers a formatter for System.Object, it is preferred over all other formatters except other user-defined formatters
If the user registers a formatter for any sealed type, it is preferred over all other formatters (unless more formatters for that type are specified)
If the user registers List<> and List<int> formatters, the List<int> formatter is preferred for objects of type List<int>
If the user registers a confusing, conflicting mess of overlapping formatters incrementally, they should call Formatters.Clear() or restart the kernel.
If the user registers text/plain as the mime type for object, then it is used as the mime type for everything (likewise for any other mime type)
Evaluation implicitly leads to a default formatter registration, which then causes user-defined formatters to be ignored. This is very confusing.
Restart kernel
Evaluate DayOfWeek.Monday
Register a formatter for IComparable
Evaluate DayOfWeek.Monday again
Expected: The formatter for IComparable is invoked
Actual: the same formatting as in the first evaluation is used
Compare with:
Restart kernel
Register a formatter for IComparable
Evaluate DayOfWeek.Monday
In this case, the formatter for IComparable is invoked. The fact that I've previously formatted a DayOfWeek value should be irrelevant.
The formatter architecture feels fragile: it seems full of hidden state like this. Somehow evaluation causes default formatters to be populated, and those then take precedence over user-specified formatters.
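The reported behaviour is consistent with a per-type table that is filled on first use and consulted before the user registrations. The following Python model reproduces both repro sequences. EagerTableModel, IComparable and DayOfWeek are hypothetical stand-ins, not the actual implementation.

```python
from abc import ABC
from enum import Enum
from typing import Callable, Dict, List, Tuple

Formatter = Callable[[object], str]


class IComparable(ABC):
    """Stand-in for System.IComparable."""


class DayOfWeek(Enum):
    MONDAY = 1


IComparable.register(DayOfWeek)  # DayOfWeek "implements" IComparable


class EagerTableModel:
    def __init__(self) -> None:
        self.user: List[Tuple[type, Formatter]] = []
        self.table: Dict[type, Formatter] = {}

    def register(self, t: type, f: Formatter) -> None:
        self.user.append((t, f))  # note: the per-type table is NOT flushed

    def format(self, value: object) -> str:
        actual = type(value)
        if actual in self.table:  # a previously cached choice wins
            return self.table[actual](value)
        chosen: Formatter = str  # default formatter
        for t, f in reversed(self.user):
            if issubclass(actual, t):
                chosen = f
                break
        self.table[actual] = chosen  # the default gets "registered" implicitly
        return chosen(value)


first = EagerTableModel()
first.format(DayOfWeek.MONDAY)  # the default formatter is cached here
first.register(IComparable, lambda v: "comparable")
print(first.format(DayOfWeek.MONDAY))  # still the default: the cached entry wins

second = EagerTableModel()
second.register(IComparable, lambda v: "comparable")
print(second.format(DayOfWeek.MONDAY))  # comparable
```

In this model the fix is either to clear the table on every registration or not to cache the implicitly chosen default at all.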
There should be a simple declarative spec like:
"we run all registered user-defined formatters in sequence finding the first that matches, then run default formatters", or
"we look at all user-defined formatters and select the one that is closest match by type hierarchy. If none match we do the same with the default formatters"
I took a look at the implementation and, TBH, I saw some code fragments indicating that these type-indexed tables get automatically populated by defaults, which I assume leads to this kind of problem.
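For comparison with the closest-match option, the first option ("run all registered user-defined formatters in sequence") can be sketched in Python as follows. Formatters are modelled as functions that return None to decline; for_type and first_match are hypothetical names, and trying the most recent registration first is an assumption.

```python
from typing import Callable, List, Optional

TryFormat = Callable[[object], Optional[str]]  # returns None to decline


def for_type(t: type, fmt: Callable[[object], str]) -> TryFormat:
    # Wrap a plain formatter so that it only accepts instances of t.
    return lambda v: fmt(v) if isinstance(v, t) else None


def first_match(user: List[TryFormat], defaults: List[TryFormat], value: object) -> str:
    # User formatters, most recent first (an assumption), then the defaults in order.
    for f in list(reversed(user)) + list(defaults):
        result = f(value)
        if result is not None:
            return result
    raise LookupError("the defaults should end with a catch-all formatter")


defaults: List[TryFormat] = [for_type(object, str)]
user: List[TryFormat] = [
    for_type(int, lambda v: f"int {v}"),
    lambda v: "negative" if isinstance(v, int) and v < 0 else None,
]

print(first_match(user, defaults, 5))   # int 5
print(first_match(user, defaults, -1))  # negative
```

This is easy to explain but order-sensitive, whereas the closest-match option only depends on registration order when there is a tie.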