F# values display badly in default formatting, esp plaintext

dsyme commented 4 years ago

There are lots of cases where default formatting for F# values is way off. I'll document a few here and try to make progress on this.

Evaluate a method info, e.g.

match <@ display("") @> with Quotations.Patterns.Call(_, mi, _) -> mi

Expect: something reasonable
Actual: a big mess
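For reference, here is a minimal sketch of the same extraction that runs outside a notebook. It assumes a stand-in function `greet` (hypothetical, since `display` only exists inside .NET Interactive) and shows the one-line text that `MethodInfo.ToString()` already provides.

```fsharp
open Microsoft.FSharp.Quotations.Patterns

// Hypothetical stand-in for `display`, which only exists inside a notebook.
let greet (name: string) = "hi " + name

// Pull the MethodInfo out of a quoted call, as in the report above.
let methodInfo =
    match <@ greet "" @> with
    | Call(_, mi, _) -> mi
    | _ -> failwith "expected a method call"

// MethodInfo.ToString() gives a compact, single-line signature.
let summary = methodInfo.ToString()
printfn "%s" summary
```

Something close to this single-line text is arguably the "reasonable" result a notebook user would expect.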

Evaluate "<@ 1 @>" in a notebook.

Expect: Some reasonable display of the quotation.
Actual: a big mess, see image at bottom

Best would really be just the ToString() I think:

Value (1)

For reference FSI shows this, which is not great but not awful.

    val it : Quotations.Expr<int> =
      Value (1)
        {CustomAttributes = [NewTuple (Value ("DebugRange"),
              NewTuple (Value ("stdin"), Value (1), Value (3), Value (1), Value (4)))];
         Raw = ...;
         Type = System.Int32;}
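The `ToString()` suggestion can be checked in a plain F# script. This is only a sketch of the comparison, not a description of what the notebook formatter does internally.

```fsharp
// A typed quotation of the constant 1.
let q = <@ 1 @>

// Expr overrides ToString(); the issue proposes showing just this text.
let viaToString = q.ToString()

// %A is the other built-in plain-text route for the same value.
let viaPercentA = sprintf "%A" q

printfn "%s" viaToString
printfn "%s" viaPercentA
```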

Quotation image:

dsyme commented 4 years ago

@brettfo @KevinRansom I'd like to iterate on the design here.

I know we have extensible type-directed formatting through Formatter<_>.Register(f), but we need the basic default printing of a whole swathe of F# data values to be decent without user customization.

Here's a full list of values we need to consider. I'll update it as I think of more, and start crossing items off as I check their current status.

[x] Constants (integers, floating point, strings, decimal)
[ ] Constants from .NET libraries (bigint, DateTime, TimeSpan)
[ ] Tuples
[ ] Struct tuples
[ ] Records
[ ] Single case unions without data
[ ] Single case unions with data
[ ] Multi case unions all without data
[ ] Multi case unions some with data
[ ] Enums
[ ] .NET Object values from core libraries (e.g. Type, MethodInfo, Assembly)
[ ] Lists of all of the above
[ ] Arrays of all of the above
[ ] Sequences of all of the above
[ ] Sequences that are slow to compute
[ ] Nested cases of the above (where applicable)
[ ] Quotations (raw and typed)
[ ] Function values (unapplied and partially applied)
[ ] Async values
[ ] Task values
[ ] Lazy values
[ ] IDictionary values
[ ] ICollection values
[ ] Map values
[ ] Set values
[ ] dict values
[ ] values returned by List.groupBy, Seq.groupBy and so on
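As a baseline for checking items on this list, the sketch below prints a handful of the listed kinds of value with `sprintf "%A"`, the existing F# structured plain-text formatter. The `Shape` type and the sample values are made up for illustration.

```fsharp
// Hypothetical union type covering "without data" and "with data" cases.
type Shape =
    | Point
    | Circle of radius: float

// Each entry pairs a label with the %A rendering of a sample value.
let samples : (string * string) list =
    [ "tuple",        sprintf "%A" (1, "a")
      "struct tuple", sprintf "%A" (struct (1, 2))
      "union case",   sprintf "%A" (Circle 1.5)
      "list",         sprintf "%A" [1; 2; 3]
      "array",        sprintf "%A" [| Point; Circle 2.0 |]
      "map",          sprintf "%A" (Map.ofList [1, "one"])
      "set",          sprintf "%A" (Set.ofList [3; 1; 2]) ]

for (label, text) in samples do
    printfn "%-12s %s" label text
```

Running this in a script gives a quick reference point to compare against what a notebook shows for the same values.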

Then a set of things from important libraries which will need customization

[ ] Data Frames
[ ] Charting
[ ] Tensors
many other examples

dsyme commented 4 years ago

FWIW I think the best approach here in the short term would be "revert to printing plain text of F# values unless we are sure the result is going to look nice"

And use https://github.com/dotnet/interactive/issues/646 to control printing in plain text.

Exactly what this means in detail I'm not sure

dsyme commented 4 years ago

@jonsequitur and I had an ad hoc chat, mostly about plaintext formats, and went through the existing structured plaintext formatting mechanisms in F#. Eventually the notes got turned into

https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/plaintext-formatting

Questions for us about plaintext

What is the gesture for plaintext?
- Current: mimetype
- F# has other options like sprintf "%A"
- Could add "display_text(x)"
What are the defaults for plaintext?
What is the extension author viewpoint for F#?
- Typical case study: XPlot, nteract data explorer
- We would really like the extension author not to have to think about F#
- Equally, if their extension needs to resort to plaintext at the "tips", it may be sensible to let them get consistent plaintext display at those tips. (So encourage extension authors to recurse rather than calling ToString at the tips.)
What is the library author's viewpoint? Do they implement the StructuredFormatDisplay attribute? ToString? DebuggerProxy etc.? What does a "good" library look like that gives good structured plain text? Case study: DiffSharp tensors
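To make the library-author question concrete, here is a minimal sketch of a type that opts in to structured plain-text output via `StructuredFormatDisplay` while keeping `ToString()` consistent. The `Grid` type is hypothetical and only stands in for something like a tensor; it is not how DiffSharp does it.

```fsharp
// Hypothetical library type standing in for something like a tensor.
[<StructuredFormatDisplay("{Display}")>]
type Grid(rows: int, cols: int) =
    member this.Rows = rows
    member this.Cols = cols
    // Text picked up by %A through the attribute above.
    member this.Display = sprintf "grid %dx%d" rows cols
    // Keep ToString() consistent for callers that bypass %A.
    override this.ToString() = this.Display

let g = Grid(28, 28)
let viaPercentA = sprintf "%A" g
let viaToString = g.ToString()

printfn "%s" viaPercentA
printfn "%s" viaToString
```

With both routes returning the same text, an extension that recurses with `%A` and one that calls `ToString()` at the tips would show the same thing for this type.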

Questions that came up about HTML display

How many columns maximum?
How to populate the properties by default? Only .NET metadata? Which special cases? PropertyBag cases?
How about heterogeneous data?
When do we know the HTML is actually better or at least no worse?
Is what happens for leaf data consistent with plain text, e.g. numeric formats? Do we care if there are differences?
When do we use plaintext at tips of the HTML structure? How are those values formatted?
- e.g. sprintf "%A" is an option, but we can also add tweaks to this

Don also had a question about how to set the default mimetype.

dsyme commented 4 years ago

@jonsequitur @KevinRansom @cartermp I have added a page to the F# docs on F# structured plaintext formatting, see

Pull request: https://github.com/dotnet/docs/pull/19632
New doc page
Additions to F# Interactive docs

dsyme commented 4 years ago

As part of looking at this issue I'm doing a deep dive on what the current spec for Formatting actually is, see https://github.com/dotnet/interactive/issues/692#issuecomment-670918750 for my mini-spec

dsyme commented 4 years ago

I've been starting to work with a build where F# notebooks default to F# Interactive structured plaintext formatting, building on #694

Here's an example:

This is definitely the correct default for F# for when plain text formatting is selected (even if HTML is the default mime type)

One thing I just noticed is the left/right justification difference with HTML formatting. Interesting how different the visuals are

I'm now going to go through the big list of items up above and check that the F# defaults are good for these, both in HTML and plain text.

gbaydin commented 4 years ago

As a daily Jupyter user working mainly with Python (and now F# thanks to all this awesome work in dotnet interactive!), I would like to provide my feedback.

I think that just having the simple string representation of objects (the result of .ToString()) is all we need for anything except images or very rare cases where a library developer has a custom formatting (e.g., Pandas in Python does this for its DataFrame type).

For example, I'm not used to seeing some html table representation when I have a simple tuple or a basic data type. I would be very happy to see a simple string instead of this. The second option below is much more familiar and more comfortable.

I think things you may consider a nice layout and visual aid when looking at simplistic examples end up looking unfamiliar/intrusive/broken in practice. For example, in the following case I just need to see a tuple of two tensors (one is a matrix of size 28x28 and the other a scalar), but I end up with this table for the tuple and within the table with an awkward representation of the F# type showing object properties.

When I just have the string, I get to see a tuple and the tensor value within:

This is feeling much more at home for me as a Jupyter user and it is definitely more productive and easier to work with. I hope we can have this as the default for everything.

gbaydin commented 4 years ago

I especially think that printing object properties is a bad idea. A notebook user is not interested in seeing the list of properties of a given object. They just need a quick glance at the string representation (.ToString()) which will cover the essential information. If the string representation is not very useful, this is also fine.

Here's the default behavior in Python:

jonsequitur commented 4 years ago

Thanks, @gbaydin. For the most part these simpler outputs are our goal, so you can expect to see changes here to use the F# printers by default.

The current defaults are based on having wanted to provide something useful for the more complex objects and data structures. Your Tensor.ToString() example works because the method was written to accommodate this use case, but typically .NET collection types don't expose their data via ToString.

Another goal is to have formatter functionality able to provide defaults for types that apply across languages. But since F# has a powerful printer infrastructure which C# lacks, our next step is to allow the defaults to differentiate until explicitly overridden.

dsyme commented 3 years ago

The current defaults are based on having wanted to provide something useful for the more complex objects and data structures. Your Tensor.ToString() example works because the method was written to accommodate this use case, but typically .NET collection types don't expose their data via ToString.

The problem is, showing properties by default doesn't succeed at the stated goal either - it's just too often not useful and very cognitively overloading. It's a good aim but it's not the answer.

I believe .NET objects (unqualified) simply don't have a good default for revealing their expanded structure, either internal or external, besides ToString() (with some expansions for known types like dictionary, IEnumerable). This is partly because the whole object paradigm is so often used for relational data, using properties to create links to other very large objects.
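The gap being discussed is easy to see with a standard .NET collection. In this small sketch, `ToString()` reports only the type, while `%A` is one example of an "expansion for known types" because it walks `IEnumerable` values.

```fsharp
// ResizeArray is the F# alias for System.Collections.Generic.List<T>.
let numbers = ResizeArray<int>([1; 2; 3])

// ToString() on most .NET collections only reports the type name.
let viaToString = numbers.ToString()

// %A enumerates the collection and prints (a prefix of) its elements.
let viaPercentA = sprintf "%A" numbers

printfn "%s" viaToString
printfn "%s" viaPercentA
```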

Anyway we can think this over again for F#, and perhaps for C# too

jonsequitur commented 3 years ago

My current thinking is that we can allow this decision to be different per subkernel by adding support for subkernels specifying their own formatting rules, so for example the FSharpKernel could prefer FSI-style printing. C# doesn't have a comparable mechanism and perhaps because of that, the feedback has been that the tabular output is more useful than ToString for most types.

In the short term this should make the F# experience better and more familiar while decoupling from the timeline for finding a better set of defaults for C#.

jonsequitur commented 3 years ago

I believe .NET objects (unqualified) simply don't have a good default for revealing their expanded structure, either internal or external, besides ToString() (with some expansions for known types like dictionary, IEnumerable). This is partly because the whole object paradigm is so often used for relational data, using properties to create links to other very large objects.

@dsyme Thoughts on #1233 as an approach to object graphs generally?

jonsequitur commented 3 years ago

Anyway we can think this over again for F#, and perhaps for C# too

I've seen that there are a few different use cases for printing output. At the risk of overthinking things, these deserve some discussion. @dsyme, when you and I talked about this last year, we realized that our current display gestures (via return value or the IPythonesque display helper) don't express intent. We're dealing with a few different intents. Let's try to make them explicit.

@gbaydin, what I understand by the following comment is that you understand the domain well and so only need a summary:

When I just have the string, I get to see a tuple and the tensor value within: This is feeling much more at home for me as a Jupyter user and it is definitely more productive and easier to work with. I hope we can have this as the default for everything.

For the sake of discussion, let's call this use case Summarize. ToString incidentally suffices, but only because the library author's intent for ToString happened to match yours.

When someone is doing more exploratory work (debugging, learning) they might want a more expansive view. We see this in Visual Studio watches (though they're text only and not moldable for types you haven't defined).

In .NET Interactive, we have examples like these:

We referred to this set of use cases as Explore, though there are probably different sub-cases.

Here are examples of Summarize and Explore for System.Type:

Notice that while this Explore gesture (ExploreWithUmlClassDiagram) is intention-revealing and explicit, when we display return values we have to make assumptions about intent. I'm inclined to agree with you @gbaydin that this should default to the minimalistic Summarize. But maybe there's also a default intent at the notebook level. Is it a data science notebook that's describing a model to expert users, or is it a documentation notebook that's explaining new concepts to learners, or is it a troubleshooting or performance analysis notebook that's trying to capture a high level of detail for engineers? The MIME types "text/plain" and "text/html" don't capture these nuances.
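One way to picture intent as an explicit input is the rough sketch below. Everything in it is hypothetical: neither `DisplayIntent` nor `render` exists in .NET Interactive, and the reflection-based Explore branch only approximates the property-table idea in plain text.

```fsharp
// Hypothetical names: neither this type nor `render` exists in .NET Interactive.
type DisplayIntent =
    | Summarize
    | Explore

let render (intent: DisplayIntent) (value: obj) : string =
    match intent with
    // Summarize: the compact structured plain-text form.
    | Summarize -> sprintf "%A" value
    // Explore: one line per public, non-indexed property.
    | Explore ->
        value.GetType().GetProperties()
        |> Array.filter (fun p -> p.GetIndexParameters().Length = 0)
        |> Array.map (fun p -> sprintf "%s = %A" p.Name (p.GetValue(value)))
        |> String.concat "\n"

// Made-up sample data.
type Person = { Name: string; Age: int }
let person = { Name = "Ada"; Age = 36 }

let summarized = render Summarize (box person)
let explored = render Explore (box person)
printfn "%s" summarized
printfn "%s" explored
```

The point of the sketch is only that the caller, a notebook-level setting, or a per-kernel default could pick the intent, rather than having it inferred from the MIME type.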

dsyme commented 3 years ago

When someone is doing more exploratory work (debugging, learning) they might want a more expansive view.

Do we have any evidence about how often users want more expansive/structured views?

It feels to me a safer route is to follow the example of Python and other REPLs and have a simple and easy way to default to plaintext. #plaintext would do nicely for example, or make it the default, and require an explicit view action.

I'm inclined to agree with you @gbaydin that this should default to the minimalistic Summarize.

Yes, agreed. Perhaps plaintext with a little "explore" icon next to the printed value. But in either case the actual "explore" experience needs to be 10x better than it is today - at least as good as the Visual Studio debugger and indeed much better than that

But maybe there's also a default intent at the notebook level.

Yes, agreed

The MIME types "text/plain" and "text/html" don't capture these nuances.

Right

KevinRansom commented 3 years ago

@dsyme we should create a backlog item for this. I particularly like the categorization of types, it's very helpful. I think though this applies to more than just F#, although printf "%A" might make F# developers expect better than we currently produce.

gbaydin commented 3 years ago

@jonsequitur thank you for the summary above. I understand what you're trying to achieve and coming up with a good design is important for the long term.

I think the current defaults in F# notebooks are very close to a GUI debugger experience, Explore in your description above, and very far from a conventional notebook experience in ecosystems like Python/Jupyter. I think it should instead default to plaintext Summarize as you and @dsyme were talking about above.

The typical experience in Python/Jupyter/Colab etc. is actually very close to a console script execution experience, but you get the benefit of cell-based execution and ordering. HTML formatters are almost never seen in practice. The user expects to see a very easy-to-look-at and simple output to get the essential information in the most compact way possible (as I was trying to explain here). If they need debugging, they would also do this via plaintext by the way, printing various properties of the object, etc. Non-plaintext things are seldom seen, and the most typical examples are images (including plots) and some libraries like Pandas that use an HTML formatter to show tabular database objects (data with column names, row ids, etc.)

This Colab notebook is an example of the typical experience I'm describing above: https://colab.research.google.com/github/jckantor/CBE30338/blob/master/docs/01.01-Getting-Started-with-Python-and-Jupyter-Notebooks.ipynb (also a video here https://www.youtube.com/watch?v=HW29067qVWk )

I would like to bring up these two screenshots from issue #1282, which I opened in a moment of frustration yesterday. Please note this is affecting very simple F# types like lists and tuples; it's not just about specialized libraries with a useful ToString implementation.

Screenshot with bad user experience (the current default)

Screenshot with acceptable user experience (after manually having to change the default F# formatter)

jonsequitur commented 3 years ago

Do we have any evidence about how often users want more expansive/structured views?

Yes, though not yet in a formal user study. In both the educational and developer productivity spaces, this feature has gotten a lot of positive feedback. A common theme from users is that this moldable HTML output is a reason to use notebooks over the existing REPL experiences. People working in education and documentation, going back to Try .NET, have asked for these capabilities in order to create more engaging experiences than what's possible with plain text.

That said, the goal is to be adaptable to different workflows. I don't think there's a one size fits all answer, so we'd like to figure out how to make it easy for people to adapt the tool to their needs.

nhirschey commented 3 years ago

In general, I am also very supportive of the idea that the default fsi text printers are preferable for people coming to notebooks from a data science background.

However, I do prefer the .net interactive printing for things like arrays of (simple) records, where the html printer does a better job making the collection of records look like a "data frame". When working with an F# script I often do array |> Array.iter (printfn "%A") because the default printer goes side to side across the terminal, which makes it hard for me to read when the collection and/or number of fields in the record is big. In these 3 examples, I think that default fsi is the worst:
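For readers who haven't used that idiom, here is a small sketch of the per-element printing described above. The `Trade` record and its values are made up for illustration.

```fsharp
// Made-up record type and data.
type Trade = { Symbol: string; Quantity: int; Price: decimal }

let trades =
    [| { Symbol = "AAA"; Quantity = 10; Price = 1.5m }
       { Symbol = "BBB"; Quantity = 20; Price = 2.5m } |]

// Whole-array %A: the formatter picks the layout for the entire collection.
printfn "%A" trades

// Per-element printing: each record is formatted and written on its own.
trades |> Array.iter (printfn "%A")

// The same idea as a string, for places where stdout isn't captured.
let perElement = trades |> Array.map (sprintf "%A") |> String.concat "\n"
```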

Also, @gbaydin's workaround for text printing breaks Plotly.NET's custom plot printer. It makes it impossible to include Plotly plots in a .net notebook. Whoever is working on moving to the fsi printer as default, please be mindful that custom printers do not get overwritten.

dotnet / interactive

F# values display badly in default formatting, esp plaintext #642