Formatting for interpreter output

EclecticGriffin commented 2 years ago

An open question about how the interpreter should decide to interpret values for printing. Since the test suite for the compiler includes signed and fixed point interpretations the interpreter's output doesn't line up as it naively prints the values as an unsigned integer.

There are a few ways I can currently think about how to approach this.

Add annotations to ports / memories to indicate that they are signed/unsigned, bitnum/fixed_point
Attach this information to the data file that the interpreter ingests so that the information is only carried to memories when values are supplied (making use of the existing information in the data files). The downside is that this doesn't carry to ports in general, though this may not be a major problem given that we usually only look at the output value of memory

cgyurgyik commented 2 years ago

Stupid idea no. 3 for simplicity: always print all 4 values, i.e. {unsigned, signed} {bitnum, fixed point}?

Also seems connected to Overlay type system idea.

sampsyo commented 2 years ago

Indeed! Option 1 above is definitely a simple version of that "overlay type system" idea.

I agree that Option 2 above would only apply to the "final output" and not general interactive printing during debugging, for example, because it wouldn't really be practical to use this to get the intended types fo individual ports. Maybe this is a silly question, but doesn't our little JSON "memory data" file format already include type information? Like this stuff in the JSON input for a dot product test: https://github.com/cucapra/calyx/blob/f3b9cf6ec847812ad893a31a4f26d0eea1c89479/tests/correctness/unsigned-dot-product.futil.data#L22-L26

One could imagine a post-processing step that takes the raw binary data and formats it as numbers according to the types given in such a file.

@cgyurgyik's Option 3 seems very practical as well!

cgyurgyik commented 2 years ago

One could imagine a post-processing step that takes the raw binary data and formats it as numbers according to the types given in such a file.

I think that's what their Option 2 is claiming, or at least how I interpreted it. However I foresee issues where this may still be incomplete for reading, say, register values where there may be no type information.

EclecticGriffin commented 2 years ago

Yes, for what I wrote as option 2 I was imagining propagating that information to the memories in the interpreter which would allow them to display the values with the appropriate interpretation. As @cgyurgyik points out, this doesn't really solve the interpretation problem for anything other than memories, though the vast majority of our tests only look at the memory outputs so this is perhaps not the worst state of affairs. It does mean that there would be no type information whatsoever for the program if the memories are not given external initialization

sampsyo commented 2 years ago

Makes sense! It does seem like either Option 3 or annotations are kind of the low-hanging fruit, huh? Propagating information from the data file to print out at run time seems like both a little more fiddly and also less future-proof (because it can't apply to non-memories).

cgyurgyik commented 2 years ago

Accidentally closed. #744 only adds fixed point interpretation of Value. Still plenty left to do here.

rachitnigam commented 2 years ago

@EclecticGriffin does this still discussion or have we committed to using "type annotations" generated from frontends.

EclecticGriffin commented 2 years ago

We haven't really committed to a solution yet. There's merit to all of the suggested solutions but if we want the minimum viable "solution", the shortest path to that probably involves just propagating the type from the data file and automatically formatting the output memory accordingly

rachitnigam commented 2 years ago

propagating the type from the data file and automatically formatting the output memory accordingly

Does the interpreter read the JSON files provided to it or does it use a different format that fud generates for it? If it's the latter then doing this would require yet another input/changing the input schema to use data files with the interpreter.

While fud should be used to make the interface to our tools simple, it should always be possible to use them on their own with a simple-ish CLI.

EclecticGriffin commented 2 years ago

Currently the interpreter reads a modified JSON produced by fud which is aggressively inefficient (and kinda silly), it's just a map from id names to flat lists of bit strings which are then instantiated literally. The interpreter has a more robust data translation than it used to, so this can probably be made easier by either having the interpreter take in the exact same data files (sans translation) as currently used for Verilog and do the conversion internally. This of course runs into the problem that it won't work for non-supported data types.

Ideally there'd be an extensible solution to these data representation quandaries which makes me feel like fud is probably the right place overall to do this marshaling

sampsyo commented 2 years ago

I think it would be cool to chat synchronously about this sometime! But my rough thoughts here are:

In the long term, this is a single facet of the larger problem of representing fontend (i.e., DSL) datatypes in Calyx for formatting, which is itself a facet of an even larger problem about source-level debugging in CIDR (e.g., tying PCs to statements in the original source language).
In the short/medium term, it does seem like a good solution would be a simple set of annotations. Just mark a signal with @datatype(fixed, 8, 3) (to completely make up probably-bad syntax!!!) to get it to format that way when printed.
That seems preferable to me, as a short-term fix, to attempting to read type information from the data files. It can avoid changing the input data format altogether for now, and (independent of the specific format) it can avoid changing the broader principle that input data is just a plain blob of binary data. So it leaves the door open to a future where the interpreter loads actual binary blobs, sans any manner of type information whatsoever, which may be the Platonic ideal for data loading. (Conversion would be left completely to external tools, which can be neatly shared with RTL execution.)

rachitnigam commented 2 years ago

Ideally there'd be an extensible solution to these data representation quandaries which makes me feel like fud is probably the right place overall to do this marshaling

Not sure I follow this. The extensibility question needs to be answered at the CIDR level anyways. It seems wrong to bake this into fud which is supposed to a driver for tools and not necessary do load-bearing tasks. For example, if you want to run the interpreter in debug mode, the source-level data type information needs to exist in interpreter anyways.

In the short/medium term, it does seem like a good solution would be a simple set of annotations. Just mark a signal with @datatype(fixed, 8, 3) (to completely make up probably-bad syntax!!!) to get it to format that way when printed.

We should be a little careful in defining this format so we can use it with the overlay type system in the future as well.

rachitnigam commented 2 years ago

@sampsyo suggests using fud to do the data marshaling. This issue currently blocks #714

rachitnigam commented 2 years ago

@EclecticGriffin does #942 fix this?

EclecticGriffin commented 2 years ago

Ah yes, I must have missed this one when I was trying to link the issues

calyxir / calyx

Formatting for interpreter output #733