JacquesCarette / Drasil

Generate all the things (focusing on research software)
https://jacquescarette.github.io/Drasil
BSD 2-Clause "Simplified" License

Implement choices for input format #1716

Open bmaclach opened 5 years ago

bmaclach commented 5 years ago

Input format should be a design variability that Drasil offers some control over. Currently Drasil forces this kind of input file format (i.e. junk lines interspersed with lines containing a single value):

#LENGTH (in mm)
1.6
#BREADTH (in mm)
1.5
#WEIGHT OF CHARGE (in kg)
10
...
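
For reference, reading this format amounts to skipping the `#` lines and converting every remaining line to a number. The following is a minimal Python sketch of that idea, not the reader Drasil actually generates; the function name `parse_comment_value_format` is hypothetical.

```python
def parse_comment_value_format(text):
    """Parse alternating '#' comment lines and single-value lines into floats."""
    values = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the '#'-prefixed "junk" lines.
        if not stripped or stripped.startswith("#"):
            continue
        values.append(float(stripped))
    return values


sample = """#LENGTH (in mm)
1.6
#BREADTH (in mm)
1.5
#WEIGHT OF CHARGE (in kg)
10
"""

print(parse_comment_value_format(sample))
```

Note that the reader relies purely on line order: nothing in the file ties a value to a variable name, which is part of why the format is rigid.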

We have a data description language in Drasil (DataDesc) but I think having users define their own input formats using that language is offering them more flexibility than necessary. I think the better option is to automatically generate a DataDesc based on some high-level choices offered by Drasil. These choices would be:

What I mean by interwoven lists is this: if we have inputs x and y, which are both lists of size n, and interwoven lists are turned on, with newline as the inter-value separator and comma as the inter-list element separator, the input format would be:

x_1, y_1
x_2, y_2
...
x_n, y_n

And with interwoven lists turned off:

x_1, x_2, ..., x_n
y_1, y_2, ..., y_n
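
To make the two layouts concrete, here is a rough Python sketch of a reader that handles both. The function `parse_lists` and its parameters (`n_lists`, `interwoven`, `element_sep`) are assumptions made for illustration, not existing Drasil names.

```python
def parse_lists(text, n_lists, interwoven, element_sep=","):
    """Read n_lists numeric lists from text in either layout."""
    rows = [
        [float(tok) for tok in line.split(element_sep)]
        for line in text.splitlines()
        if line.strip()
    ]
    if interwoven:
        # Each row holds one element of every list: transpose rows into lists.
        if any(len(row) != n_lists for row in rows):
            raise ValueError("each line must hold one value per list")
        return [list(col) for col in zip(*rows)]
    # Otherwise each row already is one complete list.
    if len(rows) != n_lists:
        raise ValueError("expected one line per list")
    return rows


interwoven_text = "1, 10\n2, 20\n3, 30\n"
separate_text = "1, 2, 3\n10, 20, 30\n"

x, y = parse_lists(interwoven_text, n_lists=2, interwoven=True)
print(x, y)
```

Both layouts carry the same data; turning interweaving on or off only changes whether the reader transposes the rows.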

smiths commented 5 years ago

I like your idea @bmaclach. Your brainstormed list of variabilities seems like a good start to me.

Interestingly, I think your example text file is probably wrong. It looks like it is from GlassBR. The length of the window is 1.6 mm. I'm sure this is a consequence of our change from mm to m in the input. We changed it everywhere but here. This suggests to me that the text of these "comment" lines is hard-coded somewhere. Would it be possible to generate them automatically, as we do for description lines in the SRS?

JacquesCarette commented 5 years ago

My gut feel is that we're going to need two levels:

  1. a nice user-oriented way to specify input format (what is being suggested here)
  2. a more computer-oriented internal 'format description language' (which is what was implemented before).

I don't see this as incompatible. The only thing "wrong" is that we exposed an inner language.
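
One way to picture the two levels is a small record of user-facing choices that gets expanded into a per-line description. The Python sketch below is purely illustrative: `FormatChoices`, `to_line_specs`, and `render_template` are hypothetical names, and the per-line structure is a stand-in, not the actual DataDesc representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatChoices:
    """Hypothetical user-facing choices (not Drasil's actual API)."""
    interwoven: bool
    element_sep: str = ","


def to_line_specs(choices, list_names, n):
    """Expand high-level choices into a low-level, per-line description.

    Each entry is the list of (name, index) pairs expected on that line.
    """
    if choices.interwoven:
        return [[(name, i) for name in list_names] for i in range(n)]
    return [[(name, i) for i in range(n)] for name in list_names]


def render_template(choices, specs):
    """Render the low-level description as a human-readable layout."""
    sep = choices.element_sep + " "
    return "\n".join(
        sep.join(f"{name}_{i + 1}" for name, i in line) for line in specs
    )


choices = FormatChoices(interwoven=True)
specs = to_line_specs(choices, ["x", "y"], 2)
print(render_template(choices, specs))
```

The user only ever touches `FormatChoices`; the per-line description stays an internal detail that can change without affecting them.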

bmaclach commented 5 years ago

@JacquesCarette I definitely agree. I didn't mean that we would remove our data description language, only that we would have some higher-level choices on top of it.

JacquesCarette commented 5 years ago

Good - I wasn't sure. So I think we're all on the same page.

bmaclach commented 5 years ago

@smiths You are right that the comments in the input file are hardcoded - in fact, the entire file is hardcoded. Drasil does not generate input files. But I like the idea that it should.

Generating the comment lines based on the inputs in Drasil is very doable, and we could generate the input values based on the typical values entered in Drasil. Plus, with #1728 I've added the ability to generate non-source code files as part of a GOOL Package, so generating a sample input file should be easier now. I created a separate issue for this (#1755).
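
As a rough illustration of what generating the sample file could look like, the Python sketch below builds both the comment lines and the values from per-input metadata. The `InputVar` record and `render_sample_input` are hypothetical, and the values are placeholders borrowed from the example file earlier in this thread, with the length and breadth units written as m following the mm-to-m change mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputVar:
    """Hypothetical stand-in for the information Drasil holds about an input."""
    description: str
    unit: str
    typical_value: float


def render_sample_input(inputs):
    """Emit one '#' comment line and one value line per input."""
    lines = []
    for var in inputs:
        lines.append(f"#{var.description.upper()} (in {var.unit})")
        lines.append(f"{var.typical_value:g}")
    return "\n".join(lines) + "\n"


inputs = [
    InputVar("length", "m", 1.6),
    InputVar("breadth", "m", 1.5),
    InputVar("weight of charge", "kg", 10),
]

print(render_sample_input(inputs), end="")
```

Because the comment text is derived from the same record as the value, a unit change would show up in the sample file automatically.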

smiths commented 5 years ago

An additional "input format" variability that we (@bmaclach, @JacquesCarette and myself) discussed on August 29 was the choice of file format as binary or text (ASCII/Unicode). We thought this might be a feasible addition to the input format variabilities.

For completeness, we also discussed the variability of switching input from a file stream to a keyboard input stream. Although technically possible, this variability is felt to be too much work at this time. Moreover, it might be nice as an example, but very few (if any) real (non-toy) scientific computing programs are going to require keyboard input.

JacquesCarette commented 4 years ago

@bmaclach is there anything left to do here? [I'm sure lots of extensions could be implemented, but you've redesigned and implemented this, right?]

bmaclach commented 4 years ago

This remains unimplemented because the new DataDesc design (#1835) has still not been implemented.

balacij commented 1 year ago

The ticket is largely about 'parsing'. We have one complex data type (an input configuration set) that we want to parse, and we can parse it in various flavours: JSON, YAML, TOML, TXT, CSV, etc. Additionally, within those flavours, we also have further flavours for parsing (for example, if we were parsing Expr, it can be represented in multiple ways across JSON, YAML, TXT, etc.!).
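
As a small illustration of "one data type, several flavours", the Python sketch below parses the same set of inputs from JSON and from CSV into an identical dictionary. The function names, the `PARSERS` table, and the `name,value` CSV header are all assumptions chosen for the example.

```python
import csv
import io
import json


def from_json(text):
    """Parse a flat JSON object mapping input names to numbers."""
    return {name: float(value) for name, value in json.loads(text).items()}


def from_csv(text):
    """Parse CSV with a 'name,value' header row and one row per input."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"]: float(row["value"]) for row in reader}


# One entry per supported flavour; adding a flavour means adding a parser.
PARSERS = {"json": from_json, "csv": from_csv}


def parse_inputs(text, flavour):
    return PARSERS[flavour](text)


json_text = '{"length": 1.6, "breadth": 1.5}'
csv_text = "name,value\nlength,1.6\nbreadth,1.5\n"

print(parse_inputs(json_text, "json") == parse_inputs(csv_text, "csv"))
```

Even in this toy case each flavour needs its own conventions (key names, header row), which hints at how quickly the design space grows.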

Ultimately, this is quite complicated :smile: