SirWumpus / post4

Post4 is an indirect threaded Forth dialect written in C.
BSD 2-Clause "Simplified" License

Example of cat-stdin and changing the input source #2

Closed: ruv closed this issue 1 year ago

ruv commented 1 year ago

In examples/cat.p4 a comment says:

Trivial example of how to do a cat(1) like pipe filter without any convoluted constructs as proposed in Gforth.

What convoluted constructs do you mean?

I see two variants of implementing such a word in Gforth:

  1. via raw standard read-file (as in the example, or via read-line in a similar way):

    : cat-stdin ( -- )
    begin  here dup unused stdin read-file throw dup while type repeat 2drop
    ;
  2. via execute-parsing-file:

    : cat-stdin ( -- )
    stdin [: begin refill while source type cr repeat ;] execute-parsing-file
    ;

Just a side note: save-input and restore-input cannot be used by a standard program the way your example uses them, since save-input shall save, and restore-input shall restore, only "a particular state of the input source, input buffer, and parse area", but not source-id and blk.

If source-id or blk is changed, restore-input may throw an exception due to an ambiguous condition: "An ambiguous condition exists if the input source represented by the arguments is not the same as the current input source".

SirWumpus commented 1 year ago

What convoluted constructs do you mean?

It has been a while (a few years) since I researched how to support input redirection. I'd have to go back and revisit Gforth documentation and source (I never installed it or any other Forth) to clarify that statement.

I'll probably have to rework my cat(1)-like example to accept file arguments from the command line for a better demonstration, but my overall post4 objective is to be able to redirect input source from stdin to a file (not just from the command line), so that ACCEPT, KEY, and REFILL do not notice a difference. Something like C's freopen(..., stdin). I could not find anything like that in the Draft, so it's on the todo list.

Where are stdin and execute-parsing-file words defined? I presume those are Gforth specific. I define mine as _stdin to make clear it's non-standard (and subject to change).

As to SAVE-INPUT and RESTORE-INPUT, they talk about the input source specification, and their rationales do appear to make clear that switching input sources is a no-no (though that could be a consistency flaw in the Draft), but note the last clause of the definition in Forth 200x Draft 19.1, section 2.1:

input source specification: A set of information describing a particular state of the input source, input buffer, and parse area. This information is sufficient, when saved and restored properly, to enable the nesting of parsing operations on the same or different input sources.

It talks about the ability of nesting parsing operations on the same or *different* input sources, which means one needs to save the fileid (or in my case the input FILE *) in order to be able to properly switch between sources. Otherwise SAVE-INPUT / RESTORE-INPUT are almost no better than FILE-POSITION and REPOSITION-FILE. In an embedded environment the current standard might be sufficient, but for a hosted environment like post4 I don't see it being enough, short of creating post4-specific words like SOURCE-SAVE and SOURCE-RESTORE (I'm trying to avoid extra words if existing ones are sufficient). I'll probably need a SOURCE-SET and SOURCE-GET too for more generic input redirection. Then I'll need something for output redirection too.

If source-id or blk is changed, restore-input may throw an exception due to an ambiguous condition: "An ambiguous condition exists if the input source represented by the arguments is not the same as the current input source".

If I'm saving everything concerning the input source, like FILE *, then there will be no ambiguity since all the arguments, including FILE *, will refer to the same input stream on restore.

Thoughts?

ruv commented 1 year ago

to be able to redirect input source from stdin to a file (not just from the command line), so that ACCEPT, KEY, and REFILL do not notice a difference.

Yes. Many Forth systems allow such redirections only via the command line (shell-level redirection), for example as:

echo 'test passed' | forth -e 'cr here dup 8 accept type cr bye'

Something like C's freopen(..., stdin)

An OS-dependent way is to open (or include) the file "/dev/stdin" on *nix, or "con" on Windows, e.g.:

echo 2 3 + . cr bye | forth -e 'include /dev/stdin'

Where are stdin and execute-parsing-file words defined? I presume those are Gforth specific.

Yes, they are Gforth specific words, as I mentioned. For a standard Forth code there is no need to mention a particular Forth system at all.

I didn't check where these words are defined in the sources, but I provided links to the corresponding sections in the Gforth manual.

I define mine as _stdin to make clear it's non-standard (and subject to change).

We can consider a set of additional (non-standard) words as a library. And it's inconvenient to prefix all words in a library with an underscore (leaving aside the "subject to change" part). I think it's better to put them in a separate vocabulary/namespace.

ruv commented 1 year ago

Concerning save-input and restore-input

It talks about the ability of "nesting of parsing operations on the same or different input sources", which means one needs to save the fileid (or in my case the input FILE *) in order to be able to properly switch between sources. Otherwise SAVE-INPUT / RESTORE-INPUT are almost no better than FILE-POSITION and REPOSITION-FILE.

Yes, you are right. It's an inconsistency between the term definition, and the specifications for words that use this term.

And the words are actually specified as an alternative to file-position and reposition-file for the current input source. NB: reposition-file is not allowed to be applied to a fileid from source-id in a standard program, see 11.3.3 Input source.

Actually, a standard system may provide this capability of switching source, but a standard program is not allowed to rely on this capability.

If I'm saving everything concerning the input source, like FILE *, then there will be no ambiguity since all the arguments, including FILE *, will refer to the same input stream on restore.

Then it will work in your system, but a standard program cannot rely on this capability since it will not work on some other standard system.

I don't see it being enough short of creating post4 specific words like SOURCE-SAVE and SOURCE-RESTORE (trying to avoid extra words if existing ones are sufficient).

SP-Forth/4 internally uses words named precisely save-source and restore-source to save and restore the current input source, but these words don't change the state (the position) of the input source.

But why do you think that something like execute-parsing-file ( i*x sd.filename xt -- j*x ) is not enough for programs? I.e., do you really sometimes need to switch between several input sources in parallel?

ruv commented 1 year ago

It talks about the ability of "nesting of parsing operations on the same or different input sources",

It's an inconsistency between the term definition, and the specifications for words that use this term.

I was wrong, since the term "input source specification" is also used in specifications for load, included and evaluate to actually describe nesting on the same or different input sources.

SirWumpus commented 1 year ago

It talks about the ability of "nesting of parsing operations on the same or different input sources", which means one needs to save the fileid (or in my case the input FILE *) in order to be able to properly switch between sources. Otherwise SAVE-INPUT / RESTORE-INPUT are almost no better than FILE-POSITION and REPOSITION-FILE.

Yes, you are right. It's an inconsistency between the term definition, and the specifications for words that use this term.

I would think that the "input source specification" definition is normative, while the rationale is just informative, describing historical practice. In Draft 19-1, the definitions of SAVE-INPUT and RESTORE-INPUT do not state any weakening of the input source specification; only the rationale does, and being informative, it does not set required behaviour. (Sorry, my standards-lawyer hat sometimes fits a little snug and squeezes the brain.)

I'd say implementations based on the description in the rationale are not standard, opting for historical behaviour.

Also, I was thinking that my earlier ideas for new input-redirection words could be more generic, like:

STREAM-GET ( stream -- fileid )
STREAM-SET ( fileid stream -- bool )
STREAM-PUSH ( stream -- )
STREAM-POP ( stream -- bool )
where stream = 0 input, 1 output
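Since post4 is written in C, here is a rough C sketch of how the STREAM-* idea might be backed inside an interpreter. Everything in it (stream_get, stream_set, stream_push, stream_pop, the stack depth of 8, and using a FILE * as the fileid) is a hypothetical illustration, not post4 code. The assumption is that words such as ACCEPT, KEY, and REFILL would read from stream_get(STREAM_IN) rather than from stdin directly, so changing that one pointer redirects all of them.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical C backing for the proposed STREAM-* words.
 * As in the proposal, stream 0 is input and stream 1 is output. */
enum { STREAM_IN = 0, STREAM_OUT = 1, STREAM_COUNT = 2, STREAM_DEPTH = 8 };

static FILE *current[STREAM_COUNT];             /* active stream per direction */
static FILE *saved[STREAM_COUNT][STREAM_DEPTH]; /* push-down stack per direction */
static int depth[STREAM_COUNT];

static bool valid(int stream)
{
    return stream >= 0 && stream < STREAM_COUNT;
}

/* stdin/stdout are not constant expressions, so default them lazily. */
static void stream_init(void)
{
    if (current[STREAM_IN] == NULL)
        current[STREAM_IN] = stdin;
    if (current[STREAM_OUT] == NULL)
        current[STREAM_OUT] = stdout;
}

/* STREAM-GET ( stream -- fileid ); NULL for a bad stream number. */
FILE *stream_get(int stream)
{
    stream_init();
    return valid(stream) ? current[stream] : NULL;
}

/* STREAM-SET ( fileid stream -- bool ) */
bool stream_set(FILE *fp, int stream)
{
    stream_init();
    if (fp == NULL || !valid(stream))
        return false;
    current[stream] = fp;
    return true;
}

/* STREAM-PUSH ( stream -- ); returns false on overflow, where a Forth
 * word would more likely THROW. */
bool stream_push(int stream)
{
    stream_init();
    if (!valid(stream) || depth[stream] >= STREAM_DEPTH)
        return false;
    saved[stream][depth[stream]++] = current[stream];
    return true;
}

/* STREAM-POP ( stream -- bool ): restore the most recently pushed stream. */
bool stream_pop(int stream)
{
    if (!valid(stream) || depth[stream] == 0)
        return false;
    current[stream] = saved[stream][--depth[stream]];
    return true;
}
```

Typical use would be push, set, run the code whose I/O should go elsewhere, then pop. The separate push/pop stack means nested redirections unwind in order without each caller having to remember the previous fileid.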

But why do you think that something like execute-parsing-file ( i*x sd.filename xt -- j*x ) is not enough for programs?

I think a program should be able to redirect its I/O. In the case of something like cat(1), you want to be able to handle command line file arguments when present; one way is to open each file by redirecting stdin onto it, reusing the same code path. Of course other solutions are possible.
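A minimal C sketch of that approach, written as a plain C filter rather than as post4 itself; copy_stdin() and cat_args() are hypothetical names. The file arguments are reopened onto stdin with freopen(), so one copy loop serves both the piped and the file-argument cases. Note that freopen() closes the original stdin, so this suits a filter that never needs the terminal back; getting it back would take something platform-specific such as POSIX dup()/dup2() on descriptor 0.

```c
#include <stdio.h>

/* Copy stdin to out in fixed-size chunks. fread()/fwrite() ignore line
 * terminators, so binary data passes through unchanged. */
static int copy_stdin(FILE *out)
{
    char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, stdin)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(stdin) ? -1 : 0;
}

/* cat(1)-like filter: with no file arguments, copy stdin as it is;
 * otherwise reopen stdin on each named file and reuse copy_stdin(). */
int cat_args(int argc, char **argv, FILE *out)
{
    if (argc < 2)
        return copy_stdin(out);

    for (int i = 1; i < argc; i++) {
        /* freopen() closes the previous stdin before opening the file. */
        if (freopen(argv[i], "rb", stdin) == NULL) {
            perror(argv[i]);
            return -1;
        }
        if (copy_stdin(out) != 0)
            return -1;
    }
    return 0;
}
```

A main() would simply call cat_args(argc, argv, stdout). This sketch stops at the first file that cannot be opened, since a failed freopen() leaves stdin closed.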

Redirecting stdout within a program would be handy too, for something like output capture (into a dynamic string buffer, maybe, or a temp file) in a test suite, so the output can then be examined. I doubt it's possible with execute-parsing-file.
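One portable way to get output capture in C without touching the process-wide stdout is to have the code under test write to a stream it is handed (or, in an interpreter, to its current output stream), point that stream at tmpfile(), and read the result back into a string. The sketch below assumes that arrangement; capture_output() and greet() are hypothetical names, and true stdout redirection would instead need freopen() or POSIX dup2().

```c
#include <stdio.h>
#include <stdlib.h>

/* Run emit(out, arg) with out pointing at a temporary file, then return
 * everything it wrote as a malloc'd, NUL-terminated string (NULL on error).
 * The caller frees the result. */
char *capture_output(void (*emit)(FILE *out, void *arg), void *arg)
{
    FILE *tmp = tmpfile();
    char *text = NULL;
    long len;

    if (tmp == NULL)
        return NULL;

    emit(tmp, arg);

    /* ftell() after writing gives the number of bytes produced. */
    if (fflush(tmp) == 0 && (len = ftell(tmp)) >= 0) {
        rewind(tmp);
        text = malloc((size_t) len + 1);
        if (text != NULL) {
            if (fread(text, 1, (size_t) len, tmp) == (size_t) len) {
                text[len] = '\0';
            } else {
                free(text);
                text = NULL;
            }
        }
    }
    fclose(tmp);
    return text;
}

/* Example code under test: writes to whatever stream it is given. */
void greet(FILE *out, void *name)
{
    fprintf(out, "hello, %s\n", (const char *) name);
}
```

A test would call capture_output(greet, "post4") and compare the returned string against the expected "hello, post4\n".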

ruv commented 1 year ago

I would think that the "input source specification" definition is normative, while the rationale is just informative

You are right.

In Draft 19-1, the definitions of SAVE-INPUT and RESTORE-INPUT do not state any weakening of the input source specification

The specification for RESTORE-INPUT says: "An ambiguous condition exists if the input source represented by the arguments is not the same as the current input source". And it's a normative part.

Due to this ambiguous condition, a standard program cannot use RESTORE-INPUT if the input source was switched, and a system is free to implement any behavior in this case.

A system has to obey the normative parts only as far as a standard program can test them. Anything beyond that is up to the system. And ambiguous conditions restrict programs and relax the requirements on systems.


I think a program should be able to redirect its I/O.

Agreed. This topic still needs further development.

You suggest setting a fileid as the input source or output source. But, for example, I need to output not only to a file, but also to a socket or to memory (ditto for input). So a more general way is to set an xt that produces data (for the input source) or consumes it (for the output).

A common approach that is employed in Forth systems is to provide a way to get and set the implementation for TYPE, which is used by all output functions.
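In a C-hosted system, the settable TYPE approach might look roughly like the sketch below: every output word goes through one function pointer that defaults to stdout and can be swapped for another sink, here a small memory buffer. The names type_set(), do_type(), do_emit(), and type_memory() are hypothetical, not taken from any particular Forth. C code that calls printf() directly bypasses such a hook.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hook: all output funnels through type_hook, a C analogue
 * of a deferred TYPE ( c-addr u -- ). */
typedef void (*type_fn)(const char *s, size_t n);

static void type_stdout(const char *s, size_t n)
{
    fwrite(s, 1, n, stdout);
}

static type_fn type_hook = type_stdout;

/* Install a new TYPE implementation (NULL means the default) and
 * return the previous one so the caller can put it back. */
type_fn type_set(type_fn fn)
{
    type_fn old = type_hook;
    type_hook = (fn != NULL) ? fn : type_stdout;
    return old;
}

void do_type(const char *s, size_t n)
{
    type_hook(s, n);
}

/* EMIT defined via TYPE, so redirecting TYPE redirects EMIT as well. */
void do_emit(char c)
{
    do_type(&c, 1);
}

/* Example sink: append to a fixed memory buffer, truncating when full. */
static char mem_buf[64];
static size_t mem_len;

static void type_memory(const char *s, size_t n)
{
    size_t room = sizeof mem_buf - 1 - mem_len;

    if (n > room)
        n = room;
    memcpy(mem_buf + mem_len, s, n);
    mem_len += n;
    mem_buf[mem_len] = '\0';
}
```

Because type_set() returns the previous implementation, redirections can be nested by saving and restoring the old pointer, much like a deferred word's IS / ACTION-OF pair.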

Concerning the input source: at the moment there is no standard way to read the input source other than line by line via REFILL or ACCEPT. But a program may need to read binary data (regardless of line terminators and control characters). This still needs to be worked out.

SirWumpus commented 1 year ago

The specification for RESTORE-INPUT says: "An ambiguous condition exists if the input source represented by the arguments is not the same as the current input source". And it's a normative part.

Well since we're talking about the Draft of the next standard, RESTORE-INPUT should be altered to remain consistent with the "input source specification".

My inclination is that all words referring to "input source specification" should be consistent, which makes RESTORE-INPUT the odd one out.


I think a program should be able to redirect its I/O.

Agreed. This topic still needs further development.

Maybe this discussion would be better served in ForthHub Discussions so that others can follow / participate. (Maybe pointing to this as a kick-off).

My only concern is that the Draft is already 10+ years old and appears to move at a dawdling pace, so proposed Draft changes might never arrive. Perhaps a companion annex (library), developed and tested separately from the Draft, would be best.

A common approach that is employed in Forth systems is to provide a way to get and set the implementation for TYPE, which is used by all output functions.

Setting up a TYPE word pointer (and probably EMIT too) would handle most Forth output, except in implementations where the core is written in C or another language, in which case the core's output has to be handled differently, which gets messy. It's easier to simply redirect stdout so that all Forth code, the core, and any libraries continue to work.

Socket support was always a long term goal. Memory map I/O would be nice too. But first post4 needs the basics to be there. If one can redirect stdin and stdout cleanly, then other sources and sinks should be easy to add.

ruv commented 1 year ago

which begs the question if it was an oversight or a carry-over (copy/paste without thought) from historical behaviour

It's not an oversight or a carry-over; it's by design, as the rationale section shows (A.6.2.2182 SAVE-INPUT).

The idea was that these words can only reposition the current input source, not switch it.

If you need only to switch the input source, you don't need to change its position, and then saving/restoring the position is an unnecessary overhead.

SirWumpus commented 1 year ago

If you need only to switch the input source, you don't need to change its position, and then saving/restoring the position is an unnecessary overhead.

In order to redirect input or nest different inputs, you need to save a lot of input context (source-id, buffer pointer, size, length, offset, unget); the file position comes with the fileid. It's not enough to just push/pop source-id.
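A hypothetical C sketch of saving that whole context as one unit before switching sources. The InputCtx fields mirror the items listed, while the struct name, the functions, and the nesting depth of 16 are assumptions for illustration. It also assumes each nested source gets its own input buffer; if buffers were shared, their contents would have to be saved as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical input context: one field per item in the list above. */
typedef struct {
    FILE *fp;       /* source-id analogue; file position travels with it */
    char *buffer;   /* input buffer */
    size_t size;    /* buffer capacity */
    size_t length;  /* characters currently in the buffer */
    size_t offset;  /* >IN: parse position within the buffer */
    int unget;      /* pushed-back character, or EOF if none */
} InputCtx;

#define CTX_DEPTH 16

static InputCtx ctx_stack[CTX_DEPTH];
static size_t ctx_top;

/* Save the entire current context, not just the file pointer. */
bool ctx_push(const InputCtx *cur)
{
    if (ctx_top >= CTX_DEPTH)
        return false;
    ctx_stack[ctx_top++] = *cur;
    return true;
}

/* Restore the most recently saved context into *cur. */
bool ctx_pop(InputCtx *cur)
{
    if (ctx_top == 0)
        return false;
    *cur = ctx_stack[--ctx_top];
    return true;
}
```

Nesting then becomes: ctx_push(&cur), fill cur with the new source's fields, interpret until it is exhausted, and ctx_pop(&cur) to resume exactly where the outer source left off.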

SirWumpus commented 1 year ago

Following this discussion and ForthHub/discussion#134 I have a better handle on SAVE-INPUT and RESTORE-INPUT. I've revised that code and they now pass the Draft 19-1 related test cases.

But this still means that examples/cat.p4 needs a complete makeover, since SAVE-INPUT / RESTORE-INPUT are not what I thought they were, so I'll withdraw the example until I figure out either a portable or a custom solution for program-controlled I/O redirection.

@ruv Thank you for your feedback and insight.

ruv commented 1 year ago

since SAVE-INPUT / RESTORE-INPUT are not what I thought they were,

Yes, these words give the wrong impression at first glance. And they are almost useless for programs or libraries.

Probably, a more useful capability is to mark/extract a region of the input source, and then include/evaluate this region in the current context (regardless of the current input source and without affecting it). By default the lifetime of a marked/extracted region can be limited by the lifetime of the parent input source; otherwise explicit releasing/closing can be required.

But such a capability is rarely needed, so the standard's requirements could be limited to making such a thing implementable in a portable way.