Closed ruv closed 1 year ago
What convoluted constructs do you mean?
It has been a while (a few years) since I researched how to support input redirection. I'd have to go back and revisit the Gforth documentation and source (I never installed it or any other Forth) to clarify that statement.
I'll probably have to rework my cat(1)-like example to accept file arguments from the command line for a better demonstration, but my overall post4 objective is to be able to redirect the input source from stdin to a file (not just from the command line), so that ACCEPT, KEY, and REFILL do not notice a difference. Something like C's freopen(stdin, ...). I could not find anything like that in the Draft, so it's on the todo list.
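For reference, a minimal C sketch of the freopen idea, not post4's actual code. The real argument order is freopen(path, mode, stream). The helper names redirect_stdin and read_line_from_stdin are hypothetical; the second one stands in for a REFILL-like primitive that only ever reads stdin.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rebind stdin to the named file.
 * Returns 0 on success, -1 on failure. After a failure the state of
 * stdin is unspecified, so the caller should stop reading from it. */
int redirect_stdin(const char *path)
{
    assert(path != NULL);
    return freopen(path, "r", stdin) == NULL ? -1 : 0;
}

/* Stand-in for a REFILL-like primitive: it reads stdin and cannot tell
 * whether that is a terminal, a pipe, or a redirected file.
 * Returns the line length without the newline, or -1 at end of input. */
int read_line_from_stdin(char *buf, size_t size)
{
    if (fgets(buf, (int)size, stdin) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline, if any */
    return (int)strlen(buf);
}
```

Because the reader only touches stdin, the code path is the same before and after the redirect, which is the "do not notice a difference" property wanted for ACCEPT, KEY, and REFILL. Restoring the original stdin afterwards is not covered by freopen alone.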
Where are the stdin and execute-parsing-file words defined? I presume those are Gforth specific. I define mine as _stdin to make clear it's non-standard (and subject to change).
As to SAVE-INPUT and RESTORE-INPUT, they talk about input source specification and their rationales do appear to make clear that switching input sources is a no-no (though that could be a consistency flaw in the Draft), but from Forth 200x Draft 19.1 section 2.1, note the last clause:
input source specification:
A set of information describing a particular state of the input source, input buffer, and parse area.
This information is sufficient, when saved and restored properly, to enable the nesting of parsing
operations on the same or different input sources.
It talks about the ability of "nesting of parsing operations on the same or different input sources", which means one needs to save the fileid (or in my case the input FILE *) in order to be able to properly switch between sources. Otherwise SAVE-INPUT / RESTORE-INPUT are almost no better than FILE-POSITION and REPOSITION-FILE. In an embedded environment the current standard might be sufficient, but for a hosted environment like post4 I don't see it being enough short of creating post4 specific words like SOURCE-SAVE and SOURCE-RESTORE (trying to avoid extra words if existing ones are sufficient). I will probably need a SOURCE-SET and SOURCE-GET too for the more generic input redirection. Then I'll need something for output redirection too.
If source-id or blk is changed, restore-input may throw an exception due to an ambiguous condition: "An ambiguous condition exists if the input source represented by the arguments is not the same as the current input source".
If I'm saving everything concerning the input source, like FILE *, then there will be no ambiguity since all the arguments, including FILE *, will refer to the same input stream on restore.
Thoughts?
to be able to redirect input source from stdin to a file (not just from the command line), so that ACCEPT, KEY, and REFILL do not notice a difference.
Yes. Many Forth systems allow such redirections only via the command line (shell-level redirection), for example as:
echo 'test passed' | forth -e 'cr here dup 8 accept type cr bye'
Something like C's freopen(stdin, ...)
An OS-dependent way is to open (or include) the file "/dev/stdin" in *nix, or "con" in Windows, e.g.:
echo 2 3 + . cr bye | forth -e 'include /dev/stdin'
Where are stdin and execute-parsing-file words defined? I presume those are Gforth specific.
Yes, they are Gforth specific words, as I mentioned. For a standard Forth code there is no need to mention a particular Forth system at all.
I didn't check where these words are defined in the sources, but provided links to the corresponding sections in the Gforth manual:
I define mine as _stdin to make clear its non-standard (and subject to change).
We can consider a set of additional (non standard) words as a library. And it's inconvenient to prefix all words in a library with an underscore (apart from subject to change). I think it's better to put them in a separate vocabulary/namespace.
save-input and restore-input
It talks about the ability of "nesting of parsing operations on the same or different input sources", which means one needs to save the fileid (or in my case the input FILE *) in order to be able to properly switch between sources. Otherwise SAVE-INPUT / RESTORE-INPUT are almost no better than FILE-POSITION and REPOSITION-FILE.
Yes, you are right. It's an inconsistency between the term definition, and the specifications for words that use this term.
And the words are actually specified as an alternative to file-position and reposition-file for the current input source. NB: reposition-file is not allowed to be applied to a fileid from source-id in a standard program, see 11.3.3 Input source.
Actually, a standard system may provide this capability of switching source, but a standard program is not allowed to rely on this capability.
If I'm saving everything concerning the input source, like FILE *, then there will be no ambiguity since all the arguments, including FILE *, will refer to the same input stream on restore.
Then it will work in your system, but a standard program cannot rely on this capability since it will not work on some other standard system.
I don't see it being enough short of creating post4 specific words like SOURCE-SAVE and SOURCE-RESTORE (trying to avoid extra words if existing ones are sufficient).
SP-Forth/4 internally uses exactly such words, save-source and restore-source, to save and restore the current input source, but these words don't change the state (the position) of the input source.
But why do you think that something like execute-parsing-file ( i*x sd.filename xt -- j*x ) is not enough for programs? I.e., do you really need sometimes to switch between several input sources in parallel?
It talks about the ability of "nesting of parsing operations on the same or different input sources",
It's an inconsistency between the term definition, and the specifications for words that use this term.
I was wrong, since the term "input source specification" is also used in the specifications for load, included and evaluate to actually describe nesting on the same or different input sources.
It talks about the ability of "nesting of parsing operations on the same or different input sources", which means one needs to save the fileid (or in my case the input FILE *) in order to be able to properly switch between sources. Otherwise SAVE-INPUT / RESTORE-INPUT are almost no better than FILE-POSITION and REPOSITION-FILE.
Yes, you are right. It's an inconsistency between the term definition, and the specifications for words that use this term.
I would think that the "input source specification" definition is normative, while the rationale is just informative, describing historical practice. In Draft 19-1 the definitions of SAVE-INPUT and RESTORE-INPUT do not state any weakening of the input source specification; only the rationale does that, and being informative it does not set required behaviour. (Sorry, my standards lawyer hat sometimes fits a little snug, squeezes brain.)
I'd say implementations based on the description in the rationale are not standard, opting for historical behaviour.
Also I was considering my earlier new word ideas for input redirection could be more generic, like:
STREAM-GET ( stream -- fileid )
STREAM-SET ( fileid stream -- bool )
STREAM-PUSH ( stream -- )
STREAM-POP ( stream -- bool )
where stream = 0 input, 1 output
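A rough C sketch of how these four words might behave, under assumptions the proposal does not spell out: each stream has a current FILE * plus a fixed-depth save stack, PUSH saves the current binding, POP restores it, and the bool results mean success. The names stream_get, stream_set, stream_push, stream_pop and the limit STREAM_DEPTH are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define STREAM_COUNT 2   /* 0 = input, 1 = output, as in the proposal */
#define STREAM_DEPTH 8   /* assumed nesting limit */

static FILE *current[STREAM_COUNT];              /* current binding per stream */
static FILE *saved[STREAM_COUNT][STREAM_DEPTH];  /* saved bindings */
static int depth[STREAM_COUNT];

/* STREAM-GET ( stream -- fileid ) */
FILE *stream_get(int stream)
{
    assert(stream >= 0 && stream < STREAM_COUNT);
    return current[stream];
}

/* STREAM-SET ( fileid stream -- bool ) ; 1 on success, 0 on bad arguments */
int stream_set(FILE *fp, int stream)
{
    if (fp == NULL || stream < 0 || stream >= STREAM_COUNT)
        return 0;
    current[stream] = fp;
    return 1;
}

/* STREAM-PUSH ( stream -- ) ; save the current binding */
void stream_push(int stream)
{
    assert(stream >= 0 && stream < STREAM_COUNT);
    assert(depth[stream] < STREAM_DEPTH);
    saved[stream][depth[stream]++] = current[stream];
}

/* STREAM-POP ( stream -- bool ) ; restore the last saved binding, 0 if none */
int stream_pop(int stream)
{
    if (stream < 0 || stream >= STREAM_COUNT || depth[stream] == 0)
        return 0;
    current[stream] = saved[stream][--depth[stream]];
    return 1;
}
```

The bindings start out empty, so a system would call stream_set(stdin, 0) and stream_set(stdout, 1) at start-up. This sketch only tracks which FILE * is current; it does not save buffer or parse-area state.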
But why do you think that something like execute-parsing-file ( i*x sd.filename xt -- j*x ) is not enough for programs?
I think a program should be able to redirect its I/O. In the case of something like cat(1) you want to be able to handle command line file arguments when present; one way is to open each file, redirecting stdin to reuse the same code path. Of course other solutions are possible.
Redirecting stdout within a program would be handy too for something like output capture (into a dynamic string buffer maybe, or a temporary file) in a test suite, so that the output can then be examined. I doubt it's possible with execute-parsing-file.
I would think that the "input source specification" definition is normative, while the rationale is just informative
You are right.
Draft 19-1 the definitions of SAVE-INPUT and RESTORE-INPUT do not state any weakening of the input source specification
The specification for RESTORE-INPUT says: "An ambiguous condition exists if the input source represented by the arguments is not the same as the current input source". And it's a normative part.
Due to this ambiguous condition, a standard program cannot use RESTORE-INPUT if the input source was switched, and a system is free to implement any behavior in this case.
A system shall obey the normative parts only as far as this can be tested by a standard program. What is beyond that is up to the system. And ambiguous conditions restrict programs and relax systems.
I think a program should be able to redirect its I/O.
Agreed. This topic still needs further development.
You suggest setting a fileid as the input source or output source. But, for example, I need to output not only to a file, but also to a socket or to memory (ditto for the input). So a more general way is to set an xt that produces data (for the input source) or consumes data (for the output).
A common approach that is employed in Forth systems is to provide a way to get and set the implementation for TYPE, which is used by all output functions.
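As an illustration only, here is how that get/set approach might look if modelled in C with a function pointer standing in for the xt. All names here (type_fn, type_get, type_set, type_to_memory, capture_buf) are hypothetical and not taken from any particular Forth system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Signature of a TYPE-like sink: consume len characters starting at s. */
typedef void (*type_fn)(const char *s, size_t len);

/* Default sink: write to stdout. */
void type_to_stdout(const char *s, size_t len)
{
    fwrite(s, 1, len, stdout);
}

/* Alternative sink: append to a fixed memory buffer (assumed size). */
static char capture_buf[256];
static size_t capture_len;

void type_to_memory(const char *s, size_t len)
{
    size_t room = sizeof capture_buf - capture_len;
    if (len > room)
        len = room;                    /* truncate when the buffer is full */
    memcpy(capture_buf + capture_len, s, len);
    capture_len += len;
}

static type_fn current_type = type_to_stdout;

type_fn type_get(void) { return current_type; }

void type_set(type_fn fn)
{
    assert(fn != NULL);
    current_type = fn;
}

/* Every output word is routed through the current TYPE implementation. */
void emit_char(char c)          { current_type(&c, 1); }
void print_cstr(const char *s)  { current_type(s, strlen(s)); }
```

Swapping the sink with type_set(type_to_memory) captures everything that goes through emit_char and print_cstr, and restoring the value returned earlier by type_get undoes it. Output that bypasses this pointer is not captured.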
Concerning the input source: at the moment there is no standard way to read the input source other than line by line via REFILL or ACCEPT. But a program may need to read binary data (regardless of line terminators and control characters). This still needs to be elaborated.
The specification for RESTORE-INPUT says: "An ambiguous condition exists if the input source represented by the arguments is not the same as the current input source". And it's a normative part.
Well, since we're talking about the Draft of the next standard, RESTORE-INPUT should be altered to remain consistent with the "input source specification".
SAVE-INPUT does not weaken "input source specification", but RESTORE-INPUT does, which is clearly a consistency conflict. RESTORE-INPUT is the only word that weakens "input source specification", which raises the question of whether it was an oversight, a carry-over (copy/paste without thought) from historical behaviour, or never really considered in the context of input redirection in hosted environments. My inclination is that all words referring to "input source specification" should be consistent, which makes RESTORE-INPUT the odd one out.
I think a program should be able to redirect its I/O.
Agree. This topic should be further developed yet.
Maybe this discussion would be better served in ForthHub Discussions so that others can follow / participate. (Maybe pointing to this as a kick-off).
My only concern is the Draft is already 10+ years old and appears to move at a dawdling pace. So proposed Draft changes might never arrive. Maybe as a companion annex (library) that could be developed and tested separately from the Draft would be best.
A common approach that is employed in Forth systems is to provide a way to get and set the implementation for TYPE, which is used by all output functions.
Setting up a TYPE word pointer (and probably EMIT too) would handle most Forth output, except in some implementations where the core is implemented in C or another language, in which case one has to handle the core output differently, which gets messy. It is easier to simply redirect stdout so that all Forth code, the core, and any libraries continue to work.
Socket support was always a long term goal. Memory mapped I/O would be nice too. But first post4 needs the basics to be there. If one can redirect stdin and stdout cleanly, then other sources and sinks should be easy to add.
which begs the question if it was an over sight or carry over (copy/paste without thought) from historical behaviour
It's not an oversight or carry-over; it's so by design, as the rationale section shows (A.6.2.2182 SAVE-INPUT).
The idea was that these words are only capable of repositioning the current input source, not of switching it.
If you need only to switch the input source, you don't need to change its position, and then saving/restoring the position is an unnecessary overhead.
If you need only to switch the input source, you don't need to change its position, and then saving/restoring the position is an unnecessary overhead.
In order to redirect input or nest different inputs, you need to save a lot of input context (source-id, buffer ptr, size, length, offset, unget); file position comes with the fileid. It's not enough to just push/pop source-id.
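To make the point concrete, a minimal C sketch of such a context and a stack for nesting it. This is an assumed layout built from the fields listed above, not post4's actual structure; input_ctx, input_push, input_pop, input_rest and INPUT_DEPTH are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

#define INPUT_DEPTH 8   /* assumed nesting limit */

typedef struct {
    FILE  *fp;       /* source-id: underlying stream, NULL for a string source */
    char  *buffer;   /* input buffer */
    size_t size;     /* buffer capacity */
    size_t length;   /* characters currently in the buffer */
    size_t offset;   /* parse position within the buffer, like >IN */
    int    unget;    /* one pushed-back character, or EOF if none */
} input_ctx;

static input_ctx current_input;
static input_ctx input_stack[INPUT_DEPTH];
static int input_depth;

/* Save the whole current context and switch to next. Returns 1 on success. */
int input_push(const input_ctx *next)
{
    if (next == NULL || input_depth >= INPUT_DEPTH)
        return 0;
    input_stack[input_depth++] = current_input;
    current_input = *next;
    return 1;
}

/* Restore the most recently saved context. Returns 0 if nothing was saved. */
int input_pop(void)
{
    if (input_depth == 0)
        return 0;
    current_input = input_stack[--input_depth];
    return 1;
}

/* Unparsed remainder of the current buffer and its length. */
const char *input_rest(size_t *remaining)
{
    assert(current_input.offset <= current_input.length);
    *remaining = current_input.length - current_input.offset;
    return current_input.buffer + current_input.offset;
}
```

Since the whole struct is copied, popping resumes the outer source exactly where it stopped, including a partially parsed buffer; saving only the FILE * would lose the buffer contents and the parse offset.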
Following this discussion and ForthHub/discussion#134 I have a better handle on SAVE-INPUT and RESTORE-INPUT. I've revised that code and they now pass the Draft 19-1 related test cases.
But this still means that examples/cat.p4 needs a complete makeover, since SAVE-INPUT / RESTORE-INPUT are not what I thought they were, in which case I'll withdraw the example until I figure out either a portable or custom solution for program controlled I/O redirection.
@ruv Thank you for your feedback and insight.
since SAVE-INPUT / RESTORE-INPUT are not what I thought they were,
Yes, these words give the wrong impression at first glance. And they are almost useless for programs or libraries.
Probably, a more useful capability is to mark/extract a region of the input source, and then include/evaluate this region in the current context (regardless of the current input source and without affecting it). By default the lifetime of a marked/extracted region can be limited by the lifetime of the parent input source; otherwise explicit releasing/closing can be required.
But such a capability is rarely demanded, so standard requirements can be limited to only allow implementing such a thing in a portable way.
In examples/cat.p4 a comment says:
What convoluted constructs do you mean?
I see two variants of implementing such a word in Gforth:
via raw standard read-file (as in the example, or via read-line in a similar way):
via execute-parsing-file:
Just a side note: save-input and restore-input cannot be used by a standard program in the way your example does, since save-input shall save, and restore-input shall restore, only "a particular state of the input source, input buffer, and parse area", but not source-id and blk.
If source-id or blk is changed, restore-input may throw an exception due to an ambiguous condition: "An ambiguous condition exists if the input source represented by the arguments is not the same as the current input source".