gelisam / hawk

Haskell text processor for the command-line
Apache License 2.0

magic rules #33

Open gelisam opened 11 years ago

gelisam commented 11 years ago

The current code considers the uses at the top of this list to be more common than the uses at the bottom:

  1. manipulate the input as a table ([[_]] -> _)
  2. manipulate the input as a list of lines ([_] -> _)
  3. apply the expression to each line (_ -> _)
  4. evaluate the expression, ignoring the input (_)

So when magic encounters a case where what the user wants to do is ambiguous, it picks the most likely candidate. Perhaps there could be a --verbose mode in which we print to stderr which other interpretations would have made sense, and tell the user which flags they could use to force that interpretation?
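
A minimal sketch of the "pick the most likely candidate" rule, with the alternatives kept around for a possible --verbose report. All names here (`Interpretation`, `choose`, `verboseReport`) are hypothetical and not taken from Hawk's code; the `typechecks` predicate stands in for whatever check magic actually performs.

```haskell
-- Hypothetical names throughout; this is not Hawk's actual implementation.
-- Constructors are declared in priority order, most likely first.
data Interpretation
  = TableMode    -- [[_]] -> _
  | LinesMode    -- [_] -> _
  | PerLineMode  -- _ -> _
  | EvalMode     -- _
  deriving (Show, Eq, Ord, Enum, Bounded)

-- All candidates, in the priority order of the list above.
priority :: [Interpretation]
priority = [minBound .. maxBound]

-- Given a predicate saying which interpretations are acceptable, return the
-- chosen one together with the lower-priority alternatives that also fit.
choose :: (Interpretation -> Bool) -> Maybe (Interpretation, [Interpretation])
choose typechecks =
  case filter typechecks priority of
    []       -> Nothing
    (x : xs) -> Just (x, xs)

-- Lines that a hypothetical --verbose mode could print to stderr.
verboseReport :: (Interpretation, [Interpretation]) -> [String]
verboseReport (chosen, others) =
  ("magic chose: " ++ show chosen)
    : [ "also possible: " ++ show o | o <- others ]
```

Mapping each alternative to the flag that forces it is left out on purpose, since that mapping is exactly what is still under discussion here.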

gelisam commented 11 years ago

Let's list other actions which the user might want to perform. Here are those I can think of, in no particular order:

  1. manipulate the input as a table ([[_]] -> _)
  2. manipulate the input as a list of lines ([_] -> _)
  3. apply the expression to each row ((_, _, ...) -> _)
  4. apply the expression to each parsed line (Read a => a -> _)
  5. apply the expression to the entire stream (ByteString -> _)
  6. apply the expression to parse the entire input (say, a multiline json expression) (Read a => a -> _)
  7. use the expression to filter the lines (_ -> Bool)
  8. use the expression to filter and transform the lines (_ -> Maybe a)
  9. fold all lines into one value (Monoid a => a -> _ -> a)
  10. fold all lines into one value, failing if there are no lines (a -> a -> a)
  11. apply the expression to each line (_ -> _)
  12. evaluate the expression, ignoring the input (_)

I'm not sure which of those will be more common.

I would also like to use a bit of magic on the output type, subsuming the current Representable rules:

  1. display the output as a table ([[_]])
  2. display the output as a table ([(_, _, ...)])
  3. display the output as a list of lines ([_])
  4. display the output directly (IsString a => a)
  5. display the output using show (Show a => a)
  6. run the output as an IO computation (IO _)
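
One way to picture the output rules above is as a single rendering function over the possible output shapes. This is only a sketch under assumptions: the `Output` type and `render` are hypothetical names, fields are joined with a space and rows with a newline, and the result of an IO computation is assumed to be rendered by the same rules. None of this is Hawk's existing Representable code.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical: one constructor per output rule in the list above.
data Output
  = OutTable [[B.ByteString]]  -- rules 1 and 2: rows of fields
  | OutLines [B.ByteString]    -- rule 3: one value per line
  | OutDirect B.ByteString     -- rule 4: string-like output, printed as-is
  | OutShown String            -- rule 5: a value already passed through show
  | OutAction (IO Output)      -- rule 6: run the action, then render its result

-- Turn an output value into the bytes that would be written to stdout.
render :: Output -> IO B.ByteString
render (OutTable rows) = pure (B.unlines (map B.unwords rows))
render (OutLines ls)   = pure (B.unlines ls)
render (OutDirect s)   = pure s
render (OutShown s)    = pure (B.pack s)
render (OutAction act) = act >>= render
```

In this sketch a tuple row would first be converted to a list of fields, which is why rules 1 and 2 share a constructor.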

melrief commented 11 years ago

I would like to respond to the first post first. Let's remember that, for now, the types are more specific on the "internal type":

  1. table: the type of a function working on a table must be [[ByteString]] -> _
  2. lines: the type of a function working on lines must be [ByteString] -> _
  3. stream: the type of a function working on the stream must be ByteString -> _

Another thing is that we have two delimiters, for lines and for words, that decide which mode should be used. So what you showed before should be written:

> echo '12 34\n56 78' | hawk -d'' B.reverse
43 21
87 65

and is equivalent to:

> echo '12 34\n56 78' | hawk 'L.map B.reverse'
43 21
87 65

Instead, if you call hawk B.reverse, what I expect is a map nested inside a map over the table, because by default the two delimiters are set to ' ' for words and '\n' for lines:

> echo '12 34\n56 78' | hawk B.reverse
21 43
65 87

that is equivalent to:

> echo '12 34\n56 78' | hawk 'L.map (L.map B.reverse)'
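
The two behaviours can be written as plain functions over the whole input, which makes the difference easy to test. This is a sketch of the behaviour described in this comment, not of Hawk's actual pipeline; `perLine` and `perField` are hypothetical names, and `B.words` splits on any run of whitespace, which only approximates a ' ' word delimiter.

```haskell
import qualified Data.ByteString.Char8 as B

-- Like `hawk -d'' B.reverse`: split into lines only, reverse each line.
perLine :: B.ByteString -> B.ByteString
perLine = B.unlines . map B.reverse . B.lines

-- Like `hawk B.reverse` with both default delimiters: split each line into
-- fields and reverse every field, keeping the field order.
perField :: B.ByteString -> B.ByteString
perField = B.unlines . map (B.unwords . map B.reverse . B.words) . B.lines
```

On the input `12 34` / `56 78`, `perLine` gives `43 21` / `87 65`, while `perField` gives `21 43` / `65 87`, matching the outputs shown above.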

I do agree that hawk must be as friendly as possible, but let's do it in a consistent way. If both delimiters are set, then we work in table mode and fold/filter/map the user expression over the table. If only one delimiter is set, then we work in line mode and fold/filter/map the user expression over the lines. Otherwise we work on the stream. If the type is not coherent with the delimiters for some reason, then we should exit with an error:

> echo '12 34\n56 78' | hawk -D'' -d'' 'L.map B.reverse'
error: cannot use ..
actual: L.map :: (a -> b) -> [a] -> [b]
expected: ByteString -> a
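
The delimiter rule proposed here is small enough to sketch as a function. `Mode`, `modeFor`, and `expectedInput` are hypothetical names; `Nothing` stands for an empty delimiter (`-D''` for lines, `-d''` for words, as inferred from the examples in this comment).

```haskell
-- Hypothetical names; a sketch of the proposed rule, not Hawk's code.
data Mode = Table | Lines | Stream
  deriving (Show, Eq)

-- First argument: line delimiter, second argument: word delimiter.
-- Nothing means the delimiter was set to the empty string.
modeFor :: Maybe String -> Maybe String -> Mode
modeFor (Just _) (Just _) = Table   -- both delimiters set
modeFor Nothing  Nothing  = Stream  -- no delimiter set
modeFor _        _        = Lines   -- exactly one delimiter set

-- The input type a user expression must accept in each mode.
expectedInput :: Mode -> String
expectedInput Table  = "[[ByteString]] -> _"
expectedInput Lines  = "[ByteString] -> _"
expectedInput Stream = "ByteString -> _"
```

Under this rule, `-D'' -d''` selects `Stream`, so an expression such as `L.map B.reverse` would be rejected because it does not accept a `ByteString`.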

What do you think?

gelisam commented 11 years ago

I am not convinced. Why would magic be allowed to infer -m and -a, but not -d''? If the user writes an expression of type [[ByteString]] -> Int, it is very clear that the user wants to parse the input as a table, but if the expression has type [ByteString] -> Int, then it could either be because the user wants to process the input as a list of lines (hawk -a -d'') or because the user wants to parse each line as a list of fields and apply his expression to each line (hawk -m). I would be open to discuss which use case might be more common, but I don't think that unilaterally outlawing the inference of -d'' and -D'' will be very convenient for the user.

If the user already specifies -D'' -d'', however, I agree that magic should not be allowed to override that choice, so your last example should indeed be a type error.
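
To make the ambiguity concrete, here is a sketch of the two readings of an expression of type [ByteString] -> Int, using `length` as the user expression. The names `userExpr`, `asLines`, and `perLineFields` are hypothetical, and `B.words` is only an approximation of splitting on the default word delimiter.

```haskell
import qualified Data.ByteString.Char8 as B

-- A user expression of type [ByteString] -> Int.
userExpr :: [B.ByteString] -> Int
userExpr = length

-- Reading 1 (hawk -a -d''): the input is a list of lines and the expression
-- is applied once to the whole list.
asLines :: B.ByteString -> Int
asLines = userExpr . B.lines

-- Reading 2 (hawk -m): each line is a list of fields and the expression is
-- applied to every line separately.
perLineFields :: B.ByteString -> [Int]
perLineFields = map (userExpr . B.words) . B.lines
```

For the input `12 34 56` / `78 90`, the first reading counts 2 lines and the second reports 3 and 2 fields, so both typecheck but answer different questions.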

melrief commented 11 years ago

Yeah, you are probably right on -d'' and -D'' ... I don't know, I really like magic but I would like to have something consistent. I don't have enough information to decide and discuss; what I can offer are merely opinions.

By the way I like magic and I still think it should replace evaluate mode as standard mode (also because it can act as evaluate :)). Keeping -a and -m with the delimiters will allow the user to enforce the correct application style when magic doesn't do what the user expects. But for everything else, magic will make Hawk far easier to use.

I agree on -v/--verbose for information on the expression. I propose to print to stderr the mode deduced by magic, the type of the user expression and the expression itself.

gelisam commented 11 years ago

By the way I like magic and I still think it should replace evaluate mode as standard mode

I really like magic too! This is the most interesting part of this project for me, but also the part in which I am the least open to compromise, because I view magic as my personal research playground. I plan to try things, change things, and since this is still at the research stage, to change my mind and break things often.

Perhaps we could get inspiration from GHC's approach to experimental features, and leave them off by default? For the first release, at least, using hawk with no arguments should use eval mode. Then, as we implement crazier and crazier magic, we could use the LANGUAGE pragma to turn on a subset of the magic features, see how they interact, and deprecate bad ideas. Also, when we have a disagreement about how magic should behave, this would allow us to implement both versions and see which one is most useful in practice.

I propose to print to stderr the mode deduced by magic, the type of the user expression and the expression itself.

Sure. Any particular reason why you want to print the user's expression? Unlike the two other pieces of information, the user already knows that piece.

melrief commented 11 years ago

This is the most interesting part of this project for me, but also the part in which I am the least open to compromise, because I view magic as my personal research playground.

I know magic is yours and respect your position, so no problem if you answer "NO!" when I propose something. I will propose my ideas anyway but you are the one who will decide how to do magic. That's it. Consider that I like it so I will use it and consequently propose my ideas. Can't stop this :D.

Then, as we implement crazier and crazier magic, we could use the LANGUAGE pragma to turn on a subset of the magic features, see how they interact, and deprecate bad ideas.

The idea of using extensions to enable certain features is fantastic! Let's implement this strategy so we are then free to add/remove any extension! That's just perfect!

Sure. Any particular reason why you want to print the user's expression? Unlike the two other pieces of information, the user already knows that piece.

The user knows their expression, not the full expression that is evaluated with type IO (). Hawk does some processing on the user expression and I think that outputting the final produced string could be useful for the user. Sadly I fear the expression won't be very readable, but it can still be used to debug problems.

gelisam commented 11 years ago

Consider that I like it so I will use it and consequently propose my ideas. Can't stop this :D.

And I welcome your feedback!

I know magic is yours and respect your position, so no problem if you answer "NO!" when I propose something.

Now that we have agreed to separate magic features using language extensions, I am far less likely to oppose anything, because it will only interfere with my other ideas if we enable both extensions at once.

Hawk does some processing on the user expression and I think that outputting the final produced string could be useful for the user.

I think it will mostly be useful for us! Which, currently, is the same as saying it's useful for our users :)