gelisam / hawk

Haskell text processor for the command-line
Apache License 2.0
361 stars 20 forks source link

#171 in place edit #179

Open Rahel-A opened 6 years ago

Rahel-A commented 6 years ago

These features were added: The argument "-i[back-up-file-suffix]" If a suffix is provided, the original file will be copied to this backup file. No warning is given if backup file matches original file (something to consider?) Input file must be given with "-f[input-file]".

Issues encountered: Lazily reading the input file causes us to have file blocked when trying to write to it. ConsumeExtra in commonSeparators will consume expressions first and skip the input file, so "-f" is used to identify file. TODO:

gelisam commented 6 years ago

"-f" is used to identify file.

Please keep the original API, as discussed. It would be fine if the file could be specified either using -f or using a positional argument, but I don't want to remove the positional API unless there is a good reason to.

gelisam commented 6 years ago
  • Consider an alternative write method?
    • Write stdout to temporary file, and then overwrite the original
    • Would that still not have problems with lazy ready?

In your original email you said you wanted to "write to a temporary file, and in-case of success, overwrite the input file". This does sound like the right approach to me. Which "lazy read" problems are you anticipating?

gelisam commented 6 years ago

No warning is given if backup file matches original file (something to consider?)

The original file? This would only happen if the suffix is the empty string, do you still backup the file if that's the suffix which was given? Or did you mean that there is no warning if the original file plus the suffix gives an existing file? What is the behaviour of sed and gawk in that case?

Rahel-A commented 6 years ago

don't mix record syntax and sum types (e.g. by defining a separate record type, or by removing the record selectors and relying on comments to distinguish the two arguments)

fixed

don't use fromJust

fixed

when manipulating filenames, use FilePath instead of String

fixed

accept the file name as a positional argument (either removing the new --file argument or keeping both ways of specifying the file name)

I'm not sure how to use the parser to only consume the last part of the string for input file

write to a temporary file, and in-case of success, overwrite the input file revert to lazy bytestrings

Will update you on those two issues later today.

Rahel-A commented 6 years ago

OK, I believe I have gotten around the lazy read issue I was having. The input file is overwritten once a temporary file is completely written.

edit** This was my test case: cp test.txt.~ test.txt && hawk -i"~" -ad "sum . L.map toInt" -ftest.txt

cat test.txt.~
5
6
7
cat test.txt
18
gelisam commented 6 years ago

accept the file name as a positional argument (either removing the new --file argument or keeping both ways of specifying the file name)

I'm not sure how to use the parser to only consume the last part of the string for input file

Hmm, you're right, this is harder than it seems! I thought it would be as simple as using consumeExtra instead of consumeLast Option.InputFile, but I now realize that this would consume the expression instead of the filename.

The solution would be to consume the expression before the common options. Unfortunately, doing that is not obvious either because we only want to parse the expression (and complain if it's missing) when the main command is Option.Eval, Option.Apply, or Option.Map, not when the main command is Option.Help or Option.Version.

What I would do is to remove the commonOptions call and replace it with a higher-order function withExpr which could be called like this:

help    = return Help
version = return Version
eval    = withExpr $ \c e -> Eval  e <$>                 outputSpec c
apply   = withExpr $ \c e -> Apply e <$> inputSpec c <*> outputSpec c
map'    = withExpr $ \c e -> Map   e <$> inputSpec c <*> outputSpec c

This higher-order function would parse the expression before the common options so that when consuming those common options, the consumeExtra consumes the filename, not the expression.