Open CrendKing opened 2 weeks ago
It is interesting that you are using an HTTP response as your example, In HTTP protocol you can have multiple entries for the same header which would be incompatible with putting in a hash-table because you can't have duplicate keys.
(Invoke-WebRequest -Uri https://github.com/favicon.ico -Method HEAD).Headers
Key Value
--- -----
Server {GitHub.com}
Date {Mon, 13 May 2024 01:49:48 GMT}
ETag {W/"664165de-1976"}
Cache-Control {max-age=315360000}
Vary {Accept-Encoding, Accept, X-Requested-With}
X-Frame-Options {DENY}
Set-Cookie {_gh_sess=A3nCvWqXhb%2BiY8%2FYndCFdikMaBkl%2B6w1qj33QJ5f6nbd1q2dwHKyzjr7KXGsLjQZvC13xyHbpptRSW38GAb%2BcUfHCxsEBHUUSqSjxcBhgEDuYCO8XsQqDmJdtg%2BNaypPv…
Accept-Ranges {bytes}
X-GitHub-Request-Id {E8AC:A2755:4F712F9:5686B67:66417331}
Content-Type {image/x-icon}
Last-Modified {Mon, 13 May 2024 00:59:10 GMT}
Expires {Thu, 11 May 2034 01:49:48 GMT}
Content-Length {6518}
The http response effectively treats header values as dictionary of list of strings.
It is just an example. The point is that there are data that mostly conforms to the <key> <delimiter> <value>
format with slight deviations, and I want ConvertFrom-StringData
to be able to return the "mostly conformant" parts instead of throwing outright.
@CrendKing:
Pragmatically speaking, you could simply filter out lines that aren't of interest along the lines of:
curl --silent --head https://www.example.com |
Select-String -Raw : |
ConvertFrom-StringData -Delimiter :
However, note that the above creates a hashtable for each and every line (of interest) output by curl
, and that, in general, stdout output from external (native) programs is streamed line by line to PowerShell's pipeline.
As a result, the above outputs multiple, single-entry hashtables rather than a single hashtable.
To create a single hashtable, you'd have to provide all input lines as a single string, which in the simplest case means piping to Out-String
:
curl --silent --head https://www.example.com |
Select-String -Raw : |
Out-String |
ConvertFrom-StringData -Delimiter :
However, duplicate keys - which, as @rhubarb-geek-nz points out, are a possibility in HTTP headers - then would present a problem - justifiably.
However, you did uncover what to me appears to be a genuine bug (since reported in #23793):
With multiple, individual strings as input, ConvertFrom-StringData
mistakenly aborts processing overall, via a statement-terminating error, once an incorrectly formatted string is encountered; the expected behavior would be to merely emit a non-terminating error in that event, and continue processing further inputs:
'foo=bar', 'broken', 'bar=baz' | ConvertFrom-StringData
Currently, the output is:
Name Value
---- -----
foo bar
ConvertFrom-StringData: Data line 'broken' is not in 'name=value' format.
That is, overall processing was unexpectedly aborted when the broken
input string was encountered.
The output should be:
Name Value
---- -----
foo bar
ConvertFrom-StringData: Data line 'broken' is not in 'name=value' format.
bar baz
That is, processing should have continued after the incorrectly formatted input string (resulting in two hashtables overall).
Also, 2>$null
or -ErrorAction SilentlyContinue
/ -ErrorAction Ignore
should be effective in silencing / ignoring any input-string-specific error messages, which currently does not work, because the error is unexpectedly statement-terminating rather than non-terminating.
Thanks. Please be understanding that I only used curl and HTTP as an example because it is common. If it is causing confusion, I can replace the example with things like output from ffmpeg/ffprobe. Just let me know.
The 'foo=bar', 'broken', 'bar=baz' | ConvertFrom-StringData
case is different to my one string use case, but the root cause seems to be the same. If some -ErrorAction
would make it continue, that'd be great.
Pragmatically speaking, you could simply filter out lines that aren't of interest along the lines of:
Yes, I could. But why encourage the users to come up with case-by-case filtering that not only spend CPU time to take out data we don't need to begin with at all, while the proposed update to ConvertFrom-StringData
achieves exact the same without sacrifice anything? What is the benefit of the current ConvertFrom-StringData
behavior?
What is the benefit of the current
ConvertFrom-StringData
behavior?
Strictly enforcing data formats improves data quality throughout a system. If you have a relaxed system that accepts anything but ignores it all then you will have trouble later on finding out why none of your values made it though the system. A good recent example is ConvertFrom-JSON which previously accepted, without error, data that basically was not JSON, it allowed C++ comments, single quotes instead of double quotes etc, So while your parser may interpret the data you would have rude shock when you pass the data to a 3rd party RESTful service and it rejects it outright as not being JSON.
Stick to the "do one thing well" and have ConvertFrom-StringData do the name value conversions and leave filtering to other components that are better suited and can be customised.
I agree the default behavior should be strict. I'm talking about adding an option or -ErrorAction
to relax it. It is a conscious action from user to use that option, they already know the potential of incomplete conversion.
Another example. Invoke-WebRequest
by default throws exception if the response HTTP code is error (400-500). That leaves no room to check the header or content of the response. Fortunately there is -SkipHttpErrorCheck
. And when we use that option, we know we should always check for the HTTP code. Imagine if such option never existed.
@CrendKing, note that even if the true bug gets fixed (which I've since reported separately, in #23793), it won't help your use case:
While you will then be able to use, say, -ErrorAction Ignore
to ignore the errors from individual input strings, you'll still get a separate, single-entry output hashtable for each (well-formed) input string.
However, if you pass a single, multi-line string (e.g. curl --silent --head https://www.example.com | Out-String
) as the input, that string as a whole will result in a non-terminating error, and no hashtable will be constructed.
So, to spell it out, what you're asking for would require modifying ConvertFrom-StringData
to emit non-terminating errors for individual lines in a given, single, multi-line string and to still emit a hashtable if at least one well-formed line was found in the input string (I now see that you're already pointing to the relevant source-code location in your original post).
On a related note, the current behavior of emitting a separate hashtable for each input string is awkward, especially when piping an external program's output directly to ConvertFrom-StringData
, which happens line by line (as in your example; with input from a file, Get-Content -Raw
is the solution).
It would be much more useful to collect all input strings first, and then emit a single hashtable for all of them (the way ConvertFrom-Json
does, for instance).
Both changes discussed are technically breaking ones, though the former is more likely to fall into bucket 3, whereas I think the latter, if it gets implemented, would require opt-in in the form of a new parameter to avoid breaking existing code.
Thanks. Like you said, my original request is to make the function return non-empty Hashtable from single string input when some line can't be converted. I even showed a workaround that break the single string input into lines, try-catch convert each line and merge all single-entry Hashtables into one. It is doable but very awkward.
Also, if this new behavior is protected by a NEW OPTION, for instance ConvertFrom-StringData -SkipErrorLines
, I don't think it is backward incompatible change. On the other hand, make ConvertFrom-StringData -ErrorAction Ignore
ignore lines or input entry is backward incompatible, even though I believe the current behavior is strict up wrong, since it is not ignoring error at all.
ConvertFrom-StringData -ErrorAction Ignore
ignore lines or input entry is backward incompatible
Well, it is backward-incompatible in the sense that any bug fix is technically a breaking change.
The symptom is a side effect of the cmdlet (in effect) creating a statement-terminating error, to which -ErrorAction
by (current) design does not apply (see #14819); fixing #23793 will make this symptom go away.
That said, even though it would clearly be a bug fix, it has the potential to break existing code, given that using a try
statement would no longer be effective for catching an error - unless -ErrorAction Stop
is also used.
a NEW OPTION
Good point, introducing a new switch is a way to avoid backward-compatibility problems, but perhaps the need for that can be avoided if the suggested change is deemed to be a bucket-3 change - but someone would have to do some digging (e.g. analyzing code here on GitHub) to ascertain that.
That said, if fixing #23793 is decided against, a new switch is unavoidable.
I even showed a workaround that break the single string input into lines
Yes, though note that your workaround wouldn't work with curl --silent --head https://www.example.com
as input, as each output line would be received separately in the process
block; I know it's just a sample command, but the point is that it the problem applies to any external-program call.
To avoid a need for Out-String
or similar, you'd have to collect all input strings in the process
block (e.g. by building up a multiline string), and process the collected lines in the end
block.
This relates to additional change I suggested: to make ConvertFrom-StringData
perform this collecting itself, though that most likely requires a new switch.
Summary of the new feature / enhancement
Currently
ConvertFrom-StringData
immediately throws exception if it encounters an invalid data line or duplicate key. Is it a good idea to add an option so thatConvertFrom-StringData
would ignore and skip the erroneous line and continue? This is different to something like-ErrorAction SilentlyContinue
, which skips the whole command.For example, if we use curl to send a HEAD request, we get headers back, which is in the ": " form. However, it also returns the HTTP response code line at the top:
Piping it to
ConvertFrom-StringData -Delimiter ':'
results "ConvertFrom-StringData: Data line 'HTTP/2 200' is not in 'name=value' format." Of course we canSelect-Object -Skip 1
to workaround in this case, but such workaround is not always possible (e.g. invalid data appear in an undetermined line).Implementation wise, just use the option to continue instead of throw at https://github.com/PowerShell/PowerShell/blob/0f3fd2077cfd63b298ce7f88babe041f4dc6849e/src/Microsoft.PowerShell.Commands.Utility/commands/utility/ConvertFrom-StringData.cs#L68 and https://github.com/PowerShell/PowerShell/blob/0f3fd2077cfd63b298ce7f88babe041f4dc6849e/src/Microsoft.PowerShell.Commands.Utility/commands/utility/ConvertFrom-StringData.cs#L78.
Proposed technical implementation details (optional)
One can write a function to simulate the feature:
Output: