Closed bradenbest closed 3 years ago
Thanks for the tips on c programming style. I will leave the program as written because it is in the repo as historical documentation. It is how I performed a one-time conversion of 0263 to something equally as unlikely in the corpus but acceptable to downstream processes that were much more picky that the version of perl where I had used it for ad-hoc serialization of key-value pairs.
seems weird to mix historical/dead code with regular code in a repo named "remodeling", but ok. I'm deleting the fork, so I'll leave the contents of json.c from my branch here, in case you change your mind.
#include <stdio.h>
static int const MAGIC_GS = 0xb3;
int main(void)
{
int ch;
while ( (ch = getchar()) != EOF) {
if (ch == MAGIC_GS) {
fputs("<<<<gs>>>>", stdout);
} else {
putchar(ch);
}
}
}
edit: oh good, github still preserves it.
Thanks.
Removed pointless "optimization" wherein the code attempted to avoid including stdio.h. I'm not sure what the original author thought they were getting out of that, but whatever the reasoning was, it was misguided, and also resulted in incorrect parsing, as NUL (0x00) is a valid string character in json (javascript strings are encoded in UTF-16 and are non-zero-terminated). EOF is a special negative value that getc related functions return when the file stream has no more characters (which is why
ch
has to be an int). Other than the newly added ability to read input that contains NUL, the behavior is unchanged and standard-compliant.The only reason the old behavior
(...) > 0
should be preserved is if NUL is being fed through stdin with the specific intent of marking the end of the file, which seems highly unlikely to me.On that note, regarding json strings, you might actually want to encode the input/output as UTF-16, assuming you want compliance with the ECMAScript standard. You might want this if the json text is being received by a javascript engine, for example, as otherwise, codepoints above U+00FF wouldn't transmit correctly. If the json is only used internally, though, then this isn't a big concern and you'd be fine sticking to UTF-8 for language support.
I'm unsure what the magic constant
0263
(B3 hex / 179 decimal) is supposed to represent, because there is no documentation (that I can find), comment or identifier explaining why it was chosen or what it means, so I named it MAGIC_GS, after the "gs" in the emitted string. Please consider changing the name of the constant to something more descriptive. Bear in mind that 179 is outside of ASCII range. In Unicode terms, it's the codepoint U+00B3 "Superscript Three" from the Latin-1 Supplement charset.Honestly, I'm not even sure how this is supposed to work. It certainly isn't parsing any json, unless this program's output is being piped to a second program that actually does the parsing, in which case this program should probably be renamed to something like "json-sanitize.c"
-Braden