fish-shell / fish-shell

The user-friendly command line shell.
https://fishshell.com

Switch serialization format for variables and history #3341

Open · JoshCheek opened this issue 8 years ago

JoshCheek commented 8 years ago

fish version installed:

$ ~/ref/tools/fish-shell/fish -v
fish, version 2.3.1-492-g3702616

OS/terminal used:

$ uname -a
Darwin Joshs-MacBook-Air-2.local 15.6.0 Darwin Kernel Version 15.6.0: Thu Jun 23 18:25:34 PDT 2016; root:xnu-3248.60.10~1/RELEASE_X86_64 x86_64

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.11.6
BuildVersion:   15G31

Issue:

Variables do not get set when their value is exactly the character \x1d.

Reproduction steps

$ set -l oned \x1d
$ set -l onec \x1c

# We see that onec has a value, but oned does not
$ set -l
onec \x1c
oned
krader1961 commented 8 years ago

This is probably an unfortunate consequence of fish using \x1D internally and exposing that fact publicly. See

/// Value denoting a null string.
#define ENV_NULL L"\x1d"

in env.cpp. I suspect you are reporting this based on other open fish issues since it is very unlikely you noticed this in real world use of fish. If I'm wrong about that please provide more details about the scenario which caused you to notice this problem.

JoshCheek commented 8 years ago

Update: It looks like it is actually a bug with setting the variable rather than with looking it up and expanding it. I've updated the issue to reflect this.

JoshCheek commented 8 years ago

in env.cpp. I suspect you are reporting this based on other open fish issues since it is very unlikely you noticed this in real world use of fish. If I'm wrong about that please provide more details about the scenario which caused you to notice this problem.

-- @krader1961

While working on something unrelated, I realized I had a very hazy understanding of what was actually in my variables, and whether my experiments were actually testing the things I thought that they were. Many languages address this by providing a way to introspect:

# Ruby has `p` for a quick debugging print
$ ruby -e 'p ["abc", "def", "ghi"]'
["abc", "def", "ghi"]

# which is equivalent to `puts(variable.inspect)`
$ ruby -e 'puts ["abc", "def", "ghi"].inspect'
["abc", "def", "ghi"]

# Haskell has `show`
$ ghci -e 'putStrLn (show ["abc", "def", "ghi"])'
["abc","def","ghi"]

# Javascript has console.dir
$ node -e 'console.dir(["abc", "def", "ghi"])'
[ 'abc', 'def', 'ghi' ]

# Which is equivalent to `log(inspect(variable))`
$ node -e 'console.log(require("util").inspect(["abc", "def", "ghi"]))'
[ 'abc', 'def', 'ghi' ]

So, I was about to open an issue requesting an inspect function which would display the contents of variables in an unambiguous way.

But then I realized I've built tools like that before (1, 2) so I'm reasonably familiar with the problem / expectation, and I really ought to just try building it myself instead of continually expecting everyone else to do work for me.

So, I tried to build the function I was wanting (currently working well-ish; source is here). Here are the difficulties I've hit while working on it so far:

faho commented 8 years ago

E.g. if I do string split "" a\nb, it's going to print a\n\n\nb\n (try this one to see its difficulty being composed: string split '' a\nb | string escape). Personally, I think it should probably be able to pass this test

Why use string split at all? I mean, the intention is

an inspect function which would display the contents of variables in an unambiguous way.

If you'd like to display e.g. $fish_color_quote, you can do string escape -- $fish_color_quote. The variable will be split along the elements (not newlines, though it's currently quite hard to get newlines into variables) and each element will be printed escaped in a single line (because newlines will be escaped to \n).

To make a function that just gets the variable name, you'd need to add the --no-scope-shadowing option to function so you could also handle local variables. (This has its own caveat - if you define a variable inside such a function, it will still shadow the outer definition)

Then you could do something like

function inspect --no-scope-shadowing
    for ___inspect_arg in $argv
        string escape -- $___inspect_arg $$___inspect_arg
    end
end
inspect fish_color_autosuggestion
set -l smkx (tput smkx)
inspect smkx

The string builtin can't handle \x00, it truncates the string when it sees this byte. I'm just ignoring this at present.

Well, under Unix, programs receive arguments as NUL-terminated strings, so you can't actually pass a NUL inside an argument. (AFAIK, anyway)

We could fake this for string (since it's a builtin), though you'd never be able to pass these to other commands. If we also added it for e.g. echo and printf, we'd be able to use it to pass NULs via pipes.

floam commented 8 years ago

since it has no way to return an array of strings.

You could have it escape them, then unescape (eval unfortunately) them later,

string split '' ab\nc | string escape

These days I'd probably prefer to use JSON or something between tools if we're talking about arrays and stuff. First fish would need dicts to imagine anything very cool here. Another way one might want to represent things as output might be: {'a','b',\n,'c'} ?

faho commented 8 years ago

Turns out the reason was because some chars are special in case statements (the question mark and asterisk)

Quoting is your friend. The "*" and "?" are interpreted as globs, like they are everywhere else.

Do case '*', not case *, because that will expand the "*" to the files in your current directory.

floam commented 8 years ago

I've wondered for a while if such an 'inspect' behavior might perhaps be a better thing for fish to do than this error:

floam@rmbp ~> $foo
Variables may not be used as commands. In fish, please define a function or use 'eval $foo'.

I'm often just letting it complete $foo for me so I can snoop on what $foo is in the pager with better clarity than echo $foo and less work than typing echo first. I'll then, if I can't see it all given the cramped space, perhaps wishing I was in a python REPL, hit enter wanting to see the value. It seems like an opportunity other shells wouldn't have since we won't let people execute these as commands here.

JoshCheek commented 8 years ago

Why use string split at all? I mean, the intention is

-- @faho

I need to apply highlighting between characters, so my plan was to iterate over them one at a time. Iterating over the result of string split seemed like the easiest way to do that (prior to that I was trying string sub --start $i --length 1 $str). Here is a screenshot of what it currently does (obviously not quite right, but sufficient for me for the moment):

[screenshot: inspect output, 2016-08-29]

function inspect --no-scope-shadowing
    for ___inspect_arg in $argv
        string escape -- $___inspect_arg $$___inspect_arg
    end
end
inspect fish_color_autosuggestion
set -l smkx (tput smkx)
inspect smkx

-- @faho

Oh, that's really cool! I haven't used arrays in this way yet, but was thinking it should check the results and display like an associative array if they line up with other vars.

Well, under Unix, programs receive arguments as NUL-terminated strings, so you can't actually pass a NUL inside an argument. (AFAIK, anyway)

-- @faho

Oh, yeah, good point >.< I guess you'd have to pipe the names into the function.

You could have it escape them, then unescape (eval unfortunately) them later,

-- @floam

Aye, though you do still have to run it through sed (e.g. using sed, I can make my test pass for everything except \x1e).

These days I'd probably prefer to use JSON or something between tools if we're talking about arrays and stuff. First fish would need dicts to imagine anything very cool here. Another way one might want to represent things as output might be: {'a','b',\n,'c'} ?

-- @floam

Aye, that would be nice. Or if they had ability to set mime types, then could use whatever format is appropriate for their type of data (presumably tools like jq would become more popular for on-the-fly restructuring of input/output)

Quoting is your friend. The "*" and "?" are interpreted as globs, like they are everwhere else.

Do case '*', not case *, because that will expand the "*" to the files in your current directory.

-- @faho

Ahh, yeah, you're right, I did have one level of quoting, but needed two:

[screenshot: corrected case quoting, 2016-08-29]

I'm often just letting it complete $foo for me so I can snoop on what $foo is in the pager with better clarity than echo $foo and less work than typing echo first. I'll then, if I can't see it all given the cramped space, perhaps wishing I was in a python REPL, hit enter wanting to see the value. It seems like an opportunity other shells wouldn't have since we won't let people execute these as commands here.

-- @floam

Oh, that's a nice trick! That's pretty much the same use case I was trying to address.

krader1961 commented 8 years ago

Note that the ENV_NULL symbol (value 0x1D) that is the cause of the original problem report should be a Unicode private use char in the range defined by RESERVED_CHAR_BASE and RESERVED_CHAR_END such as VARIABLE_EXPAND_EMPTY. I'll work on that. I also can't help but wonder if we should do the same thing for the ARRAY_SEP symbol (value 0x1E).

JoshCheek commented 8 years ago

I also can't help but wonder if we should do the same thing for the ARRAY_SEP symbol (value 0x1E).

-- @krader1961

I'd advocate it, I did hit one with that somewhere along the way. Didn't document the experiment that elicited the issue, though.

zanchey commented 7 years ago

See also #436.

floam commented 7 years ago

I need to apply highlighting between characters, so my plan was to iterate over them one at a time. Iterating over the result of string split seemed like the easiest way to do that (prior to that I was trying string sub --start $i --length 1 $str).

If you're OK with fish's syntax highlighting, I like to pipe through fish_indent --ansi. But you'd have to use something that wouldn't piss the parser off, so I put it after . to get it out of the command position (it'd all be the same color).

[screenshot: fish_indent --ansi syntax highlighting]

It has the benefit of using the actual parser - which wouldn't highlight \e here like yours did because it doesn't expand to an escape character if in single quotes - of course maybe for your goals you want to make your own anyway and after that the syntax coloring is easy.

krader1961 commented 7 years ago

Fixing this so that \x1D can be assigned to a var is going to be really hard. Why? Because that magic value is serialized to the universal var file and exported vars. If that magic value was just used internally it would be easy to change.

Consider this var in my ~/.config/fish/fishd.$macaddr file:

SET __fish_init_2_3_0:\x1d

That entry does not mean the var was set to the empty string; e.g., set -U __fish_init_2_3_0 "". It means the var has no associated values. That is, it was defined thusly: set -U __fish_init_2_3_0.
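The distinction can be sketched with a toy decoder (a hypothetical Python illustration, not fish's actual parser; it ignores fish's value escaping and only models the `SET name:value` line shape shown above):

```python
# Toy decoder for the legacy fishd serialization discussed above.
# ENV_NULL (\x1d) alone means "var defined with no values";
# ARRAY_SEP (\x1e) separates elements of a multi-value var.
# Hypothetical illustration only -- not fish's real parser.
ENV_NULL = "\x1d"
ARRAY_SEP = "\x1e"

def decode_uvar(line: str):
    """Decode one 'SET name:value' line into (name, list_of_values)."""
    name, _, value = line[len("SET "):].partition(":")
    if value == ENV_NULL:
        return name, []                  # set -U name  (no values at all)
    return name, value.split(ARRAY_SEP)  # one or more values

assert decode_uvar("SET __fish_init_2_3_0:\x1d") == ("__fish_init_2_3_0", [])
assert decode_uvar("SET fish_colors:red\x1eblue") == ("fish_colors", ["red", "blue"])
```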

I have a proof of concept change that lets me assign \x1D to vars. This even works for universal or exported vars. However, it means existing vars with the current magic ENV_NULL value aren't correctly recognized. We'll probably have to accept, at least for a year or two, universal and exported vars that are set to \x1D to mean the var has "no values" while writing them using the new, private-use, value. This, however, introduces a different problem. If you define a universal var with a version of fish having this change in magic number those vars won't be correctly parsed by older versions of fish.

On the one hand I think fish should not be using any ASCII or valid Unicode char for strictly internal purposes. That includes Unicode private-use chars. The latter isn't going to be possible as long as we're using wide chars internally since MS Windows limits us to 16 bit wide chars. But we can, at least, restrict our magic chars to a small range of the Unicode private-use range. On the other hand fish isn't meant to be a general purpose language that can handle arbitrary character values. So do we really want to make this change? I'll make a pull-request with my transitional solution and we can then discuss the merits of merging it.

Obviously the above discussion applies to the magic \x1E value (symbol ARRAY_SEP) used to separate values for vars with more than one value. My proof of concept change won't handle that value but if people are comfortable changing ENV_NULL it will be straightforward to do the same transformation for ARRAY_SEP.

floam commented 7 years ago

Way more folks will be inconvenienced by changing the array separator to some private use character when they try to load up a previous version of fish than folks who want to use that one character. I'd like to think the day we changed the array separator coincided with us gaining something really worthwhile to show for it, like key:value dicts, or a way to actually "return an array". Hopefully there would be something nice this would enable that we'd introduce at the same time.

floam commented 7 years ago

Which is https://github.com/fish-shell/fish-shell/issues/436 (and https://github.com/fish-shell/fish-shell/issues/441 a little)

floam commented 7 years ago

@krader1961 I'm extremely curious: how does your experiment handle exporting? What does set -x exported foo bar; ksh, and then echo $exported there, do?

The current behavior is certainly not really intuitive to use if you're not fish and are digesting the variables. The pre-2.2.0 automatic colon behavior was pretty much "as good as it gets" in that regard. We can't export a private use character - so that'd make us come up with something. Anyhow, what did you do?

floam commented 7 years ago

(I know that this is about NULL only, but you mentioned the same transformation you did would work for both situations, and that one is more interesting.)

faho commented 7 years ago

This, however, introduces a different problem. If you define a universal var with a version of fish having this change in magic number those vars won't be correctly parsed by older versions of fish.

Might it be worth it to rename the file (i.e. increase the file's version)?

Have a new fish read from fishd2 and try to import from fishd if no such file exists?

That still wouldn't enable sharing across multiple fish versions, but at least you wouldn't get variables that appear to be corrupted.

krader1961 commented 7 years ago

A bit of research reveals that 0xFFFF (and 0xFFFE) are classified as "not a character" by Unicode but are legal in Unicode strings. In other words 0xFFFF is truly a private use character that should never be associated with a glyph or grapheme. So we can replace both the 0x1D (ENV_NULL) and 0x1E (ARRAY_SEP) chars with 0xFFFF. We can use the same char for both use cases because the former will never have any other characters (code points) in the string while the latter will never have it as the initial, let alone only, character in the string. This is very fortunate because it allows us to make this backward-incompatible change just once while still being compatible with platforms, like MS Windows, where sizeof(wchar_t) == 2. This means we won't have to revisit this when we switch to using UTF-8 internally, which affords us a much larger range of code points (which we need to eliminate the other Unicode "private use" code points we reserve for internal use).
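The dual use of a single sentinel can be sketched as follows (a hypothetical Python illustration of the idea, assuming U+FFFF replaces both magic values; note that a list consisting only of empty strings would still collide with the "no values" encoding):

```python
# Sketch of using one sentinel (U+FFFF) for both the ENV_NULL and
# ARRAY_SEP roles, distinguished by position: the sentinel alone means
# "no values", the sentinel between items is the element separator.
# Hypothetical illustration, not fish's implementation.
SENTINEL = "\uffff"

def encode(values):
    if not values:
        return SENTINEL           # ENV_NULL role: var with no values
    return SENTINEL.join(values)  # ARRAY_SEP role: element separator

def decode(serialized):
    if serialized == SENTINEL:
        return []
    return serialized.split(SENTINEL)

assert decode(encode([])) == []
assert decode(encode(["a", "b"])) == ["a", "b"]
```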

how does your experiment handle exporting?

Exactly as it does today, @floam. The only difference is the magic value in the UTF-8 encoded string will be \xEF\xBF\xBF (0xFFFF) rather than \x1D or \x1E. What I think you're asking is how will vars with more than one value be exported so that they're useful by other programs. That's a different problem that this issue won't address. It is being discussed in issue #436 as you noted.

Might it be worth it to rename the file (i.e. increase the file's version)?

I need to think about that. I wasn't going to automatically rewrite universal vars affected by my proposed change. So unless you create a new uvar that is null (i.e., has no value) it won't cause problems with older versions of fish. We will definitely need to do something like what you're proposing before we can remove all references to the legacy 0x1D and 0x1E magic values. And the only solution I too can think of is to introduce a new uvar file name (i.e., "version" of the name). Unless a better solution is identified we'll do that before closing this issue. But I'd like to do that work in a separate change. In no small part because it's problematic.

Consider running fish versions pre and post this change. In both of them a uvar is created or modified whose encoding is affected by this change. Obviously we can have the new, post change, fish write to both uvar files using the encoding appropriate for each so that the old fish version sees the change. But the old fish isn't going to update the new uvar file. So how do we merge updates to the old uvar format into the new uvar format? It seems to me this is inherently a one way change. If you run a fish with this proposed change any uvars it modifies won't be visible to the old fish. Similarly, any uvar changes made by the old fish won't be visible to the new fish after the new uvar fishd2 file is created.

faho commented 7 years ago

It seems to me this is inherently a one way change. If you run a fish with this proposed change any uvars it modifies won't be visible to the old fish. Similarly, any uvar changes made by the old fish won't be visible to the new fish after the new uvar fishd2 file is created.

Of course we're not going to have perfect compatibility (well, it would be possible to have a couple of fish versions that always synchronize both files and then switch over after a couple of years, but that's a bit much IMHO).

But consider the case where someone has problems with the new fish - maybe there's a bug, maybe the distro package is broken, maybe a third-party script does not yet work with it. If we used one file, switching back to the old fish would make it read "corrupt" variables. If we used two, downgrading would mean either switching to the old variable state (if we just one-way imported the old file) or keeping the new variable state (if we wrote both).

I don't want us to support running multiple fish versions side-by-side, but switching back should be okay.

floam commented 7 years ago

What if we just appended version identifiers to the end of the fishd files, always? I guess the idea would be that any particular version of fish might try to read and convert a (recent) previous version's fishd file - if and only if one with .$FISH_VERSION on the end doesn't exist yet. Old versions of the file just get left where they are as they were. Never even try to interpret a younger fish's data. I think it's OK if reverting to a previous version is like stepping into a time machine.

I don't want us to support running multiple fish versions side-by-side

I've been trying to switch back and forth a bit these last few months, mostly to test things, often an older Linux build - it's really painful. I think we should "support" running fishes side-by-side with other versions of fish. But don't expect them to integrate. 5-year-old versions of fish might run side-by-side with fish git master, basically unaffected on most computers today, except that they are likely loading up functions and completions for a totally different version of fish from the future. If we can segregate this, it'd go a long way towards making fish more reliable. What we do with the OS X .app bundles works really nicely.

There's still ~/.config and ~/.local, but that often will just have a prompt in it and the user will have to maintain that in a way matching their goals, at the end of the day.

faho commented 7 years ago

What if we just appended version identifiers to the end of the fishd files, always? I guess the idea would be that any particular version of fish might try to read and convert a (recent) previous version's fishd file - if and only if one with .$FISH_VERSION on the end doesn't exist yet.

Yes, though the idea is that after this there's no need to change it again. Even if it did, it would do so very rarely and we could then again add the code. Since each fish version will only read the files it knows about, there's nothing special to do to protect them from reading a newer file version.

Also, just to clarify, $FISH_VERSION is not useful for this as it changes with every single commit when doing a build of fish. Having hundreds of fishd files, with every new build importing again, would be quite annoying.

I think we should "support" running fishes side-by-side with other versions of fish.

What more would need to be done to enable that? Old fishes could read their stuff, new fishes would read their stuff (importing the old when needed). UVars wouldn't necessarily be synchronized, but the old fish wouldn't be broken by this.

The biggest problem is going to be the user configuration - the other dirs in $fish_function_path et al should of course be set per-fish when you build it (in essence do what NixOS does).

floam commented 7 years ago

Actually, that was my suggestion: really use $FISH_VERSION so that 2.3.1 and 2.3.0 for example will not even try to share the content of a joint fishd file back and forth, aside from the 2.3.0 -> 2.3.1 migration. Because it's stuff like that that is likely to break scripts, and actually people don't need two-way stuff during upgrades.

The idea is that if you roll back to 2.2.0 right now you'll have the functions that came with 2.2.0 being autoloaded and whatever was set the last time you used it, it would still be set. The environment forks when you upgrade.

faho commented 7 years ago

Actually, that was my suggestion: really use $FISH_VERSION so that 2.3.1 and 2.3.0 for example will not even try to share the content of a joint fishd file back and forth, aside from the 2.3.0 -> 2.3.1 migration

And 2.3.1-501-gb895a50 vs 2.3.1-500-gXXXXXX?

This would mean going full NixOS.

floam commented 7 years ago

It'd be important that it matched whatever happens to the scripts we include. I'd mimic Homebrew: for the versioned install directories in Cellar that get symlinked into place in /usr/local, git checkouts always live alongside 2.3.0 and 2.3.1, in /usr/local/Cellar/fish/HEAD/; there's only one possible install location when you do a git build.

krader1961 commented 7 years ago

In issue #4200 I'm working to replace the flat string representation used internally for fish script arrays with an actual array structure (e.g., std::vector). For fish 3.0 we should consider switching to a different representation when serializing vars into the environment. JSON, XML, and google protobufs are the obvious choices.

Personally I'm not a fan of XML for this as it is overkill for this purpose, and the code to create and read an XML representation is considerably more complicated than for the other two. There are plenty of C++ JSON libraries (just google "c++ json"). The google protobuf project is at https://github.com/google/protobuf and a short tutorial is at https://developers.google.com/protocol-buffers/docs/cpptutorial. Protobufs are also overkill for this specific use case but are extremely efficient in both space and time, and the flexibility could be extremely useful in the future.

Another option is to use a variation of our current serialization format. Specifically, we would need to provide a way to escape \x1D and \x1E characters so that they can be used in var values, and unambiguously decide whether they are part of a literal value or have their current magic meaning. It might even be possible to do this in a way that allows fish 3.0 to read vars using the current serialization format with little risk of misinterpretation.

faho commented 7 years ago

For fish 3.0 we should consider switching to a different representation when serializing vars into the environment. JSON, XML, and google protobufs are the obvious choices.

All of these kinda seem like overkill. However, do note that we already use YAML for our history files. It might be possible to repurpose that.

krader1961 commented 7 years ago

I really dislike YAML. And we use our own implementation that has known bugs rather than a high quality library. Too, we can't have literal newlines in the serialized format. Which means escaping the newlines in the YAML format which negates the primary reason to use it.

Having slept on this I'm inclined to use a variant of our current serialization format. The idea is to prefix \x1D and \x1E with \x1B (escape) to remove the special meaning of those symbols. If an escape character appears in a string it is also prefixed with an escape character. Since the probability that anyone has a uvar with consecutive escape characters is very close to zero this allows us to read vars serialized using the current encoding.
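The escaping idea can be sketched like so (a minimal Python illustration of the proposed scheme, not fish's actual code):

```python
# Prefix the magic bytes (\x1d, \x1e) and the escape char itself (\x1b)
# with \x1b, so a literal occurrence can be told apart from the magic
# meaning. Minimal sketch of the proposal, not fish's implementation.
ESC = "\x1b"
SPECIAL = {"\x1d", "\x1e", ESC}

def escape(s: str) -> str:
    return "".join(ESC + c if c in SPECIAL else c for c in s)

def unescape(s: str) -> str:
    out, i = [], 0
    while i < len(s):
        if s[i] == ESC and i + 1 < len(s):
            out.append(s[i + 1])  # escaped char is taken literally
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

# Round-trips for plain text and each special character:
for sample in ["plain", "a\x1db", "x\x1ey", "esc\x1bseq"]:
    assert unescape(escape(sample)) == sample
```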

Having said that I'd still love to see us use JSON or protobufs. But that can probably only be justified if we decide to change the history file format.

faho commented 7 years ago

Since the probability that anyone has a uvar with consecutive escape characters is very close to zero this allows us to read vars serialized using the current encoding.

That's rather clever.

Note that we also have #1257 (which asks for the uvar file to be moved out of ~/.config) and #1912 (which asks for it to not be machine-specific - which also requires a renaming). Doing all three at the same time is probably easier - try to get the new file, if not, try to find the old file. If you've found that, deserialize (with the old semantics) and save it (with the new ones) in the new place.

floam commented 7 years ago

I like JSON for this. I recall noticing that the current ksh 'beta' branch on AT&T's ast GitHub project can read/print data as JSON and thinking that was cool.


krader1961 commented 7 years ago

Using JSON for serializing exported and universal vars (if not also for our history) might make it worthwhile to also make that format usable as a first class citizen in fish in a manner like the latest Korn shell version (not to mention Javascript). Probably via a new string subcommand. So I kind of like @floam's recommendation to use it rather than protobufs.

krader1961 commented 7 years ago

@floam, Can you clarify how you see JSON being useful in the context of fish script? I've thought about this some more and it isn't really useful unless we implement compound variables. That is, the ability to nest vars inside vars. Something I don't see happening any time soon. Nor can we use it for communicating directly with the fish_config web UI even if we were to implement compound variables.

So while I still want to see us switch to some other format for fish 3.0 I don't see a clear argument for JSON over google protobufs. I'm slightly biased as a former Google software engineer but protobufs is the superior encoding. Note that google protobufs normally use a binary encoding for storage and RPC but also has a robust, easy to read, text format. And by "easy to read" I mean that it is no harder to read, or write by hand, than JSON.

faho commented 7 years ago

Can you clarify how you see JSON being useful in the context of fish script?

Well, I at least know of a couple of tools to handle JSON on the command line. The most popular is probably jq; I like jshon.

That could be how string subcommands could work - string jsonify var would print a variable as json, string read-json expression jsonstring would get some subset of jsonstring.
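Roughly what such a round-trip could look like, sketched with Python's standard json module (the string subcommand names above are hypothetical; this only shows the underlying encode/decode idea):

```python
import json

# A multi-element variable encoded to JSON and decoded back -- the kind
# of round-trip a hypothetical `string jsonify` / `string read-json`
# pair would perform.
fish_list = ["abc", "def", "ghi"]
encoded = json.dumps(fish_list)
assert encoded == '["abc", "def", "ghi"]'
assert json.loads(encoded) == fish_list
```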

I'm not sure that is a business we want to get into, but it would be useful without needing compound variables.

So while I still want to see us switch to some other format for fish 3.0 I don't see a clear argument for JSON over google protobufs

So... what are the arguments here?

I mean

protobufs is the superior encoding

sounds nice, but what does that mean?

Like I said, there's tooling around json, is there something similar for protobufs?

Which is easier to implement? Do we need to use a library (which makes it harder to build fish yourself)? What are the available libraries (licensing, ease of use, availability)?

Are there performance concerns, i.e., with a common variable set size, does this add or reduce overhead compared to the existing format? How about history?

krader1961 commented 7 years ago

Protobufs offer a space- and time-efficient binary representation where performance is critical. Where performance is less important they offer a text representation that is superficially similar to JSON. A protobuf schema allows you to define attributes like the field type, a default value, and whether the field can be repeated. This is like our new argparse command in that it moves a lot of boilerplate code (e.g., message validation) into the protobuf library where it belongs. The core JSON specification doesn't have schemas but there is third-party support. The first couple of JSON libraries I looked at for possible use by fish did not have support for schemas.

The primary tooling for protobufs is the open source project from Google. The protobuf license is compatible with fish AFAICT.

Do we need to use a library...

Yes, regardless of whether we choose JSON or protobufs we should pick a high quality library. We should not roll our own implementation like we did for the history file pseudo-YAML format.

If you google "google protobuf vs json" you'll find numerous articles such as these:

http://blog.codeclimate.com/blog/2014/06/05/choose-protocol-buffers/
https://auth0.com/blog/beating-json-performance-with-protobuf/
https://webapplog.com/json-is-not-cool-anymore/

I looked at jq and jshon and shuddered. If we do implement fish script support for either JSON or protobufs we can do a lot better than either of those. But at the moment I'm only considering these encodings for serializing variables to the environment or uvar storage and the history file format.

floam commented 7 years ago

Indeed JSON would be a lot cooler if we had compound variables. But even without them, to me the big advantage of JSON vs protobufs is that we can exchange lists with other tools more easily and it's human readable rather than binary.

Also, to me, it makes more sense to not use string read-json but for example add a --json option to read.

Probably set could have a similar option, returning a JSON map.

floam commented 7 years ago

Where performance is less important it offers a text representation that is superficially similar to JSON.

I didn't know this.

krader1961 commented 7 years ago

the big advantage of JSON vs protobufs is that we can exchange lists with other tools more easily

I think that's debatable. There is first-class support for protobufs in C++, Java, Python, Go, Ruby, Node.js and others. See the three articles I linked to and you'll find plenty more.

And note that we're not talking about supporting either in fish script. I haven't seen a good use case, let alone two or three that would justify implementing that support. At this stage I'm focused solely on a single serialization format to replace our two ad hoc mechanisms for command history and storing vars.

krader1961 commented 7 years ago

I just learned that protobuf version 3 (aka proto3) has native support for JSON. I didn't know that because when I left Google four years ago proto3 didn't exist and thus the projects I worked on used proto2.

This means that should we decide to implement a string subcommand for handling JSON we can readily do so using just the high quality google protobuf library. Given that I don't see any reason to use JSON internally. Especially since protobufs make it easier to do things like augment the history file format in a backward compatible manner without requiring rewriting the history file. So unless someone sees a problem with the licensing or other aspects of the protobuf project I think it is the clear winner.

krader1961 commented 7 years ago

Also, this discussion thread about proto3 support for JSON: https://news.ycombinator.com/item?id=9666213

faho commented 7 years ago

The protobuf license is compatible with fish AFAICT.

It's what the FSF calls a "Modified BSD license". Yes, it's compatible with our GPLv2.

krader1961 commented 7 years ago

Note that variables which consist of a single element need to be exported as a simple string. However, such variables written to our universal variable file should be encoded as a google protobuf.

ridiculousfish commented 6 years ago

One user's thoughts on the history format is at https://news.ycombinator.com/item?id=15911598

floam commented 4 years ago

Happened upon this on Hacker News: https://news.ycombinator.com/item?id=20732197