Update `secrets env` to omit single trailing `\n` from files which automatically include it

tegefaulkes commented 1 month ago

Specification

Currently, most Unix files, and by extension, all files inside Polykey's vault will have a trailing \n. This is the standard way to end a file in Unix (see this question). This poses an issue, as all secrets will also have this newline when it might not be intentional, breaking applications (see additional context below).

A long-term solution would require a vault schema to dictate the formatting of the secrets (see Polykey#222). However, that is still some time away, and this issue needs to be solved before that can be merged. To do this, we can implement a flag which would explicitly tell the command to preserve the single trailing \n at the end of the file, otherwise the default behaviour would be to exclude it.

[aryanj@matrix-34xx:~]$ polykey secrets env vault:/ABC -el vault2:/ABC -el vault3:ABC -el -- command arg1

Keep in mind that only a single, trailing newline must be removed. Trailing white space must be preserved, and any trailing white space but the last one should also be preserved, unless the user explicitly specifies that the newline can be kept.

While we technically can use polykey secrets write to write without entering the newline, it is not ideal, as a vault is a file system, and in most cases, users will paste their secrets using an editor, which would add the newline at the end. This is why we need this to work right now.

Additional context

Line break is bad. I have an example, things like passwords to Postgres databases don't like newlines, and in fact if you use them, it will just sometimes silently fail the authentication.

So you have to be careful about whether the secret string actually has a newline or not.

This has implications on things like secrets write command. So for example, the standard EOF character is \n for saved files on disk. That's by convention. However if you are saving a password into a vault, and we store it as a file, then that password may not have a \n at the end of the string. Leaving a \n as convention then becomes ambiguous.

Suppose you have something like ./DB_PASSWORD and inside is helloworld\n (due to \n conventions and even most commands like echo helloworld > ./DB_PASSWORD, by default this will usually write a newline along with it). To get it without newline you do echo -n helloworld >./DB_PASSWORD. On my terminal that shows up as helloworld% after cat

IMO, we need to be quite strict about this... which is that whatever comes in via stdin should be preserved as-is, and to not add newlines into files that isn't supposed to have it.

The problem comes from implicit newlines. For example, polykey secrets ed ./DB_PASSWORD. If this opens up vim, it's likely vim will save the newline EOF - depending on the configuration of vim. Then later with polykey secrets cat ./DB_PASSWORD, this will produce everything including the newline. If this is injected using polykey secrets env, then you may have a newline in the output like DB_PASSWORD='helloworld\n', which may be inaccurate.

I think this problem fundamentally derives from crossing 2 abstractions - the file system abstraction and the key-value string abstraction

File systems/files usually use newlines to indicate end of a line and end of a file, many tools operating on files will naturally automate this. But many secrets are key value strings, and therefore don't assume newlines would exist. Like for example when you hit ENTER on the keyboard in a text input, it's a newline, but when you hit ENTER for a text field input, it means save it/input it.

So we have this situation whether in one context, vaults are file systems, but in another context they are key-value strings

We need to figure out an elegant way of dealing with the cross over. Right now we can use polykey secrets write to create the secret without the newline. Later this would be solved by a vaults schema.

For vaults schema to solve it, it would need to have trim constraints. But, I think we need to document the kind of workflows involving compositions with various CLI commands where this can be an issue.

- Conversation between @cmcdragonkai and @tegefaulkes

Discussion about vault schema in Polykey#222
vim adding newlines by default at the end of files in this Stackoverflow question.

Tasks

Automatically trim single trailing newline from all secrets
Add an option to disable this behaviour by default (ideally this should apply to each secret path rather than all the provided secrets)

linear[bot] commented 1 month ago

ENG-416 Review and improve `secrets env` command design

tegefaulkes commented 1 month ago

We need to modify the command to allow for just specifiying the vaultname if we want to specify the root path. So currently you can do vaultName:., vaultname:./ to specify root. but we want to support just specifying vaultname.

CMCDragonkai commented 1 month ago

The secrets env also has segfaults. I think it occurs during CTRL+C.

CMCDragonkai commented 2 weeks ago

Line break is bad, I have an example, things like passwords to postgres databases don't like newlines, and in fact if you use them, it will just sometimes silently fail the authentication
So you have to be careful about whether the secret string actually has a newline or not
This has implicatinos on things like secrets write command
@Aryan so for example
The standard EOF character - is \n for saved files on disk
That's by convention
However if you are saving a password into a vault
And we store it as a file
Then that password may not have a \n at the end of thes tring 
Leaving a \n as convention then becomes ambiguous
Suppose you have something like ./DB_PASSWORD and inside is helloworld\n (due to \n conventions and even most commands like echo helloworld > ./DB_PASSWORD, by default this will usually write a newline along with it
To get it without newline you do echo -n helloworld >./DB_PASSWORD
On my terminal that shows up as helloworld% after cat
IMO, we need to be quite strict about this... which is that whatever comes in via STDIN should be preserved as is, and to not add newlines into files that isn't supposed to have it.
The problem comes from implicit newlines like for example
secrets ed ./DB_PASSWORD
If this opens up vim, it's likely vim will save the newline EOF - depending on the configuration of vim
Then later with secrets cat ./DB_PASSWORD this will produce everything including the newline
If this is injected using secrets env then you may have a newline in the output like DB_PASSWORD='helloworld\n'
Which may be inaccurate
I think this problem fundamentally derives from crossing 2 abstractions - the filesystem abstraction and the key-value string abstraction
Filesystems/files usually use newlines to indicate end of a line and end of a file, many tools operating on files will naturally automate this. But many secrets are key value strings, and therefore don't assume newlines would exist. Like for example when you hit ENTER on the keyboard in a textform input, it's a newline, but when you hit ENTER for a text field input, it means save it/input it.
So we have this situation whether in one context, vaults are filesystems, but in another context they are key-value strings
We need to figure out an elegant way of dealing with the cross over
Brian — 10/11/2024 8:06 PM
I went over it with her. Right now we can use secrets write to create the secret without the newline. Later this would be solved by a vaults schema.
Roger Qiu — 10/11/2024 8:07 PM
for vaults schema to solve it, it would need to have trim constraints
But I think we need to document the kind of workflows involving compositions with various CLI commands where this can be an issue.
Is there a set of existing issues related to this problem that I can info dump into it?

CMCDragonkai commented 2 weeks ago

@tegefaulkes says in the short term use secrets write, but in the long term it can only be solved with https://github.com/MatrixAI/Polykey/issues/222.

Well I think it's actually a big problem for UX. The point is that we are crossing contexts between "file editing" which has a convention of having \n to mean both EOL and EOF for text content. (For binary content, the \n being EOF is not standard, in fact mostly won't have it). This is why most text editors will write the \n at the end. Although this may be controlled by things like .editorconfig using the insert_final_newline. https://stackoverflow.com/a/31426524

The problem is that if the "file" is supposed to represent an implementation of key-value structure, then the invisible and implicit \n can be a common source of problems.

How should we solve this problem?

CMCDragonkai commented 2 weeks ago

Suppose you rely on a schema https://github.com/MatrixAI/Polykey/issues/222. This would mean a schema adds a trim constraint, that basically is applied on ingress. You trim any whitespace before it saves the file.

Alternatively on context-aware usage, we might then do a trim on the egress when using the env. It's unlikely env variables are going to have leading OR trailing whitespace. In such a scenario, it would also make sense to apply automatic trim on leading and trailing whitespace anyway...

One might add an option specifically not to apply automatic trim for the secrets env command.

CMCDragonkai commented 2 weeks ago

I think we need a do a few things here:

Vaults schema can apply a ingress trim when applicable.
But also because secrets env makes sense not to have \n at the end, cause that's likely a mistake, you'd want to apply an automatic removal of a trailing \n if it exists.
But because we are doing this "automatically", we must have an option to turn that off. Can we apply such an option to individual entries rather than all the env variables on the CLI?
It also makes sense that we should be able to compose together the env from different vaults... etc and different specific variables. And in those cases extra options/flags applied to individual entries is useful.

CMCDragonkai commented 2 weeks ago

Relying on short term write is not a good idea cause interactively you have to do ABC[CTRL+D][ENTER]. Nobody knows about this except advanced unix shellers.

So I think we need to do the above 4 things to make this work well.

Right now my vim even refuses to save the file without a newline.

CMCDragonkai commented 2 weeks ago

I suggest things like secrets env vault:/ABC -el vault2:/ABC -el vault3:/ABC -el -- thecommand.

Or something to help not add trim the trailing newline.

Also I want to emphasize this should be automatic trim of 1 SINGLE trailing newline. If should not remove all whitespace characters only the final \n if it exists. It should not remove \n\n.... etc.

Leading whitespace and trailing whitespace might still be significant.

Also I would argue that if we detect non-text values, we might want to not do that trim of the trailing newline. It would have to be a sort of smart heuristic.

CMCDragonkai commented 2 weeks ago

This isn't really a proper development issue template, @tegefaulkes @aryanjassal please incorporate the ideas from https://github.com/MatrixAI/Polykey-CLI/issues/296#issuecomment-2448870044 into the OP spec.

aryanjassal commented 2 weeks ago

This isn't really a proper development issue template, @tegefaulkes @aryanjassal please incorporate the ideas from #296 (comment) into the OP spec.

This issue is meant to be a discussion and doesn't follow the template or provide a specification. I will make a separate issue with the specs as discussed here.

Once the discussion is finished and we have a good idea where to take secrets env from here, then this issue can be closed.

aryanjassal commented 2 weeks ago

As requested, I have updated this issue itself instead of making a new issue and treating this one as a discussion issue.

CMCDragonkai commented 2 weeks ago

While working on zeta house, I hit the problem of thinking about how the vault schema works. After going through several iterations...

I think we would want to create a new standard in the community called env.schema.json. It doesn't actually matter what it is called, but that's a good name anyway.

Then:

{
  "type": "object",
  "properties": {
    "ZETA_HOUSE_ENV": {
      "type": "string",
      "enum": ["development", "production"],
      "default": "development"
    },
    "ZETA_HOUSE_GOOGLE_MAPS_API_KEY": {
      "type": "string"
    }
  },
  "required": [
    "ZETA_HOUSE_GOOGLE_MAPS_API_KEY"
  ]
}

This is an example of a JSON schema that matches what a .env.example would show, however it is far more easier to machine parse, since it's json, and we can also know exactly its structure since it has to be a json schema. It's also cross platform (will work on windows) unlike .env.example which only works on bash shells because CMD and Powershell has different languages.

So what do you do with this? This becomes standardised "type" specification as well as documentation about the "free variables" within an application. By doing this they become explicit. Well they are only explicit if the programming tools perform automatic checks during development/compilation/production. No automatic checks would mean they are no different from just README.md.

However in this case, that would mean we can then integrate this... I'm going to call this a "environment schema" (a subschema of "egress schema") into the polykey secrets env command.

You can then do:

. <(polykey secrets env --schema env.schema.json)

Now what does this do? Well this would be an error because right now there's no vaults. It's an egress schema that Polykey can interpret and apply a check to the final output.

This allows one to compose a variety of vault paths together to produce the required egress environment.

. <(polykey secrets env vault1 vault2:/file1 vault3:/dir --schema env.schema.json

Note that this is a egress schema, NOT a vault schema. It's applied to the entire thing. It's also an open-world schema, meaning any key values produced that is not part of the schema is allowed. We can also make use of the required and pattern options.

We have a json schema validator built into Polykey core library already, so it wouldn't be too difficult to make use of this.

This reveals another thing to understand. The vault and secret paths here are name/aliases. The vault path itself or just vaultX or vaultX:/dirpath is actually a reference that can be swapped based on context.

These are not "content addressed paths" like they are in other scenarios like Nix and stuff. Because they are not a globally unique thing.

They are more like an alias, and the value that they point to changes depending on who's asking. Who's asking is the identity by which authority is delegated to.

Thus they cannot be "hard paths" like content addresses, but they are aliases to be fulfilled by a dynamic fine-grained access control context.

In practice this means, the CI's zeta.house is not necessarily the same vault as my own local agent's zeta.house, and that's not necessarily the same as @shafiqihtsham local agent's zeta.house. We may all alias it zeta.house, but the actual content of the vault can be different and this is feature not a bug, as it allows dynamic fine-grained access control context. This is why the schema is important, because without the schema, it would get confusing pretty quickly what the expected content of a vault in relation to the application requirements would be.

MatrixAI / Polykey-CLI