gchq / Kai

Kai is an experimental Graph-as-a-Service framework built with the Amazon CDK
Apache License 2.0

Externalise storage of graph credentials #44

Closed m29827 closed 4 years ago

m29827 commented 4 years ago

Accumulo passwords are currently generated when the add_graph lambda is invoked. It would be better to create and store them when the stack is provisioned, and change the add_graph lambda to retrieve the credentials when required.
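A hedged sketch of what the proposal could look like (the secret id and JSON shape are hypothetical, not Kai's actual naming): the password is generated once at provisioning time, stored in AWS Secrets Manager, and the add_graph lambda only reads it back.

```python
import json
import secrets
import string


def generate_password(length: int = 32) -> str:
    """Generate a random alphanumeric password at stack-provision time.
    (Charset and length are assumptions, not Kai's actual policy.)"""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def get_graph_credentials(secret_id: str) -> dict:
    """What add_graph would do instead of generating a password:
    fetch the previously stored secret from Secrets Manager.
    (secret_id naming such as "kai/graph/<name>" is hypothetical.)"""
    import boto3  # imported here so the pure helper above needs no AWS deps

    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```

The CDK side would create the secret (e.g. with a generated password) in the same stack that provisions the graph, so each graph still gets its own credentials.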

m29827 commented 4 years ago

AWS Secrets Manager charges:

d47853 commented 4 years ago

Have been thinking about this recently. The API calls should be fairly limited. The passwords / sensitive files are stored as Kubernetes secrets anyway, so storing the secrets twice seems inefficient. It would also mean that graphs re-use the same passwords etc., which seems like the wrong way to do it.

Perhaps a better way to go about improving this is to improve the encryption mechanism we use, to better protect the secrets at rest? https://aws.amazon.com/blogs/containers/using-eks-encryption-provider-support-for-defense-in-depth
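The approach in that post amounts to attaching a KMS key to the EKS cluster so Kubernetes secrets are envelope-encrypted at rest (a data key wrapped by the KMS master key). A minimal eksctl config fragment illustrating the idea; the cluster name, region, and key ARN below are placeholders:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: kai-cluster          # placeholder cluster name
  region: eu-west-2          # placeholder region
secretsEncryption:
  # KMS key used to envelope-encrypt Kubernetes secrets at rest
  keyARN: arn:aws:kms:eu-west-2:111122223333:key/EXAMPLE-KEY-ID
```

This is where the "pay to store the master key" cost mentioned below comes from: the KMS key itself is billed, while the encrypted secrets stay in etcd.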

What are your thoughts on this?

m29827 commented 4 years ago

@d47853 I agree that re-using passwords for graphs is probably not the best. I'm now wondering if there is a more deep-rooted problem in the way helm uninstalls the gaffer chart, which is preventing a successful round trip (add-delete-add) of a graph of name x where the password is generated for each add. That hasn't got anything to do with the encryption mechanism, but I do like the idea of "double encrypting" the secrets, even if you have to pay to store the master key!

d47853 commented 4 years ago

I wasn't aware that was an issue. Doesn't helm remove the secret when the release is uninstalled?

m29827 commented 4 years ago

Not sure, will confirm. I think it's more likely that the newly generated instance secret is not configured properly.

m29827 commented 4 years ago

@d47853 the secrets are removed. The problem comes down to the re-use of the EBS volume when the graph is "re-added": the Accumulo master fails to initialise the HDFS file-system because the target directory is not empty:

2020-07-24 12:21:32,119 [init.Initialize] ERROR: FATAL It appears the directories [hdfs://best-hdfs-namenode-0.best-hdfs-namenodes:8020/accumulo] were previously initialized.
2020-07-24 12:21:32,119 [init.Initialize] ERROR: FATAL: Change the property instance.volumes to use different filesystems,
2020-07-24 12:21:32,119 [init.Initialize] ERROR: FATAL: or change the property instance.dfs.dir to use a different directory.
2020-07-24 12:21:32,119 [init.Initialize] ERROR: FATAL: The current value of instance.dfs.uri is ||
2020-07-24 12:21:32,119 [init.Initialize] ERROR: FATAL: The current value of instance.dfs.dir is |/accumulo|
2020-07-24 12:21:32,119 [init.Initialize] ERROR: FATAL: The current value of instance.volumes is |hdfs://best-hdfs-namenode-0.best-hdfs-namenodes:8020/accumulo|

The Accumulo components then fail to start as the expected instance id f3299a4b-0ea6-455b-85c2-b130eeac4856 is that of the previously installed graph. The result is lots of errors like this:

Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /accumulo/f3299a4b-0ea6-455b-85c2-b130eeac4856/masters/tick

d47853 commented 4 years ago

Ah right okay. So would removing the EBS volume when a graph is uninstalled fix the issue?
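A hedged sketch of that cleanup (the release name, namespace, and instance label are hypothetical; the actual chart's labels may differ): deleting the graph's PersistentVolumeClaims after the uninstall releases the backing EBS volumes, provided the StorageClass reclaim policy is Delete, so a re-added graph gets a fresh, empty volume.

```shell
# Uninstall the graph's release, then remove its PVCs so the backing
# EBS volumes are deleted rather than re-attached on the next add.
helm uninstall my-graph --namespace my-graph-ns          # hypothetical names
kubectl delete pvc --namespace my-graph-ns \
  --selector app.kubernetes.io/instance=my-graph         # label is an assumption
```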

m29827 commented 4 years ago

Yes, that would work.

d47853 commented 4 years ago

Can this be closed as the ebs volume ticket has been completed?

m29827 commented 4 years ago

@d47853 yes it can be closed.