Containers at the API level and managed by dotmesh violate abstractions

deitch commented 6 years ago

There are a number of API calls that interact with containers, e.g. ContainersById, Containers and SwitchContainers. dotmesh also restarts containers when a branch switches.

There are several issues with this:

Technical: It violates layer abstractions (to use @lukemarsden's clear formulation). dotmesh operates at the storage level, a layer below container orchestration. It shouldn't reach up into the orchestration layer.
User Experience: Most people do not expect swapping a filesystem to restart a process. In many cases, the restart may be dangerous to the process. As Luke pointed out, good IDEs notice when the underlying filesystem has changed, but changing a file does not forcibly restart the IDE. Automatically restarting a container will surprise the user in many cases. While I get that this may be desired for other user experiences, if we could find a way to do this without violating layers, we should make it an option. Perhaps a notification system that tells some higher layer, "filesystem has changed, do what you will", is a better option.
Product: Having containers explicitly in the API and product ties us conceptually to containers. While we may end up having a large market built entirely around containers, we do not have sufficient evidence to be willing to constrain ourselves, and more importantly our product brand, to that market.

/cc @mrmrcoleman, looking for feedback.

lukemarsden commented 6 years ago

not restarting the process is probably more dangerous than restarting it in many cases.

i'm supportive of moving this functionality into another layer in due course, as long as we do that without breaking the dm checkout-changes-the-state-in-your-app UX

lukemarsden commented 6 years ago

making the integration pluggable, and providing a docker implementation, would be cool if it turns out that people value the functionality.

deitch commented 6 years ago

> not restarting the process is probably more dangerous than restarting it in many cases

I am unsure. I think many people's reaction would be, "don't restart my process unless I choose to."

lukemarsden commented 6 years ago

I can tell you mysql's reaction will be like waaah omg wtf my cache doesn't match up with what's on disk explode 🔥

lukemarsden commented 6 years ago

@alaric-datamesh previously talked about extending the concept of things which are using a volume (the writeable part of a branch of a dot that's mounted into a consumer) into different types. some of them might be restartable via a well defined interface, others might just cause e.g. a rollback to error.
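A rough sketch of what typed volume consumers might look like, in Python purely for illustration. Nothing here is dotmesh code: `VolumeConsumer`, `RestartableConsumer`, `OpaqueConsumer`, `RollbackRefused` and `rollback` are all hypothetical names. Each consumer type declares how it reacts to a change, and a rollback turns into an error if any consumer cannot cope.

```python
from abc import ABC, abstractmethod


class RollbackRefused(Exception):
    """Raised when a consumer cannot safely survive a filesystem change."""


class VolumeConsumer(ABC):
    """Hypothetical: anything that has the volume of a dot mounted."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def before_change(self):
        """Called before the filesystem is swapped."""

    @abstractmethod
    def after_change(self):
        """Called after the filesystem is swapped (or the swap is abandoned)."""


class RestartableConsumer(VolumeConsumer):
    """A consumer that can be stopped and started via a well-defined interface."""

    def __init__(self, name):
        super().__init__(name)
        self.running = True
        self.restarts = 0

    def before_change(self):
        self.running = False

    def after_change(self):
        self.running = True
        self.restarts += 1


class OpaqueConsumer(VolumeConsumer):
    """A consumer we cannot control, so any rollback underneath it is an error."""

    def before_change(self):
        raise RollbackRefused(f"{self.name} cannot be restarted; refusing rollback")

    def after_change(self):
        pass


def rollback(consumers, swap_filesystem):
    """Stop every consumer, swap, then start them; abort if any consumer refuses."""
    stopped = []
    try:
        for consumer in consumers:
            consumer.before_change()
            stopped.append(consumer)
    except RollbackRefused:
        # Bring back whatever was already stopped, then surface the error.
        for consumer in stopped:
            consumer.after_change()
        raise
    swap_filesystem()
    for consumer in consumers:
        consumer.after_change()
```

With this shape, a Docker integration would be one `VolumeConsumer` subclass among several, and the storage layer would not need to know which kind it is talking to.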

deitch commented 6 years ago

If dotmesh doesn't take off, we definitely are creating a new startup to write error messages for other products. I would love to see @lukemarsden's error messages in the mysql codebase!

deitch commented 6 years ago

> as long as we do that without breaking the dm checkout-changes-the-state-in-your-app UX

I get that, totally. It is the magical beauty of it. The things necessary to get it working (using our example cases) are:

An updated underlying store. Dotmesh.
A process that refreshes itself from the filesystem and can handle underlying changes.
A UI that is updating regularly. If we have an Ajax Web app or Websockets with push or something updating, then fine.

That process has infinite varieties. Some will "just work" (like the IDEs you brought up); others want to be told via some signal; still others need to be restarted. It will be nearly impossible for us to capture them all. We aren't going to have a websocket that the browser connects to so we can tell it to refresh because the database is updated; nor are we going to figure out which one is running docker, which rkt, which runc, which vmware, and integrate every single one into the Dotmesh dot-management system.

We will need a higher layer of abstraction to interact with the process management system (vmware, kube, posix, whatever) and tell it, "this has changed, do refresh action Y", or maybe, "this has changed, do whatever you think is right."

I think we are roughly on the same page in that regard.
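To make the "this has changed, do whatever you think is right" idea concrete, here is a minimal sketch in Python. It assumes a simple in-process publish/subscribe hook; `FilesystemChanged`, `ChangeNotifier`, `docker_layer` and `ide_layer` are hypothetical names, not part of dotmesh. The storage layer only announces the change, and each higher layer decides for itself what to do.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FilesystemChanged:
    """Hypothetical event: the named branch of a dot is now what is mounted."""
    dot: str
    branch: str


class ChangeNotifier:
    """Hypothetical storage-layer hook: it announces changes and nothing more."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[FilesystemChanged], str]] = []

    def subscribe(self, handler: Callable[[FilesystemChanged], str]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: FilesystemChanged) -> List[str]:
        # Each subscriber returns a description of what it chose to do.
        return [handler(event) for handler in self._subscribers]


def docker_layer(event: FilesystemChanged) -> str:
    # A container integration might choose to restart its containers.
    return f"restart containers using {event.dot}"


def ide_layer(event: FilesystemChanged) -> str:
    # An IDE-like consumer already notices file changes and needs no restart.
    return "no action, files are re-read on change"
```

The point of the design is that the storage layer never imports anything from the orchestration layer; the dependency runs the other way.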

However, as a starting point, making it automatic in a way that (a) is linked to containers, (b) is linked specifically to Docker containers (what happens in kube?), and (c) always restarts, with no option one way or the other, is asking for trouble.

I think until we have that higher-level signaling system in place, we should not restart automatically, but should have an option to do so via API/CLI.

dm checkout <branch> --restart-containers

We get the UX we want, without forcing users into a place they may or may not want.

lukemarsden commented 6 years ago

I'm with you, except the default should be to restart containers and the option should be to disable it. Otherwise we default to data-destructive operation for all off-the-shelf databases. Later, it should be possible to configure the server with optional integrations with things other than Docker here.

lukemarsden commented 6 years ago

Please open a separate issue for what happens with kube. That's a gap that we need to fix in the v0.2 timeframe.

lukemarsden commented 6 years ago

Do you have any examples of software that can be notified that its on-disk structure has been changed underneath it via a signal?

alaric-dotmesh commented 6 years ago

> I would love to see @lukemarsden's error messages in the mysql codebase!

The MySQL codebase is a dumpster fire, and setting all the error messages to "An error has occurred, dunno what that means for you, not my problem" would be an improvement in many cases because then at least they'd be honest...

Other than that, my "dream" would be a three-phase protocol.

Tell the containers (or other volume-users, in general) that a change is a-coming. Wait for them to confirm readiness.
Swap filesystem.
Tell the containers it's done.

The current behaviour plugin can stop/start the containers as the before and after steps, but we can do softer forms of restart otherwise. Certainly, I've never seen something like a relational database have support for this other than restarting the thing, but if apps become more filesystem-rollback-aware in future, they might expose that interface.

This has many parallels with the issues around when to snapshot filesystems - indeed, we need the same three-phase protocol for communicating with the volume-users in that case.
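The prepare / swap / done sequence above could be sketched like this, in Python for illustration only. `switch_filesystem`, `StopStartPlugin` and `BusyUser` are hypothetical names, and the `swap` callable stands in for whatever actually changes the mounted filesystem. The stop/start plugin reproduces today's behaviour as just one possible implementation of the before and after steps.

```python
class StopStartPlugin:
    """Hypothetical plugin reproducing today's behaviour: stop, swap, start."""

    def __init__(self, log):
        self.log = log

    def prepare(self, volume):
        self.log.append(f"stop containers using {volume}")
        return True  # confirm readiness

    def done(self, volume):
        self.log.append(f"start containers using {volume}")


class BusyUser:
    """Hypothetical volume-user that never confirms readiness."""

    def prepare(self, volume):
        return False

    def done(self, volume):
        pass


def switch_filesystem(volume, users, swap):
    """Run the three phases; return False if any user is not ready."""
    # Phase 1: announce the change and wait for every user to confirm.
    ready = []
    for user in users:
        if not user.prepare(volume):
            # Abort: release the users that had already prepared.
            for prepared in ready:
                prepared.done(volume)
            return False
        ready.append(user)
    # Phase 2: swap the filesystem while everyone is quiescent.
    swap(volume)
    # Phase 3: tell every user the change is complete.
    for user in users:
        user.done(volume)
    return True
```

A softer plugin could flush and reopen files in `prepare` and `done` instead of stopping anything, and the same three calls would serve for taking snapshots.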

deitch commented 6 years ago

> would be an improvement in many cases because then at least they'd be honest

LOL! A little understatement there, @alaric-datamesh ?

> Tell the containers

I had a long discussion at some point with some of the kubernetes people and docker people about how we need a standard signaling mechanism for "reread your stuff". That can be "restart" or might be "reread config file off disk." Apache always used SIGUSR1, as did many others, but containers and their runtimes, along with standardization in orchestration, create new opportunities to really clean up the mess.

alaric-dotmesh commented 6 years ago

Yeah, I mean, we have the SIGHUP convention for config files and all that, and daemons whose function is to serve up content from filesystems tend to be written with the assumption that it could change at any time, but apart from blob storage most data is in DBs that very much assume they're master of their own disks. The difference, largely, is that daemons that update the data themselves like to cache/coalesce writes in memory and other such tricks!
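For reference, the SIGHUP convention for config files usually looks something like the sketch below (Python, Unix only; `ReloadableService` and its JSON config file are made up for the example). It works because the daemon treats the file as read-only input. A database holding cached or coalesced writes in memory has no equivalent cheap "reread" step, which is the difference described above.

```python
import json
import signal


class ReloadableService:
    """Hypothetical daemon that rereads its config file when sent SIGHUP."""

    def __init__(self, config_path):
        self.config_path = config_path
        self.reload_requested = False
        self.config = self._read()

    def _read(self):
        with open(self.config_path) as f:
            return json.load(f)

    def handle_signal(self, signum, frame):
        # Keep the handler tiny: just record that a reload was requested.
        self.reload_requested = True

    def install(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGHUP, self.handle_signal)

    def tick(self):
        """One iteration of the main loop: reload if asked, then carry on."""
        if self.reload_requested:
            self.reload_requested = False
            self.config = self._read()
```

After `install()`, running `kill -HUP <pid>` against the process makes the next `tick()` pick up the new file contents.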

deitch commented 6 years ago

> written with the assumption that it could change at any time

As you pointed out, those are primarily for config, i.e. as far as the process is concerned, it is read-only, while databases expect exclusivity.

> master of their own disks

Remember the first few years of Sybase and Oracle and DB2? You couldn't even give them paths on the filesystem; you had to give them block devices!

lukemarsden commented 6 years ago

We should have an explicit interface for runtime-detecting and controlling things.

For example, file shares using a dot, or kubernetes using a dot.

dotmesh-io / dotmesh

Containers at the API level and managed by dotmesh violate abstractions #208