henn commented 9 years ago

Currently, users can call start_console() on a node as many times as they want, with each call starting a new IPMI tool. This could cause 2 DOS-y types of problems: a) I'm not sure if IPMI even supports more than one console-gathering process, and b) this could create a bunch of processes on the HaaS API server

Multiple IPMItool instantiations aren't necessary, since just 1 instantiation will record the entire serial console, and many others can fetch the contents in parallel.

We should limit this to 1 process (perhaps using the DB).

Note that this might cause issues with making HaaS multi-node (see #416) since we're just logging to a tmp file, thus requiring some more work, but let's leave that as separate future work.

zenhack commented 9 years ago

Thanks for bringing this up --- the general kludgy-ness of the implementation has been bugging me.

We really need a better story on process-control re: serial console stuff in general. Here's a somewhat more ambitious idea of where this could go (eventually, we could talk about possible shorter-term fixes as well):

Rather than having start_console/stop_console, we just use a persistent connection or websocket, don't do logging at all, and just dump the output from the ipmi process to the response until the client disconnects, at which point we kill ipmitool.

There's still the question about multiple connections, but this at least gives a clean way to make sure the process dies when it's supposed to.

zenhack commented 9 years ago

I'm removing the easy label for this, and putting one of the (many, for some reason) discussion labels on this. We need to decide on an appropriate solution before someone goes and codes it.

zenhack commented 9 years ago

I'm changing the title of this to make it a little more obvious what the core issue is. I feel pretty strongly that we need to pin down a better design, not just find some hacky way of fixing the handful of serious issues we have now.

zenhack commented 9 years ago

ipmitool doesn't seem to multiplex the serial console when multiple processes connect to it. I don't think it's too big of a restriction to just refuse to allow more than one connection to the serial console; we can just lock a file under /tmp whose name is a function of the node's name and do the above, but report an error if there's an existing serial connection.

lokI8 commented 9 years ago

It makes sense to put a lock on the console, only allowing one person to watch at a time. I don't think lack of multiplexing will be an issue for the console.

I think we need to make sure that the lock gets released if the connection is unresponsive for x time, in case the stream drops without the process being killed, or something whacky happens on the user's side...

zenhack commented 9 years ago

The one downside to using a lock in /tmp is that it only works if you only have one API server. When I was playing with it before, running two ipmitool processes looking at the serial console didn't seem to break anything, just only one of them actually gets the data. So there are two ways of doing this:

Use a lock, report an error, and only support one API server
No locking, if a user tries to open more than one instance of the console they just get no data. This lets us run more than one API server, but is less user-friendly.

@henn, @gsilvis, @lokI8, can we come to a consensus? I don't have a strong opinion.

zenhack commented 7 years ago

So, I want to fix this already. Here's what I'd like to do:

Rip most of the direct console support out of the API server proper.
Add a separate daemon that will manage the consoles.

Operation of the daemon

"Admin" operations

These operations would require some kind of authentication/authorization. This could just be a password, which is stored by the daemon (salted & hashed) and also somewhere where the API server can get at it (in cleartext or equivalent, so it can be used, not just checked). Probably an extra column in the obm table or such.

PUT /node/{node_id}/owner takes a parameter in the body identifying a project which "owns" the node. If there are any existing connections to the console of this node, they are dropped, and their URLs (described below) are invalidated.
DELETE /node/{node_id}/owner removes any owner from the the node. Any existing console connections are dropped, and their URLs are invalidated.
POST /node/{node_id}/console-endpoints takes the node's current project id, and returns a new, cryptographically random URL like /node/{node_id}/console?token=<unguessable UUID>) that can be used to connect to the console. This is similar to how nova manages the VNC console for its VMs. If the supplied project does not match the current owner of the node, the request is rejected. The project parameter is for synchronization, as described below.

Regular user operations

GET /console/<unguessable UUID> connects to the console identified by the URL, which will have been returned by an admin-only call as described above. If there is an existing connection, we could either reject the request or drop the existing connection. A successfully connection would just start streaming data from the console, so you could just curl the URL to start reading the console.

HIL API server changes

We drop the current console code entirely
We re-implement show_console as follows:
1. Verify auth, as with any API call.
2. Do a POST on the console daemon to get an unguessable URL. We supply the project that the DB says owns the node in the body.
3. If the above is successful, either redirect the client to our secure URL, or connect to it and proxy.
Modify project_connect_node to do a PUT with the new project to the appropriate URL on the console daemon.
Modify project_detach_node to do a DELETE on the node owner URL.

project body parameter: why

The project body parameter in the secure URL request is there to avoid the a race condition between project_detach_node/project_connect_node and show_console, where:

The request handler for show_console checks the authentication and authorization constraints.
Then, the request handlers for project_{detach,connect}_node run all the way through, re-assigning the node to another project.
Having validated that the user is authorized to view the console, the show_console request handler gets a secure URL, and redirects the user to it (or proxies it).
The user now has a live connection to a node they are not supposed to have access to anymore.

The project parameter prevents this, since in step (3), the project would no longer check out, and the request would be rejected by the console daemon.

Advantages

The biggest advantage, and the reason for the whole thing, is to make the console setup actually robust. That said there are another of other advantages:

It opens the door to moving more of the OBM support out of the API server; if we moved the rest of the methods to this daemon, we wouldn't even need the API server to be on the same network with the IPMI controllers. We can discuss whether this actually makes sense, but it becomes an option.
It takes complex control flow out of the API server, which is mostly otherwise doing simple CRUD operations. We already have the switch management in a separate daemon. This has similar advantages.
It means less stuff in the same address space, containing the scope of vulnerabilites.
It provides a clean api that can be implemented on top of things other than IPMI, without having to add even more complexity to the API server.

If folks agree this is a good approach, I can put it together pretty quick.

//cc @naved001, @knikolla

zenhack commented 7 years ago

So I decided to just go ahead and do this, then talk about it. I wrote a prototype of the daemon:

https://github.com/zenhack/console-service

Not a bad way to spend an evening.

It mostly works -- need to call ipmitool instead of just making a network connection, and there's an imporant FIXME in there that needs dealing with. And we'll likely want to persist the data for resarts. But, other than that, we mostly just need the tweaks to the API server.

It also probably needs some more documentation.

You'll note I wrote it in Go, not Python.

naved001 commented 7 years ago

@zenhack I like the idea of having a separate daemon mainly because it makes the API server simple and the console robust. I'll need to take some time to look at the current code and the issues it has. Additionally, since I am not familiar with Go it will take me some time to comment on it.

zenhack commented 7 years ago

Of course. Don't hesitate to pick my brain/tell me something needs documenting.

henn commented 7 years ago

A couple of thoughts/questions

I like the event-y type design
I like that we could potentially hide OBM creds and connectivity from the API by splitting the console service off.
I'm concerned that keeping state in memory means one process crash causes the entire thing to lose state/synchronization with the HIL server and means there's no ability to connect until someone forces the HIL server to sync the state of every node over again.
- Was your thought to eventually use the DB?
- Maybe a pull-model for state could fix this? See the next suggestion.
Another option to the console-specific token mechanism that reduces the state the console service would need to keep would be to have the console service accept the user's token (user/pass or a keystone token), then check with the API server if that user should have access (maybe using a mechanism like #692).
- This could eliminate needing to notify the console service on node allocations.
- We could even optimize the node deallocation notification away when unnecessary if HIL kept a console_accessed metadata flag on the node that was checked on node free.
- Note that services calling other services in OpenStack have a mechanism by which they can include their own service token in addition to the user's token, though in the above case, having a user set the console_accessed flag shouldn't be a problem.
Do you have an idea on how this design could be extended in the future to multiple hosts for horizontal scaling and redundancy?
Does this handle bi-directional traffic (reading from and writing to the remote console)? We've had at least one request for write access.
What would the client look like for that? As in, would reading/writing be easy for normal REST-y applications, without having to do a separate HTTP request for each keystroke?
Another way to do the token mechanism would be to use something like OpenStack's Fernet tokens (something encrypted and HMAC'd with a shared secret), though protecting against race conditions could be tricky.

zenhack commented 7 years ago

Quoting Jason Hennessey (2017-06-01 20:17:59)

 * I'm concerned that keeping state in memory means one process crash
   causes the entire thing to lose state/synchronization with the HIL
   server and means there's no ability to connect until someone forces
   the HIL server to sync the state of every node over again.
      + Was your thought to eventually use the DB?
      + Maybe a pull-model for state could fix this? See the next
        suggestion.

This is something I was still waffling on. It might be a good idea to persist node/ownership info to a db at the very least. I think keeping the tokens in memory is fine, since user can easily get a new one (provided they are still authorized to do so).

Putting everything in memory was mostly because this was a prototype.

 * Another option to the console-specific token mechanism that reduces
   the state the console service would need to keep would be to have
   the console service accept the user's token (user/pass or a
   keystone token), then check with the API server if that user should
   have access (maybe using a mechanism like [1]#692).
      + This could eliminate needing to notify the console service on
        node allocations.
      + We could even optimize the node deallocation notification away
        when unnecessary if HIL kept a console_accessed metadata flag
        on the node that was checked on node free.
      + Note that services calling other services in OpenStack have a
        mechanism by which they can include their own service token in
        addition to the user's token, though in the above case, having
        a user set the console_accessed flag shouldn't be a problem.

This sounds like additional complexity for no real benefit. As I describe above, the token can be entirely ephemeral, so keeping the state isn't too much of a burden, and it's a lot simpler than integrating with the kind of stuff you're talking about.

 * Do you have an idea on how this design could be extended in the
   future to multiple hosts for horizontal scaling and redundancy?

One observation is that not every node has to have the same base api endpoint -- we could just have the console daemon url/admin token be part of the body of node_register, and have separate instances of the console daemon for different parts of the network.

Re: redundancy, that's a bit tricker, since we need to somehow synchronize them. Beyond standard replicated-state-machine stuff I'd need to think about it, but I don't know that we need more than that -- 1 console server per node isn't problematic from a scalability perspective since we're only allowing 1 connection per console anyway.

 * Does this handle bi-directional traffic (reading from and writing
   to the remote console)? We've had at least one request for write
   access.

No, but it would be a trivial extension. The main reason we haven't implemented it so far is that we were worried about implications re: low level access to the BIOs and such.

 * What would the client look like for that? As in, would
   reading/writing be easy for normal REST-y applications, without
   having to do a separate HTTP request for each keystroke?

There are a number of ways of getting bi-directional streams with HTTP. One is to use websocket, which has the nice advantage of making browser-based clients possible.

The CLI client is the sort of thing I've written in 10-15 minutes on a number of occasions.

 * Another way to do the token mechanism would be to use something
   like OpenStack's Fernet tokens (something encrypted and HMAC'd with
   a shared secret), though protecting against race conditions could
   be tricky.

This sounds more complicated, and I don't see a clear benefit. It also makes clients harder to write -- with the current solution you're basically talking netcat plus a websocket library. I noted earlier that nova does something similar to what I proposed for its VNC consoles.

zenhack commented 7 years ago

@henn, curious to your further thoughts.

zenhack commented 7 years ago

I've been thinking a bit more about the data sync issue. I have a slight tweak in mind for the current console serer API, that I think solves the sticking points with the current mechanism.

Instead of the console tracking an owner per node, it tracks a version number, which is incremented every time the node is modified, and can be incremented manually via an api call. Incrementing the version invalidates any tokens.

When getting a token to pass to the client, instead of passing the owner for synchronization purposes, HIL passes the version. Otherwise, this step works as before -- if the version doesn't match, the operation fails. We could return the correct current version, making it easy for HIL to update the info it has on file.

When doing project_{connect,detach}_node, HIL would increment the version, to invalidate any tokens and prevent the previously described race condition.

This solves the problem of keeping owners in sync, since it kills the owner field entirely. The version field is easier, since being out of sync just means one failed operation (which can happen normally), after which the database is updated.

This also feels like it would be easier to generalize to other things.

This still leaves us with redundant node metadata -- impi user/pass/host are still duplicated. One way to solve this is to just move the power toggling into this daemon as well, so it doesn't have to be in HIL's database. Furthermore, this isn't nearly as much of a problem, since it won't in general cause anything to fail.

@henn, @naved001, curious as to your thoughts?

naved001 commented 7 years ago

sorry but I don't understand the data sync issue. What's wrong with tracking the owner per node? Could you describe a scenario where this would be a problem? Thanks!

This still leaves us with redundant node metadata -- impi user/pass/host are still duplicated. One way to solve this is to just move the power toggling into this daemon as well

There's bunch of other stuff as well, like power cycle, changing boot device. We might as well move all of OBM stuff into this daemon. And if it has it's own database then this would need it's own migration scripts too I guess.

zenhack commented 7 years ago

Quoting Naved Ansari (2017-08-23 16:24:36)

sorry but I don't understand the data sync issue. What's wrong with tracking the owner per node? Could you describe a scenario where this would be a problem? Thanks!

During a project_connect_node or project_detach_node, the API server needs to update the owner of the node with the console daemon. Because the call to set the owner in the console daemon isn't protected by a transaction, if either daemon dies between updating the owner in the console daemon, and committing to the HIL api server's database, then when they come back up, they will disagree about the owner of the node. This happens regardless of the order in which these operations are done.

Using versions instead avoids the console needing to care about the owner, so this doesn't come up.

Does that make sense?

There's bunch of other stuff as well, like power cycle, changing boot device. We might as well move all of OBM stuff into this daemon. And if it has it's own database then this would need it's own migration scripts too I guess.

Yeah, that's what I meant.

naved001 commented 7 years ago

Does that make sense?

Yes. Now I have some more questions.

it tracks a version number, which is incremented every time the node is modified,

how does the console service know that a node was modified? Does HIL call the console service to update the version number when it does a project_connect/detach_node?

When getting a token to pass to the client, instead of passing the owner for synchronization purposes, HIL passes the version.

It's not very clear how this works. As a user who wants to request a new token, what do I pass to the console service?

zenhack commented 7 years ago

how does the console service know that a node was modified? Does HIL call the console service to update the version number when it does a project_connect/detach_node?

That's the idea.

It's not very clear how this works. As a user who wants to request a new token, what do I pass to the console service?

The whole thing works exactly as before, except that when the api server creates a token for the user to use, it supplies the expected version number, instead of the expected owner.

Basically, everything that used to manipulate the owner now just bumps the version. The rest of the process is the same.

naved001 commented 7 years ago

Cool. Seems fine to me.

@henn you have any comments on this?

zenhack commented 7 years ago

A note, I went ahead and made the version change update in the console-service repo.

I can see two ways of proceeding:

Do things as described originally; gut the existing console code from HIL, and replace it with calls to this daemon. We'll still have to have the node info duplicated though.
Go ahead with moving the rest of the impi driver's functionality into this daemon, then swap in a new obm driver that just takes a base URL and an admin token in node_register.

It occurs to me that we only have one obm driver right now; we could possibly deprecate the drivertized obm interface, and just have the rest api exposed by the new daemon be how we talk to obms. Just a thought.

I'd like to come to a decision on this so I can get to work without too much risk of having to throw stuff out.

//cc @naved001.

@henn hasn't been terribly responsive on this issue, and it's been slowing things down. I'd like to have a second person (besides naved) to be weighing in on stuff; is there someone who has the time to commit to better response times on this that I could be bugging?

naved001 commented 7 years ago

@knikolla can you weigh in on this please? Thanks!

SahilTikale commented 7 years ago

I have classes and other workload, but I will have a look at it too.

On Wed, Sep 6, 2017 at 11:32 AM Naved Ansari notifications@github.com wrote:

@knikolla https://github.com/knikolla can you weigh in on this please? Thanks!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/CCI-MOC/hil/issues/417#issuecomment-327522242, or mute the thread https://github.com/notifications/unsubscribe-auth/ACSZ_hUfKg92CFmsMsjQIb35tAbtYQ1aks5sfrsNgaJpZM4EKn9y .

-- --Sahil

zenhack commented 7 years ago

@knikolla, ping.

naved001 commented 7 years ago

Do things as described originally; gut the existing console code from HIL, and replace it with calls to this daemon. We'll still have to have the node info duplicated though.

The ipmi information (hostname, user, passwords) might not change often in real life, but we can't make that assumption. So I am not sure how I feel about duplicating that because if it does change we would have to worry about keeping it in sync.

Go ahead with moving the rest of the impi driver's functionality into this daemon, then swap in a new obm driver that just takes a base URL and an admin token in node_register.

This seems more reasonable.

It looks like the ipmi stuff is gonna be it's own thing. It would have its own database (and related code), driverization etc. Feels like a lot of work!

zenhack commented 7 years ago

Quoting Naved Ansari (2017-09-12 14:42:07)

 Go ahead with moving the rest of the impi driver's functionality
 into this daemon, then swap in a new obm driver that just takes a
 base URL and an admin token in node_register.
This seems more reasonable.

It looks like the ipmi stuff is gonna be it's own thing. It would have its own database (and related code), driverization etc. Feels like a lot of work!

This was my instinct as well. I don't think it will be that bad actually. I'm inclined to try to get the daemon part mostly feature-complete by the end of the week. It would be nice to get an additional opinion first, but I'm not inclined to wait around forever.

zenhack commented 7 years ago

Status update: I've modified the daemon so that it stores the node information inside of a database. console tokens are still ephemeral, and my intention is for them to stay that way -- if the in-memory state is lost you need to call show console again anyway.

Most likely my immediate approach to the rest of this will be to just replicate the rest of the HIL power management api, which should be straight-forward enough. Once this is set, we can write a HIL obm driver where the extra info we store is just the URL/admin token for the console daemon, and the Obm methods just call out to the daemon.

naved001 commented 7 years ago

cc @Izhmash

ianballou commented 7 years ago

More of an administrative question here: @zenhack do you have an idea about how we'll want to package the console daemon with HIL in the future?

zenhack commented 7 years ago

The master branch on my repo now handles the power_off, power_cycle, and set_bootdev api calls, which work exactly as in HIL, except that they take the token as a query parameter (as in the console streaming call). So you do e.g:

PUT /node/node4/boot_device?token=9832t3857faed9382...

Could you test it with some real machines to make sure I got the implementation right? I can only easily test against the dummy drivers.

zenhack commented 7 years ago

@Izhmash, haven't spend much time thinking about it. I think it probably makes sense to keep the packaging somewhat separate for the time being, as being able to just do pip install hil would be nice.

okrieg commented 7 years ago

If you want to bounce off of one more person, someone grab me tomorrow and will try to get up to speed.

Sent from my iPad

On Sep 12, 2017, at 6:29 PM, Ian Denhardt notifications@github.com wrote:

Quoting Naved Ansari (2017-09-12 14:42:07)

Go ahead with moving the rest of the impi driver's functionality into this daemon, then swap in a new obm driver that just takes a base URL and an admin token in node_register.

This seems more reasonable.

It looks like the ipmi stuff is gonna be it's own thing. It would have its own database (and related code), driverization etc. Feels like a lot of work!

This was my instinct as well. I don't think it will be that bad actually. I'm inclined to try to get the daemon part mostly feature-complete by the end of the week. It would be nice to get an additional opinion first, but I'm not inclined to wait around forever. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

zenhack commented 7 years ago

So, I spent a bit more time thinking about this, and I'm not sure the versioning actually makes this any better. The question not addressed is, when does HIL update what version it has on file? I can't think of an answer that doesn't have one of these problems:

Allows for some kind of race condition with equivalent consequences to the above.
Potentially allows things to get out of sync in a way that won't correct itself without manual intervention.

Perhaps we need to have some sort of start-up/recovery logic. I have some rough thoughts on how that might work, but piecing through this is giving me some worries about the micro services approach in general. I don't think not doing this as a separate service really solves anything in this particular case, but I'd like not to have to pick through this kind of problem in an ad-hoc fashion with any regularity.

naved001 commented 7 years ago

So, I spent a bit more time thinking about this, and I'm not sure the versioning actually makes this any better. The question not addressed is, when does HIL update what version it has on file? I

a few weeks ago we thought this would work

how does the console service know that a node was modified? Does HIL call the console service to update the version number when it does a project_connect/detach_node?

That's the idea

So HIL updates the version number it has on file when it does a project_connect/detach_node and asks the console daemon to update it too.

Could you describe a scenario that could lead to a race condition?

zenhack commented 7 years ago

I did some more puzzling, and I think I do have a full solution, but there's some subtlety to be dealt with, and I could use an extra pair of eyes. I'm going to move forward with the implementation in the interest of not blocking on this, but @okrieg, @pjd-nu, @henn (and everyone else), I would really like it if you guys would pick this apart.

here's the thing that I was puzzling over: We can't do the database update and the version number update in the console daemon atomically, so we need to pick an order.

If we update the console daemon first, and the handler for project_detach_node then dies before committing, the version number is now out of sync with the state; calls to show_console will fail, even though the database still says they're ok. Even without the project_detach_node handler dying, this can happen transiently, but that's not a huge deal since the inconsistency will self-correct quickly. But if the handler fails in between, the inconsistency is durable. If we do things in this order we somehow have to deal with that. Avoiding this sort of thing was the point of switching to version numbers in the first place.

This could also end up with the version being multiple times, but that's easy enough to solve: just have the bump-version call take an expected current version, just like the other calls do.

If we update the database first, then bump the version in the daemon, The same thing can happen, except that the version in the database will be newer than the one on the daemon. This is probably easier to cope with, since it's a dead-giveaway something has gone wrong. But what should show_console do if it sees a version number that is too new?

I think just bumping the version number again works, with a few subtleties:

We'll want to make sure we don't double-bump; see above for the solution to that.
We should start the call all over once we've done this, abandoning the current database transaction and re-checking permissions, etc. The easy way to handle this is to just return a (transient) error to the client, and have them try again.
Calls to project_detach/connect_node should check the version number first, so they don't bump the db ver and end up two versions out of date.

So there are a few tweaks to the daemon I need to make, which I'll get started on. Again, would appreciate folks staring at this a bit.

zenhack commented 7 years ago

Also, @naved001 I would still appreciate it if you would test the extra calls I added (power cycling etc). I just pushed a couple commits that tweak the API according to the strategy I described above, including updating the README, but I don't think any of the changes should really affect your testing anyway.

naved001 commented 7 years ago

all the new ipmitool commands work correctly.
any active connection is dropped if a new token is requested, or the version number is updated.

starting a second connection causes both connection to fail.

preexisting connection ends with this


HTTP/1.1 200 OK
Content-Type: application/octet-stream
Date: Thu, 05 Oct 2017 21:18:59 GMT
Content-Length: 45

[SOL Session operational. Use ~? for help] [naved001@rhel-tgt-base console-service]$

and the new connection fails with this

HTTP/1.1 200 OK Content-Type: application/octet-stream Date: Thu, 05 Oct 2017 21:19:00 GMT Content-Length: 53

Info: SOL payload already active on another session [naved001@rhel-tgt-base ~]$



looks like ipmitool is handling it?

zenhack commented 7 years ago

Quoting Naved Ansari (2017-10-05 17:30:44)

 * starting a second connection causes both connection to fail.
      + preexisting connection ends with this
HTTP/1.1 200 OK Content-Type: application/octet-stream Date: Thu, 05 Oct 2017 21:18:59 GMT Content-Length: 45

[SOL Session operational. Use ~? for help] [naved001@rhel-tgt-base console-service]$

and the new connection fails with this HTTP/1.1 200 OK Content-Type: application/octet-stream Date: Thu, 05 Oct 2017 21:19:00 GMT Content-Length: 53

Info: SOL payload already active on another session [naved001@rhel-tgt-base ~]$

looks like ipmitool is handling it?

What I suspect is happening is that the second ipmitool is starting before we've finished shutting down the first one, so it gets the busy message, and then we shut down the other one, so they both get disconnected. What should happen is that the first one is completely shut down before the next one is started, so the first connection is booted and replaced by the second. I'll dig into the details.

zenhack commented 7 years ago

Hm, is it possible that there's some kind of explicit shutdown that we need to be doing but aren't? @naved001, what happens if, after trying to start a second connection and having them both break, you try to connect again?

zenhack commented 7 years ago

For posterity, @naved001 and I talked about a few things via IRC:

(02:23:07 PM) naved: ian, I was thinking about the console server stuff today.
Why does the console daemon need to who owns the node at all?
(02:23:24 PM) naved: like the HIL API server does the authentication, and then
just proxies the connection
(02:24:13 PM) naved: or if you are worried about the URL, then we can just
generate one every time a connection is made. never keep it the same
(02:29:21 PM) isd: The race condition was the whole reason.
(02:29:43 PM) isd: With the version scheme, the console daemon doesn't store
ownership info at all.
(02:30:08 PM) isd: Or do you just mean the tokens?
(02:30:32 PM) naved: i meant the token, generate a new token everytime a new
connection is made
(02:30:49 PM) isd: That's effectively what we're doing.
(02:31:28 PM) isd: unfortunately http is connectionless, so we can either proxy
or we can have the token persist at least until the *next* connection.
(02:31:43 PM) isd: Either way, it doesn't solve the race condition.
(02:32:12 PM) isd: And proxying complicates the api server, since now it has to
tie up a request handler.
(02:32:18 PM) naved: okay
(02:33:06 PM) isd: ...so you start worrying about whether that means a greenlet,
an OS thread, a process... has implications for web-server setup etc. Part of
why I didn't want to write a daemon like this in python.
(02:33:30 PM) naved: I see,
(02:34:21 PM) naved: let me step back a little, the issue was opened to limit
the console processes to 1 and then you changed the title to say that's the
console code generally sucks
(02:34:37 PM) isd: Yeah.
(02:34:42 PM) naved: does the current console code still have the same race
conditions ?
(02:34:47 PM) isd: It also doesn't limit log size
(02:35:28 PM) isd: No, the current design doesn't do streaming, so you don't
wind up with the potential for a dangling connection.
(02:35:32 PM) isd: Which is a plus
(02:35:44 PM) isd: But it also means managing the backlog, which right now it
does badly.
(02:36:04 PM) isd: Whereas with the new design, we don't remember any state, we
just stream what's there when the user connects
(02:36:20 PM) naved: okay
(02:36:25 PM) isd: There are trade-offs.
(02:37:05 PM) isd: I think the streaming design results in a better product from
the user's perspective, and it's not clear to me that it's actually any harder
than managing the state with the other design.
(02:37:36 PM) isd: The current design also has problems with stray ipmitool
processes, and a bunch of other resource management issues
(02:37:45 PM) naved: yeah, i didn't realize the current code couldn't stream.
(02:38:01 PM) isd: No, it just logs and when you do show_console it dumps the
log.
(02:38:07 PM) isd: Which, right now, is unbounded in size
(02:38:16 PM) isd: though I think the intention was to have it be basically a
ring-buffer
(02:38:16 PM) naved: oh, so i can just keep logging to it forever
(02:38:24 PM) naved: and it's not "live"
(02:38:51 PM) isd: There are start/stop console calls that manage the logging
(02:39:05 PM) naved: i see; let me think about it more; and I'll hit you up
again.
(02:39:11 PM) isd: OK.
(02:39:33 PM) naved: on a related note; this daemon is very ipmi centric, right?
(02:39:43 PM) isd: It doesn't need to be.
(02:40:07 PM) isd: The HTTP API doesn't have much ipmi-specific going on; you
could generalize it without much trouble.
(02:40:24 PM) isd: We'd have to do a bit of refactoring of the internals, but it
wouldn't be much work.
(02:40:50 PM) naved: okay. so HIL talks to the HTTP API of the console daemon,
and then the internals will be driveritized
(02:40:56 PM) isd: Might make sense to do that  before deploying it, just so we
don't have to worry about migration.
(02:41:08 PM) isd: That's the notion.
(02:41:13 PM) naved: cool
(02:42:00 PM) isd: It's half drivertized already, just for testing purposes.
We'd just have to change the IpmiInfo type to something more general, and change
the names of things.
(02:42:53 PM) naved: okay
(02:45:33 PM) isd: I'm debating how to go about making the changes in HIL
proper. In particular, whether it's worth keeping the existing OBM
drivertization; we don't actually *have* any other OBM drivers. It might be
easier to implement a mock daemon than maintain the abstraction layer.
(02:46:11 PM) isd: Also, since it's managing all of the obm stuff,
'console-service' isn't really an accurate name, so we should change it
(02:46:15 PM) isd: (it's not much of a name anyway)
(02:46:23 PM) naved: obm daemon
(02:46:23 PM) isd: I'm thinking obmd. opinions?
(02:46:34 PM) naved: yep
(02:46:40 PM) isd: Done. I will rename the repo.
(02:47:06 PM) naved: yeah, we don't need to have two sets of code doing the same
thing.
(02:47:53 PM) isd: Cool. I think I'll submit a blueprint re: internal changes to
HIL; I'd like to hash that out before starting to hack on it.
(02:48:23 PM) isd: Also, is it cool if I post this conversation on the bug
tracker somewhere? Good to have a paper trail for this stuff.
(02:48:28 PM) naved: of course
(02:48:38 PM) isd: Great, will do.

I did go ahead and rename the repo.

naved001 commented 7 years ago

Hm, is it possible that there's some kind of explicit shutdown that we need to be doing but aren't?

I don't think so, looked up the man page for ipmitool here and it had some commands and none of those worked. I think it depends on the type of node.

        Special escape sequences are provided to control the SOL session:

~. Terminate connection
~^Z Suspend ipmitool
~^X Suspend ipmitool, but don't restore tty on restart
~B Send break
~~ Send the escape character by typing it twice
~? Print the supported escape sequences

Then I came across this guide at symantec which just asks to deactivate sol. And this page about Cisco says this: "To end the SoL session, you must close the CLI session. For example, to end an SoL session over an SSH connection, disconnect the SSH connection."

what happens if, after trying to start a second connection and having them both break, you try to connect again?

It complains saying "SOL payload already active on another session". To get around this, ~I toggle sol~, I ran deactivate from another window

sol deactivate

~sol set enabled false~ ~sol set enabled true~ now a new session can activate sol.

Also, I think it was discussed how we should give write access to console, just keep in mind that giving write access means giving access to the BIOS as well where I could mess up anything I want. Although there was an option to set privilege-level when using sol privilege-level user | operator | admin | oem it didn't work at all. it always defaulted to admin Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x0e

zenhack commented 7 years ago

Opened a PR with a possible fix, tagged you in it.

naved001 commented 7 years ago

@zenhack Any updates on this? Also, do you plan to open a PR with the specs for this anytime soon? Thanks!

zenhack commented 7 years ago

Sorry, this kindof slipped through the cracks. I'll try to find some time for the spec late this week.

That said, I'm a bit unsatisfied with the subtletly of the algorithms here; I wish I knew of a way to make this easier to get right.

I did a bit of refactoring of OBMd and was in the middle of debugging a failing test when I apparently got sidetracked... that needs to get fixed too.

naved001 commented 7 years ago

cool, no worries. Thanks for tackling this! Let me know if there's something you want me to look into. @mosayyebzadeh @razaaliraza should be able to help too!

zenhack commented 7 years ago

The more I think about this the more troubled I am by the complexity of the interaction. If it were just the one thing, I'd be OK with having thought very carefully about it and come to a solution, but I'm worried about splititng things out into microservices if we can't come up with a more generalizable strategy for getting these kinds of things right. I have a pretty clear sense of what changes need to happen in HIL to make the current thing work, but I'm really hesitant to just plow ahead.

I'm going to spend some more time over the weekend thinking about what to do, but I have a couple of rough thoughts/observations/instincts:

Half the reason we're tending towards splitting things up into separate services is that doing this kind of concurrent design inside of python (and in particular using wsgi) is clumsy.
- The constraints of wsgi make it very difficult to have any kind of threading going on in-process besides the request handlers.
- Go makes this kind of stuff super easy, which is one of the reasons it's so popular for this kind of software.
- Dropping generic wsgi support in python would make this easier; we could potentially write a daemon whose main() we control, and can use greenlets to get a lot of the nice Go stuff.
- Either of those strategies would allow us to do this kind of design but with the "daemons" being just greenlets/goroutines. This simplifes things a lot, since:
- We no longer have to worry about independent failures; we can't have a situation where half a process dies and the other half keeps going.
- We don't have to worry about authenticating communication between daemons.
- We can assume that our unit of concurrency (greenlets or goroutines) is light weight, and that we insist on a web server that can handle a huge number of concurrent connections. This would allow doing things synchronously in the request handler, rather than tossing them off to a backend daemon, which opens up new possibilities for how to structure our code.
- We're able to keep things inside of one transaction, which simplifies recovery immensely. Distributed transactions exist (e.g. postgres has a PREPARE TRANSACTION), but are complex and have problems in and of themselves.
The big concern with the above this is "what about scale?" If we insist on single-process implementation, we seriously limit our options for scaling. It's not clear to me that that's ever likely to be a concern though:
- Most of our operations, by their nature are overwhelmingly dominated by the fact that we're operating on physical machines:
  - Power management options aren't likely need to be very fast or scalable; the bottleneck there is REBOOTING A SERVER, and the request handler itself does almost nothing to achieve this. Assuming ~300W power consumption per machine, the MGHPCC is a 15MW data center, that works out to about 50,000 machines. They take several minutes to boot vs a second or so for HIL to issue the command (which isn't terribly resource intensive; it's mostly network latency). That's a tiny level of demand, and it's unlikely we're ever going to have the whole datacenter directly managed by one instance of HIL anyway. console streaming is a bit more expensive, but you're still talking about one greenlet/goroutine per physical server. It seems silly to slow ourselves down now planning for the kind of scale that would actually be a problem.
- It's still feasable to scale horizontally just by having different pods/racks managed by separate instances of HIL. there's some administrative overhead making sure the projects are synchronized (to the extend they need to be), but that strikes me as much easier to deal with than fine-grained concurrency issues like what we're hitting with the console stuff.
The other big concern is redundancy; this is a much more credible reason to want to have more than one machine. But for that we may be able to get away with just using RAFT and having a master & backup machines, which strikes me as much more tractable to program with than to fuss about every interaction.

Curious to others' thoughts. @pjd-nu, @okrieg, it would be really valuable to have someone with some more experience weigh in here.

naved001 commented 6 years ago

Feels good to close this one now.

CCI-MOC / hil

Serial console support is horribly fragile (was: Limit number of console-watching processes to 1) #417

Operation of the daemon

"Admin" operations

Regular user operations

HIL API server changes

project body parameter: why

Advantages