dotnet / dnceng

.NET Engineering Services

Customer request to offline machines: PERFTIGER147 and PERFTIGER119 #3064

Closed: dkurepa closed 2 months ago

dkurepa commented 3 months ago

We had a customer request to offline PERFTIGER147 and PERFTIGER119: https://teams.microsoft.com/l/message/19:afba3d1545dd45d7b79f34c1821f6055@thread.skype/1718163669497?tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47&groupId=4d73664c-9f2f-450d-82a5-c2f02756606d&parentMessageId=1718163669497&teamName=.NET%20Core%20Eng%20Services%20Partners&channelName=First%20Responders&createdTime=1718163669497

The Helix API is currently having some trouble, though, and is not able to do this.

garath commented 3 months ago

The Helix API being broken seems... bad. What's the follow-up here?

riarenas commented 3 months ago

This is lacking details. What operation is failing, and how?

dkurepa commented 3 months ago

Sorry I didn't put details here; they're in a Teams chat,

but basically I'm getting

Microsoft.DotNet.Helix.Client.RestApiException`1[Microsoft.DotNet.Helix.Client.Models.ApiError]: The response contained an invalid status code 500 Internal Server Error

Body: {"Message":"An error occured.","ActivityId":"b9a4c32fe29f5e3d252e9daf9dae67c4"}
   at Microsoft.DotNet.Helix.Client.Machine.OnChangeStateFailed(Request req, Response res) in /_/src/Microsoft.DotNet.Helix/Client/CSharp/generated-code/Machine.cs:line 124
   at Microsoft.DotNet.Helix.Client.Machine.ChangeStateAsync(MachineStateChangeRequest body, String machineName, String queueId, CancellationToken cancellationToken) in /_/src/Microsoft.DotNet.Helix/Client/CSharp/generated-code/Machine.cs:line 95
   at Microsoft.Internal.Helix.Machines.OsobCli.Commands.ChangeMachineState.ChangeMachineStateCommand.ExecuteAsync(ChangeMachineStateOptions options) in C:\Users\dkurepa\source\repos\dotnet-helix-machines\tools\OsobCli\Commands\ChangeMachineState\ChangeMachineStateCommand.cs:line 59
   at Microsoft.Internal.Helix.Machines.OsobCli.Program.Main(String[] args) in C:\Users\dkurepa\source\repos\dotnet-helix-machines\tools\OsobCli\Program.cs:line 40

when running osob-cli

riarenas commented 3 months ago

I successfully ran:

./osob-cli.cmd change-machine-state --queue windows.11.amd64.tiger.perf -r "Taken down for perf team investigations, https://github.com/dotnet/dnceng/issues/3064" --production -d -m PERFTIGER119 --helix-api-token "my token"
Machine PERFTIGER119 in windows.11.amd64.tiger.perf has been disabled

and

./osob-cli.cmd change-machine-state --queue ubuntu.2204.amd64.tiger.perf -r "Taken down for perf team investigations, https://github.com/dotnet/dnceng/issues/3064" --production -d -m PERFTIGER147 --helix-api-token "my token"
Machine PERFTIGER147 in ubuntu.2204.amd64.tiger.perf has been disabled

So the machines are now disabled and I think the API is fine.

I think you were hitting a case-sensitivity bug somewhere between the CLI and the API, as I'm able to repro the issue when running the same command with a lowercase machine name:

./osob-cli.cmd change-machine-state --queue windows.11.amd64.tiger.perf -r "Taken down for perf team investigations, https://github.com/dotnet/dnceng/issues/3064" --production -d -m perftiger119 --helix-api-token "my token"
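
Until the case-sensitivity bug is tracked down, one possible workaround is to upper-case the machine name before passing it to -m. This is only a minimal sketch, assuming machine names are registered in upper case as PERFTIGER119 and PERFTIGER147 appear in this thread; normalize_machine_name is a hypothetical helper and not part of osob-cli.

```shell
#!/bin/sh
# Hypothetical helper (not part of osob-cli): upper-case a machine name.
# Assumption: Helix has these machines registered in upper case, which is
# what the successful runs in this thread suggest.
normalize_machine_name() {
    printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}

# Example usage (commented out; needs osob-cli and a real token):
# ./osob-cli.cmd change-machine-state --queue windows.11.amd64.tiger.perf \
#     -r "Taken down for perf team investigations, https://github.com/dotnet/dnceng/issues/3064" \
#     --production -d -m "$(normalize_machine_name perftiger119)" \
#     --helix-api-token "my token"
```

This only hides the problem on the caller's side; the 500 for a lowercase name would still need a fix in the CLI or the API.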

dougbu commented 3 months ago

Reenabled PERFTIGER147. Leaving this issue open until PERFTIGER119 is online as well…

dougbu commented 3 months ago

Checked in FR channel w/ Parker this evening. No response yet (as expected)

ilyas1974 commented 2 months ago

Systems are back online. Closing issue.