Open XiaofeiCao opened 4 weeks ago
Is it intermittent, or does it always fail for certain RPs?
It always fails for informatica. I'll give it a try for other RPs.
Latest finding: when there's an error in the tsp file, the pipeline doesn't get stuck and reports the error as expected: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3888615&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=356bb04c-cb4a-5f04-82ca-d3b102917eba&l=118
Running tsp-client directly also gets stuck.
Pipeline definition: https://github.com/Azure/azure-sdk-for-java/blob/67f89cb9be55f4fcadc534d5dd6c867750ba8fa8/eng/mgmt/automation/generation.yml#L51-L54
Latest finding: using macos-13, the run succeeded without hanging: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3889023&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=356bb04c-cb4a-5f04-82ca-d3b102917eba
You may change the VM if you really need to (but maybe Windows instead of Mac).
Yeah, it seems macOS resources are limited: https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml
I saw that Windows uses Git Bash for bash: https://learn.microsoft.com/en-us/azure/devops/pipelines/scripts/cross-platform-scripting?view=azure-devops&tabs=yaml#consider-bash-or-pwsh
I'll try that.
Also stuck on windows bash: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3892604&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=356bb04c-cb4a-5f04-82ca-d3b102917eba
Tried PowerShell, also stuck: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3892700&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=1a5cc010-7735-550a-9d76-c0b745122dab
I don't understand. tsp-client is run directly, without Python:
Let me try the tsp-client command used by sdkautomation.
Stuck even with the sdkautomation command (tsp-client init --local-repo): https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3893256&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=ffe5b61a-4918-59e6-2d77-95255068933d
Does it get stuck with every library or just some? Are there any issues reported when the debug level is set?
What's the difference between your experiment branch and the current SDK automation (I assume it is still fine)?
Does it get stuck with every library or just some?
For my pipeline, it's every library.
Are there any issues reported when the debug level is set?
No, the last output was:
The command was simple, e.g.:
https://github.com/Azure/azure-rest-api-specs/blob/7605afe88e3201dc25ce0881c2e49fe1b6bbdd54/specification/mongocluster/DocumentDB.MongoCluster.Management/tspconfig.yaml
What's the difference between your experiment branch and the current SDK automation (I assume it is still fine)?
Currently I'm not seeing any major differences. Both use Node 18.20.x: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3910442&view=logs&j=a8a7a537-82b0-583c-7971-bac70b9822ca&t=37e3947b-3cfb-5d36-86ba-0e22bb7dbc33&l=181
One thing I noticed is that SDK automation runs in AzureCLI, though I also tried AzureCLI and got the same result. https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3910442&view=logs&j=a8a7a537-82b0-583c-7971-bac70b9822ca&t=37e3947b-3cfb-5d36-86ba-0e22bb7dbc33&l=3 https://github.com/Azure/azure-rest-api-specs-pipeline/blob/master/.azure-pipelines/templates/RunSDKAutomation.yml#L10
So it just hangs at the compile function? Without any errors being returned? Also, I still need to understand, is this always happening in this pipeline? For every library?
So it just hangs at the compile function? Without any errors being returned?
Yes. No error is returned. It just hangs for 60 minutes and then times out.
Like this pipeline: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3914963&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=356bb04c-cb4a-5f04-82ca-d3b102917eba
This hanging task just calls tsp-client init:
- bash: |
    npx tsp-client init --tsp-config $(TSP_CONFIG) --debug
  displayName: '[Experiment] run tsp-client directly'
  condition: eq(variables.fromTypeSpec, true)
https://github.com/Azure/azure-sdk-for-java/compare/main...mgmt_directly_call_tsp-client
is this always happening in this pipeline? For every library?
Yes. It hangs for every library in this pipeline.
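While the root cause is being investigated, one way to avoid waiting out the full 60 minutes on each hung run is to put a time limit around the command. The following is only a minimal sketch, not something from the pipeline itself: it assumes GNU coreutils `timeout` is available on the agent, and the `run_with_limit` helper name and the 600-second limit are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a time limit so a hang fails fast.
# Assumes GNU coreutils `timeout` is on PATH (an assumption about the agent).
run_with_limit() {
  local limit_seconds="$1"
  shift
  local status=0
  # `|| status=$?` keeps the helper usable even when the caller has `set -e`.
  timeout "$limit_seconds" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    # GNU `timeout` exits with 124 when the time limit was reached.
    echo "command timed out after ${limit_seconds}s: $*" >&2
  fi
  return "$status"
}

# Possible use inside the bash step (not executed here; 600 is an arbitrary limit):
#   run_with_limit 600 npx tsp-client init --tsp-config $(TSP_CONFIG) --debug
```

This does not explain the hang; it only shortens the feedback loop while experimenting. Azure Pipelines steps also accept a `timeoutInMinutes` setting, which may be a simpler choice if a per-step limit is all that is needed.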
I'm adding @timotheeguerin to see if he has any ideas since we're failing in the typespec compile call. Maybe there's something we can do to further debug the compile step. Tim, here is the link to the compile function: https://github.com/Azure/azure-sdk-tools/blob/d471d1f370dcdc696d995eb2b41dd0ac4ef95fb3/tools/tsp-client/src/typespec.ts#L55
Likewise, I tested the init command directly in PowerShell with the sphere library and had no issues. My Node version is 20.11.0 and I'm on Windows 11. The command I ran: npx tsp-client init --tsp-config https://github.com/Azure/azure-rest-api-specs/blob/7a41d14c661171b4fffec5863c51fb70529ee1db/specification/sphere/Sphere.Management/tspconfig.yaml --debug
Since tsp-client is successfully running in the automation pipelines, it seems there's some new configuration in this pipeline that's causing the issue or we might be inputting some unexpected data into the tool. Could be worth a look to see if there's anything in the other pipeline configurations that could help resolve this issue. By the point we get to the compile call in tsp-client we've already finished up with all of the tsp project cloning, installing the emitter deps, etc. So at that point we're just waiting to get the compiled library back from the @typespec/compiler
@XiaofeiCao could you also share an example of the tspconfig.yaml url or path you're passing into the command?
You may also try your own URL in my pipeline: click Run New in my stuck pipeline and, in Variables, replace TSP_CONFIG with your own.
@catalinaperalta Another interesting finding is that the pipeline doesn't get stuck with a macOS vmImage.
Symptom
The Java SDK has its own code generation pipeline to generate SDKs from TypeSpec: https://dev.azure.com/azure-sdk/internal/_apps/hub/ms.vss-build-web.ci-designer-hub?pipelineId=2238&nonce=e6ID6FSUknWJb/xkwTih5Q%3D%3D&branch=main
Recently the pipeline has been getting stuck at the last output line and timing out after 60 minutes.
Failed Run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3839396&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=356bb04c-cb4a-5f04-82ca-d3b102917eba
Last successful run was about two weeks ago: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3761145&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=356bb04c-cb4a-5f04-82ca-d3b102917eba
Reproducing the bug
Rerun the pipeline and it will get stuck: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3887788&view=results
Env
vmImage: ubuntu-20.04
python --version: Python 3.10.14
nodejs: 18.20.3
command:
npx tsp-client init --debug --tsp-config https://github.com/Azure/azure-rest-api-specs/blob/9df71d5a717e4ed5e6728e7e6ba2fead60f62243/specification/informatica/Informatica.DataManagement/tspconfig.yaml
When run locally on my machine, it doesn't get stuck. SDK Automation seems fine too: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=3839388&view=logs&j=a8a7a537-82b0-583c-7971-bac70b9822ca&t=37e3947b-3cfb-5d36-86ba-0e22bb7dbc33&l=1222
Expected behavior
We know this is our own pipeline, but we're wondering how the hang could occur from tsp-client's perspective. We'd like some insight on the issue. Thanks!