eclipse-archived / codewind

The official repository of the Eclipse Codewind project
https://codewind.dev
Eclipse Public License 2.0
114 stars 44 forks source link

Projects stuck in build queued state even though nothing else is building #2075

Closed eharris369 closed 4 years ago

eharris369 commented 4 years ago

I somehow got into a state where a project is stuck in build queued state even though nothing else is building. I tried adding another project and it got stuck in build queued state as well:

BuildQueuedImage

Here is the full pfe log file:

logs.txt

eharris369 commented 4 years ago

I tried deleting all of my projects and creating a new Open Liberty project. It flipped between Building: Creating image and Build Failed: Creating image for ... failed a few times and then finally ended up stuck in Build Queued state. There are no other projects. Here is the pfe log file just for this: logs.txt

eharris369 commented 4 years ago

Deleted all projects again and tried a couple of other project types and they appear to be ok (python, spring, go).

eharris369 commented 4 years ago

Spoke too soon - the Spring project got stuck in build queued state even though it got all the way to running state first: SpringQueued

ssh24 commented 4 years ago

@eharris369 What are the specific reproduction steps for this? Does it happen upon a new PFE build with existing projects or does it happen on creation of any new project or happen on a update of an existing project which is up?

ssh24 commented 4 years ago

/assign /pipeline in progress

eharris369 commented 4 years ago

I am not 100% sure how I got there. I think I restarted Eclipse and already had a 2 projects there (mpA1 and node1) and then I tried to add an Open Liberty project and it got stuck in build queued state.

ssh24 commented 4 years ago

Is this is only happening on Eclipse? And when you restarted eclipse - was PFE up and were those projects already running?

If you could replicate the issue again, could you please upload the logs with trace mode enabled.

eharris369 commented 4 years ago

I can turn on trace mode and then if it happens again upload the logs.

eharris369 commented 4 years ago

Just reproduced it - wasn't that hard. Created a node project then an appsody microprofile project and then an open liberty project one after the other. Did not need to restart eclipse. Here is the log with trace on: logs.txt

ssh24 commented 4 years ago

What does the build status on those project say on the UI?

eharris369 commented 4 years ago

It looks like this:

image

I just tried the same thing on MAC and it did not have the problem so maybe Windows only?

ssh24 commented 4 years ago

I have noticed on the logs the running build queue looks like:

Running Builds queue: ["938ba370-487f-11ea-9c19-6b6b88d11df1","938ba370-487f-11ea-9c19-6b6b88d11df1","938ba370-487f-11ea-9c19-6b6b88d11df1"]

That is very weird to me. Something is kicking off 3 builds together and blocking the queue. I am not sure if portal does something repeatedly that calls the create function multiple time in windows.

Yes I can't reproduce on my machine (Mac) either

ssh24 commented 4 years ago

This part of the log is even more interesting:

[06/02/20 01:28:32 mpA1] [TRACE] [Container] [INFO] Downloaded from central: https://repo.maven.apache.org/maven2/io/openliberty/features/microProfile-3.0/19.0.0.8/microProfile-3.0-19.0.0.8.esa (18 kB at 50 kB/s)
[Container] [INFO] Downloading from central: https://repo.maven.apache.org/maven2/com/ibm/websphere/appserver/spi/com.ibm.websphere.appserver.spi.containerServices/3.0.31/com.ibm.websphere.appserver.spi.containerServices-3.0.31.jar

[06/02/20 01:28:32 mpA1] [TRACE] [Container] [INFO] Downloaded from central: https://repo.maven.apache.org/maven2/io/openliberty/features/com.ibm.websphere.appserver.appLifecycle-1.0/19.0.0.8/com.ibm.websphere.appserver.appLifecycle-1.0-19.0.0.8.esa (9.3 kB at 25 kB/s)
[Container] [INFO] Downloading from central: https://repo.maven.apache.org/maven2/io/openliberty/features/com.ibm.websphere.appserver.javax.cdi-1.2/19.0.0.8/com.ibm.websphere.appserver.javax.cdi-1.2-19.0.0.8.esa

[06/02/20 01:28:32 mpA1] [TRACE] [Container] [INFO] Downloaded from central: https://repo.maven.apache.org/maven2/com/ibm/websphere/appserver/spi/com.ibm.websphere.appserver.spi.containerServices/3.0.31/com.ibm.websphere.appserver.spi.containerServices-3.0.31.jar (59 kB at 134 kB/s)

[06/02/20 01:28:32 mpA1] [TRACE] [Container] [INFO] Downloading from central: https://repo.maven.apache.org/maven2/io/openliberty/features/com.ibm.websphere.appserver.javax.el-3.0/19.0.0.8/com.ibm.websphere.appserver.javax.el-3.0-19.0.0.8.esa

[06/02/20 01:28:32 User.js] [DEBUG] Project 938ba370-487f-11ea-9c19-6b6b88d11df1 file change list received on 1581017194453. chunk number 1 with total chunk 1 for this timestamp.
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [INFO] 938ba370-487f-11ea-9c19-6b6b88d11df1 update start time: 1580952512798
[06/02/20 01:28:32 /portal/routes/projects/remoteBind.route.js] [INFO] Total time to sync project node1 to build container is 1580952512.798 seconds
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [DEBUG] Portal's event listener: projectStatusChanged
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [DEBUG] Portal's event listener: {"projectID":"938ba370-487f-11ea-9c19-6b6b88d11df1","buildStatus":"inProgress"}
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [DEBUG] projectStatusChanged: inProgress (project 938ba370-487f-11ea-9c19-6b6b88d11df1)
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [TRACE] projectStatusChanged: inProgress (project: {
  "projectID": "938ba370-487f-11ea-9c19-6b6b88d11df1",
  "buildStatus": "inProgress"
})
[06/02/20 01:28:32 Turbine] [DEBUG] Found at least one build in the queue
[06/02/20 01:28:32 Turbine] [DEBUG] Space available to trigger the next build
[06/02/20 01:28:32 Turbine] [DEBUG] Metadata for next build: {"operation":{"operationId":"f0fca9ba5e79c494ac6c80f7a1301a3e","type":"update","projectInfo":{"projectID":"938ba370-487f-11ea-9c19-6b6b88d11df1","projectType":"nodejs","location":"/codewind-workspace/node1","autoBuildEnabled":true,"startMode":"run","appPorts":["3000"],"language":"nodejs","debugPort":"9229","ignoredPaths":["*/.idea/*","*.iml","/.project","/load-test*","*/*.swp","*/*.swx","*/.gitignore","*/node_modules*","*/4913","*/.git/*","*/.DS_Store","*/.dockerignore","*/*~","/.settings","/run-dev","/run-debug","/package-lock.json*","/nodejs_restclient.log","/nodejs_dc.log","/manifest.yml","/idt.js","/cli-config.yml","/README.md","/Jenkinsfile","/.cfignore"],"statusPingTimeout":30,"isHttps":false,"buildRequest":false,"sentProjectInfo":true}},"handler":{"requiredFiles":["/Dockerfile","/package.json"],"supportedType":"nodejs","detectChangeByExtension":false,"defaultIgnoredPath":["/.project","/run-dev","/run-debug","/package-lock.json*","/nodejs_restclient.log","/nodejs_dc.log","/manifest.yml","/idt.js","/cli-config.yml","/README.md","/Jenkinsfile","/.cfignore","/load-test*","*/node_modules*","*/.git/*","*/.DS_Store","*/*.swp","*/*.swx","*/4913","*/.dockerignore","*/.gitignore","*/*~","/.settings","/chart"]}}
[06/02/20 01:28:32 Turbine] [TRACE] Key translated. Returning message: Build started
[06/02/20 01:28:32 Turbine] [TRACE] Emitting project ranks
[06/02/20 01:28:32 Turbine] [DEBUG] Running Builds queue: ["938ba370-487f-11ea-9c19-6b6b88d11df1","938ba370-487f-11ea-9c19-6b6b88d11df1"]
[06/02/20 01:28:32 Turbine] [DEBUG] Builds in progress: 2
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [DEBUG] Received filewatcher module response: { statusCode: 202 }
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [DEBUG] Portal's event listener: projectStatusChanged
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [DEBUG] Portal's event listener: {"projectID":"938ba370-487f-11ea-9c19-6b6b88d11df1","buildStatus":"inProgress","detailedBuildStatus":"Build started"}
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [DEBUG] projectStatusChanged: inProgress (project 938ba370-487f-11ea-9c19-6b6b88d11df1)
[06/02/20 01:28:32 /portal/modules/FileWatcher.js] [TRACE] projectStatusChanged: inProgress (project: {
  "projectID": "938ba370-487f-11ea-9c19-6b6b88d11df1",
  "buildStatus": "inProgress",
  "detailedBuildStatus": "Build started"
})
[06/02/20 01:28:32 node1] [INFO] Updating container for nodejs project /codewind-workspace/node1
[06/02/20 01:28:32 node1] [INFO] The log name for 938ba370-487f-11ea-9c19-6b6b88d11df1 is cw-938ba370-487f-11ea-9c19-6b6b88d11df1-73a4443b8a5394c30156971bfa6e197e751cfdb7
[06/02/20 01:28:32 node1] [INFO] Container name: cw-node1-938ba370-487f-11ea-9c19-6b6b88d11df1
[06/02/20 01:28:32 node1] [INFO] Log name: cw-938ba370-487f-11ea-9c19-6b6b88d11df1-73a4443b8a5394c30156971bfa6e197e751cfdb7
[06/02/20 01:28:32 node1] [INFO] forceAction: undefined
[06/02/20 01:28:32 Turbine] [INFO] 938ba370-487f-11ea-9c19-6b6b88d11df1: Project 938ba370-487f-11ea-9c19-6b6b88d11df1 file changed
[06/02/20 01:28:32 Turbine] [TRACE] The projectInfo has been updated for imagePushRegistry: {"projectID":"938ba370-487f-11ea-9c19-6b6b88d11df1","projectType":"nodejs","location":"/codewind-workspace/node1","autoBuildEnabled":true,"startMode":"run","appPorts":["3000"],"language":"nodejs","debugPort":"9229","ignoredPaths":["*/.idea/*","*.iml","/.project","/load-test*","*/*.swp","*/*.swx","*/.gitignore","*/node_modules*","*/4913","*/.git/*","*/.DS_Store","*/.dockerignore","*/*~","/.settings","/run-dev","/run-debug","/package-lock.json*","/nodejs_restclient.log","/nodejs_dc.log","/manifest.yml","/idt.js","/cli-config.yml","/README.md","/Jenkinsfile","/.cfignore"],"statusPingTimeout":30,"isHttps":false,"buildRequest":false,"sentProjectInfo":true}
[06/02/20 01:28:32 node1] [INFO] executeBuildScript with args:  {/file-watcher/scripts/nodejs-container.sh /codewind-workspace/node1 /codewind-workspace 938ba370-487f-11ea-9c19-6b6b88d11df1 update cw-node1-938ba370-487f-11ea-9c19-6b6b88d11df1 true cw-938ba370-487f-11ea-9c19-6b6b88d11df1-73a4443b8a5394c30156971bfa6e197e751cfdb7 run 9229 NONE node1-938ba370-487f-11ea-9c19-6b6b88d11df1 undefined}, In K8 environment: undefined
[06/02/20 01:28:32 node1] [INFO] Now tracking running process 490 for project 938ba370-487f-11ea-9c19-6b6b88d11df1
[06/02/20 01:28:32 node1] [INFO] Detected other file changes, Codewind will build the project
[06/02/20 01:28:32 node1] [INFO] File change detected. Project will re-build.
[06/02/20 01:28:32 node1] [INFO] Changed Files: [ >.project ]
[06/02/20 01:28:32 node1] [TRACE] Setting new changed files in the changedFilesMap... 
[06/02/20 01:28:32 node1] [TRACE] Received last chunk for timestamp 1581017194453 , checking if still waiting for other chunks for other timestamps... 
[06/02/20 01:28:32 node1] [INFO] Received all chunks.
[06/02/20 01:28:32 node1] [INFO] Auto build is enabled, build proceeding...
[06/02/20 01:28:32 node1] [INFO] Build state changed for project: 938ba370-487f-11ea-9c19-6b6b88d11df1 from: success, to: inProgress
[06/02/20 01:28:32 node1] [INFO] Emitting event 
 message: projectStatusChanged
 data: {
  "projectID": "938ba370-487f-11ea-9c19-6b6b88d11df1",
  "buildStatus": "inProgress"
}

So it is doing the dockerbuild of a project and out of nowehere it kicks of another build request. Now the build queue, here will add it to the queue if it has space which is what it does. What I am not getting is what will kick off another build while dockerbuild is happening

eharris369 commented 4 years ago

Here is the filewatcher log in case that helps: filewatcherd-1.log

ssh24 commented 4 years ago

I will try it on my windows VM. It shouldn't matter eclipse vs VSCode right?

eharris369 commented 4 years ago

Theoretically it should not matter.

ssh24 commented 4 years ago

I tried on Win with both VSCode and Eclipse and they are working on:

  1. creation of new projects
  2. installation of new PFE with existing projects

Am I missing any specific configs?

eharris369 commented 4 years ago

Try creating 4 or 5 open liberty projects in a row (Codewind - not Appsody) using Eclipse. I reinstalled codewind today after a docker system prune -a -f --volumes and can still reproduce. I also tried reproducing on a Windows VM I have for doing accessibility testing but the scenario seemed to crash docker (it is a very slow machine).

ssh24 commented 4 years ago

This only happens for OL projects? I tried it with one - same steps as you mentioned with some more: 1 node, 1 lib, 1 spr, 1 OL and 1 appsody microprofile

eharris369 commented 4 years ago

I found it a lot easier to reproduce with several OL projects.

eharris369 commented 4 years ago

This could be related to: https://github.com/eclipse/codewind/issues/2072 as I see [INFO][39m Changed Files: [ >.classpath ] several times in the logs. Also [INFO][39m Changed Files: [ >.project ] though I thought .project was already ignored.

ssh24 commented 4 years ago

I think so too. And as we see those changed files, it goes to adding to the queue where the same project is already building in progress. I have talked to Erin, and I am adding a guard against the running build queue where no one project can be in the running build queue at the same time.

ssh24 commented 4 years ago

From our investigation last week, we found the issue: on eclipse, the .project file triggers build once a maven project is building. this now keeps adding the project to the queue because build queue not works for update. As a result, the running build queue gets clogged with the same project id (first issue). I have a PR open to fix that so that all running build is unique at all times.

However, there is a second issue - .project file should be ignored along with some other eclipse specific files here https://github.com/eclipse/codewind/issues/2072. I have seen other projects which would trigger a build when .project changed. We need to fix https://github.com/eclipse/codewind/issues/2072 for the second half of this issue to be fixed.

rajivnathan commented 4 years ago

@eharris369 can you please verify when you have a chance?

ssh24 commented 4 years ago

The only thing Erin can verify is that running build queues are unique. Other than that, the issue still persists as I described to you yesterday @rajivnathan until #2072 is fixed. I think we can still keep this open but prioritize #2072 and tag that as hot

rajivnathan commented 4 years ago

Putting this back in progress because it sounds like @johnmcollier saw this problem occur with the fix in. @ssh24 to investigate further

johnmcollier commented 4 years ago

Saw this on Che with the generic Docker project "mygo"

Sidecar logs (warning they're big): sidecar_filewatcher.log PFE Logs (warning, they're also big): pfe_logs.txt

rajivnathan commented 4 years ago

@eharris369 @johnmcollier think this can now be verified along with https://github.com/eclipse/codewind/issues/2072

ssh24 commented 4 years ago

@eharris369 Looks like #2072 went in. I just verified on eclipse and it worked for me (5 fresh OL projects). Could you please verify?

ssh24 commented 4 years ago

Ah, Rajiv beat me to it haha

eharris369 commented 4 years ago

I tried various things and I can't get it to fail anymore so closing.