Infinite loop and latch contention

savaresejt commented 4 weeks ago

Development environment used

Z Open Editor version:
Editor Platform
- [ x ] Visual Studio Code
- [ ] Red Hat CodeReady Workspaces
- [ ] Eclipse Che
- [ ] Standalone Theia
Editor Platform Version:
Operating System (on which VS Code runs such as Windows 10 2004, otherwise name and version of platform such as OpenShift v4.3):
Java Version (when using VS Code or Theia, execute java -version and paste the details here):
[ ] Related to RSE API?
- RSE API Plugin version:
- Zowe CLI version: v2.15.0
- Node.js version: v18.20.4
Logs attached (see here how to get them): yes/no

Problem Description

Detailed steps for reproducing the problem:

First step

Observed behavior

We noticed that a user had over 15,000 dead address spaces. We contacted the user and saw a very large number of GET requests coming from their Z Open Editor. We determined that the editor was looking for a copybook that did not exist on the LPAR. We created the missing copybook and watched the issue stop.

Afterwards we recreated the issue by removing the copybook and watching it go into an infinite loop again. This is related to https://github.com/IBM/zopeneditor-about/issues/445

This is a problem with how Z Open Editor interacts with z/OSMF. We explored the recommended system settings in the tuning blog; however, we feel that a missing copybook should not result in 15,000 dead address spaces. A single lookup might be the appropriate behavior.


Expected behavior

We would expect that if a lookup failed, it would not be retried without some user action, such as a click.
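A minimal sketch of the expected behavior described above, assuming the editor keeps a record of names that were not found and only asks the server again after an explicit user action. `IncludeResolver`, `Fetcher`, and `retry` are hypothetical names for illustration, not Z Open Editor APIs, and the fetch is synchronous only to keep the example short.

```typescript
// Hypothetical sketch: these names are illustrative, not Z Open Editor code.
// A Fetcher returns the file content, or undefined when the host reports "not found".
type Fetcher = (name: string) => string | undefined;

class IncludeResolver {
  // Number of requests that actually went out to the remote host.
  requestCount = 0;
  private readonly notFound = new Set<string>();
  private readonly fetchRemote: Fetcher;

  constructor(fetchRemote: Fetcher) {
    this.fetchRemote = fetchRemote;
  }

  // Resolve an include; a name that already failed is not requested again.
  resolve(name: string): string | undefined {
    if (this.notFound.has(name)) {
      return undefined;
    }
    this.requestCount++;
    const content = this.fetchRemote(name);
    if (content === undefined) {
      this.notFound.add(name);
    }
    return content;
  }

  // Intended for an explicit user action (for example a hover or a refresh):
  // forget the earlier failure and try exactly once more.
  retry(name: string): string | undefined {
    this.notFound.delete(name);
    return this.resolve(name);
  }
}
```

With this shape, repeated parses of an idle document that references a missing copybook would produce one outbound request rather than one per parse.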


phaumer commented 4 weeks ago

Did the ticket you opened with z/OSMF tech support not help? They pinged me for background and I pointed them to these issues and described the problem to them once again. I will send them an email to see if they can continue to help.

savaresejt commented 4 weeks ago

We believe there are two issues right now.

One is that the Z Open Editor client is making far too many requests. We don't believe it is good behavior for tens of thousands of requests to go out, repeatedly failing to retrieve includes, while an engineer sits idle with an open data set. Instead, it should probably not retry at all unless the developer hovers over the include or takes some similar action.

The second issue is that we don't believe z/OSMF should be deadlocking the system when there are errors.

We opened a ticket with them and sent them logs for the second issue.

phaumer commented 4 weeks ago

It will stop downloading when you close the file.

If a file is not found, the editor stops looking for it once it has searched all the property group locations. Can you provide log files that show it being requested repeatedly?

(Note, there is a logging issue with local files where it shows "Looking for local or down" multiple times, but they will all show the same request id. We will fix that, but they are not repeated requests to a remote server.)

We have to download all the files or we will not be able to parse the COBOL program completely to show syntax errors, outline view etc.

savaresejt commented 4 weeks ago

Where does Z Open Editor store logs of outbound requests, and where can I upload them? I will contact the engineer and get those to you.

When you say "download all the files", do you mean the member the engineer is viewing plus its includes, copybooks, etc., and not the entire PDS?

What we are seeing in SDSF is the result of 16,000 requests to z/OSMF from having one PDSE member open. There are nowhere near that many includes in it.

phaumer commented 4 weeks ago

Z Open Editor has a log that can be switched to the DEBUG level to provide detailed output on how it tries to resolve include files. It shows all the outgoing requests and the Not Found errors when they happen, as well as how the editor then continues searching in other locations or stops searching. See details here: https://ibm.github.io/zopeneditor-about/Docs/locating_local_client_logs.html

Yes, we only search for the include files that are used by a program currently opened in the editor: the language server tries to parse the program, finds a copy or include statement and then asks the editor to fetch it. The editor then uses the search order as defined in the zapp file. Our integration tests run with the default 5 parallel requests and can resolve programs with 1000 (small) copybooks from MVS in under a minute.

As mentioned in the other issue, we have settings to control the parallel execution of these requests, because each concurrent request creates a new address space. Once a request has finished, its address space should be reused by the next request. If that does not happen, then z/OSMF tech support needs to help.

phaumer commented 2 weeks ago

Turning this item into a bug, as we found an issue with the listBeforeDownload setting: it only ran the list for data set members, without first listing the data sets themselves.
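To illustrate the idea behind listing before downloading, here is a rough sketch under stated assumptions: the `Host` interface and `fetchCopybook` function are hypothetical and are not the Zowe, z/OSMF, or Z Open Editor APIs. It checks that the data set exists, then that the member exists, and only then downloads, so a missing copybook never triggers a failing download request.

```typescript
// Hypothetical host interface for illustration only.
interface Host {
  listDataSets(pattern: string): string[];
  listMembers(dataSet: string): string[];
  download(dataSet: string, member: string): string;
}

// Walk the search order once; download only when both lists confirm the target exists.
function fetchCopybook(host: Host, searchOrder: string[], member: string): string | undefined {
  for (const dataSet of searchOrder) {
    // Step 1: list the data set itself (the step reported missing in this issue).
    if (!host.listDataSets(dataSet).includes(dataSet)) {
      continue;
    }
    // Step 2: list the members of the data set.
    if (!host.listMembers(dataSet).includes(member)) {
      continue;
    }
    // Step 3: download only after both checks pass.
    return host.download(dataSet, member);
  }
  // Every location was searched once; the caller should stop looking.
  return undefined;
}
```

The trade-off in this sketch is extra list requests in exchange for avoiding downloads that are known to fail; whether that matches the actual setting's implementation is an assumption.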

TommyTechh commented 2 weeks ago

I'm just commenting to let you know that this is also a problem we encounter.

Additionally, we noticed that when our users edit copybook names in their file, the editor starts looking for the copybook before the user has stopped typing. For example, say the user wants to type "COPY DATACPY". As soon as they finish typing COPY, if there is another word after COPY on the next line, the editor starts looking for that word as a copybook. It then looks for a copybook called D, then DA, then DAT, then DATA, and so on for every character typed.

We've also seen it loop when it is not able to find a copybook. As a result, it continuously creates tasks until we close VS Code.
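One common way to avoid a lookup per keystroke is debouncing: wait until the name has stopped changing for a short time before requesting it. The sketch below is only an illustration of that technique; `DebouncedLookup`, the 500 ms default, and the injectable `Schedule` type are assumptions, not existing Z Open Editor settings or code.

```typescript
// Hypothetical sketch of debouncing copybook lookups while the user types.
type Cancel = () => void;
type Schedule = (fn: () => void, delayMs: number) => Cancel;

// Default scheduler based on the standard timer functions.
const timerSchedule: Schedule = (fn, delayMs) => {
  const handle = setTimeout(fn, delayMs);
  return () => clearTimeout(handle);
};

class DebouncedLookup {
  private cancelPending: Cancel | undefined;
  private readonly lookup: (name: string) => void;
  private readonly delayMs: number;
  private readonly schedule: Schedule;

  constructor(lookup: (name: string) => void, delayMs: number = 500, schedule: Schedule = timerSchedule) {
    this.lookup = lookup;
    this.delayMs = delayMs;
    this.schedule = schedule;
  }

  // Call on every edit of the copybook name; only the last name within the
  // quiet period is actually looked up.
  onNameChanged(name: string): void {
    if (this.cancelPending !== undefined) {
      this.cancelPending(); // drop the lookup scheduled for the previous prefix
    }
    this.cancelPending = this.schedule(() => {
      this.cancelPending = undefined;
      this.lookup(name);
    }, this.delayMs);
  }
}
```

Typing D, DA, DAT, and so on up to DATACPY in quick succession would then schedule and cancel the intermediate lookups, leaving a single request for DATACPY once typing pauses.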

We will see if listBeforeDownload and adjusting the parallel requests will help, but would using RSE also fix this issue?

phaumer commented 2 weeks ago

Thanks @TommyTechh. We are fixing the data set issue. We first need to discuss internally how to configure the delay before requesting a file and when the language server should give up requesting it.

The listBeforeDownload setting was added mainly for z/OSMF. RSE API logs differently and a 404 is not much of a problem there.
