Open ahenket opened 3 years ago
Update:
Some more info. I am able to reproduce this on my local system. Yes, the trouble seems to come from 3 .xqm files where each one imports the other 2. Saving from either Oxygen or eXide will hang eXist, but saving from VS Code will not. What I see in Monex:
@ahenket @mgraauw Would it be possible to prepare a zip file containing a minimal example of the modules that lead to this situation?
Other helpful details:
On my local db I do have fixes that allow me to save without the problems. A zip is possible, but will take some figuring out (lots of interdependent code).
I have traced and solved the issue on my local machine. It happens in some cases where there is a triangular import:
- A imports modules B and C
- B imports modules A and C
- C imports modules A and B
The resolution in our case was to refactor the code to prevent this situation from happening.
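For reference, the import triangle can be sketched as three minimal library modules. This is a sketch only: the namespaces and file names are hypothetical placeholders, not the actual modules from the xar.

```xquery
(: a.xqm — imports B and C :)
module namespace a = "http://example.com/a";
import module namespace b = "http://example.com/b" at "b.xqm";
import module namespace c = "http://example.com/c" at "c.xqm";

(: b.xqm — imports A and C :)
module namespace b = "http://example.com/b";
import module namespace a = "http://example.com/a" at "a.xqm";
import module namespace c = "http://example.com/c" at "c.xqm";

(: c.xqm — imports A and B :)
module namespace c = "http://example.com/c";
import module namespace a = "http://example.com/a" at "a.xqm";
import module namespace b = "http://example.com/b" at "b.xqm";
```

Breaking any one edge of the triangle (for example, moving the shared functions into a fourth module that the other three import) removes the cycle, which is roughly what the refactoring above did.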
I've tried to make a small example, did not succeed. Since the db hangs after most, but not all, saves, I suspect the condition is dependent on larger modules which take longer to compile, and then lock each other.
It can be reproduced: load our xar into some eXist 5.3.0: https://decor.nictiz.nl/exist/apps/public-repo-dev3/public/ART-2.1.36.xar These 3:
import each other. Change one (add a space in a comment) and save. On our machines, not on every save but once in every few saves, the db will hang. Same thing from eXide. Oxygen 23.1, eXide version 3.0.0. Saving from VS Code 1.59.1 (with existdb-vscode v1.5.3) will not hang the db.
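The manual "add a space in a comment and save" step could also be scripted from within eXist, which may help when trying to reproduce the hang repeatedly. A hedged sketch using eXist's xmldb and util modules; the collection and module name are hypothetical placeholders, not the actual paths from the xar, and the space is appended at the end of the file rather than inside a comment:

```xquery
xquery version "3.1";

(: Re-store an XQuery library module with a trivial change,
   mimicking the "change one module and save" step.
   $col and $name are placeholders. :)
let $col  := "/db/apps/art/modules"
let $name := "a.xqm"
(: library modules are stored as binary resources in eXist :)
let $src  := util:binary-to-string(util:binary-doc($col || "/" || $name))
return
    xmldb:store($col, $name, $src || " ", "application/xquery")
```

Running this repeatedly, or from two eXide tabs at once, would exercise the same store path as saving from an editor.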
Problem looks solved for us, no need on our side for action, just documenting in case follow up from your side is wanted.
When using the "Save" action Oxygen uses the API "org.xmldb.api.base.Resource.setContent(Object)" to store content in a resource and then the API "toStore.getParentCollection().storeResource(toStore)" to store the resource in the parent collection. But since you can reproduce the same problem from eXide, it does not seem to be a problem specific to Oxygen. Anyway, if you give a precise set of steps to reproduce the problem and consider the problem to stem from Oxygen, I can try to reproduce it on my side and add some logging on the Oxygen side.
@raducoravu I do not think it's an Oxygen issue, nor that it can be solved from Oxygen
BTW, I wrote "Problem looks solved for us" in a previous comment; that is certainly not the case, the statement was a bit premature.
I also have this problem with my eXist 5.3.0 / Oxygen 23.1 setup. I can also replicate it with Oxygen 22. I can add and remove the same space character from the same XQuery file and it kills eXist maybe 1 out of 20 times. The script contains triangular references as documented above.
@mgraauw @domlatham Could you confirm if you are encountering this error even with RESTXQ disabled? RESTXQ has a known issue with circular module imports (https://github.com/eXist-db/exist/issues/3448). In the original post @ahenket stated that he had thoroughly disabled RESTXQ on his system at the conf.xml level, and that the problem persists. As discussed just now on the Community Call, eXist doesn't parse the contents of modules when they're being saved when RESTXQ is disabled, so why circular module imports should contribute to this is a mystery. Is it possible the problem affects save operations on other resources too? Or is it truly limited to these modules involved in circular module imports?
@joewiz Yes, I have had RESTXQ trigger disabled in the following way for a long time now:
<collection xmlns="http://exist-db.org/collection-config/1.0">
<triggers>
<!--<trigger class="org.exist.extensions.exquery.restxq.impl.RestXqTrigger"/>-->
</triggers>
</collection>
I have/had it disabled on my local machine. I think @ahenket experienced problems with other saves as well; while testing, I only saw it on the triangular-import .xqm files, never on others.
I have a strong sense that this issue is related to https://github.com/eXist-db/exist/issues/3917. For me they started at the same time and are equally sporadic. I often get validation errors for scripts saying that a dependency file is invalid. These then spontaneously resolve themselves on a save or a reload. @mgraauw have you experienced this too? @raducoravu what do you think?
No, I haven't experienced what's mentioned in 3917, though I do use Oxygen and Ox/compare often. Did you, @ahenket? I did notice, as mentioned above, that saving from eXide or Oxygen can crash eXist, but not saving from VS Code.
@domlatham if somebody provides some simple, precise steps to reproduce the problem on a stock eXist server installation, I could try to investigate this a bit on my side.
@raducoravu There are no exact steps that I know of. Today I made eXist-db hang itself when I installed a second package immediately after the first finished. I've managed to do that before. I normally wait 10-20 seconds between packages to let the background caching of queries finish, to work around that, but forgot this time. The issue is by no means specifically related to oXygenXML. It's just one of the ways to trigger the machine into a state where all threads are happily waiting for each other, with no master thread to decide which goes first.
@raducoravu Unfortunately I can confirm that this is still occasionally a problem in 24.0, build 2021121518. I'm not working so much on those modules at the moment, so I can't be sure if it's less frequent, but it just happened to me now.
As mentioned in the last community call, this still is an issue in eXist 6.0.1
I just encountered it (again) when trying to store a file while some other store operation was still working; sadly, lock logging does not work, even though enabled, hence no new info can be added.
@dariok, try Preferences > Data Sources > Connection Wizards > Create exist-db XML connection to create a new connection with the latest drivers for XML-RPC. Since I did this the issue seems to be resolved (having suffered from this for a long time).
@domlatham Thank you, but here, I was not using oXygen but eXide.
This problem may occur any time there are concurrent store operations, independent of where the store operation originated. During Monday's call we concluded that this is very likely an issue with lock acquisition, which we want to dig into further; hence the remark about lock logging.
Managed to get a long trace in locks.log as two store operations hang. While attempting/acquiring/releasing read locks does go back and forth between these two, their last entries are:
2022-08-01 00:32:13,754 TRACE - Attempt COLLECTION#1706452315579118(WRITE_LOCK) of /db by qtp767794003-40 at 1706452315584521
2022-08-01 00:32:13,754 TRACE - Attempt COLLECTION#1706452316044567(WRITE_LOCK) of /db by qtp767794003-30 at 1706452316049020
[The data shown previously were obtained when profiling was switched on; these data were collected after a restart]
2022-08-01 19:00:12,207 INFO -
Acquired Locks
------------------------------------
/db/apps
DOCUMENT
INTENTION_READ qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)
/db/apps/eXide
DOCUMENT
INTENTION_READ qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)
/db/apps/eXide/modules/store.xq
DOCUMENT
READ_LOCK qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)
/db
DOCUMENT
INTENTION_READ qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)
/db/apps/eXide/modules
DOCUMENT
INTENTION_READ qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)
Attempting Locks
------------------------------------
/db
COLLECTION
WRITE_LOCK qtp1675471168-35
WRITE_LOCK qtp1675471168-33
WRITE_LOCK qtp1675471168-32
The output of jstack -l:
lock-stack-long.txt
This may help replicate the issue. When Oxygen is validating a file there's a small green square in the top right corner of the editor, next to the scrollbar. The tooltip for this says "Analysis completed: no problems found". When editing an XQuery file it momentarily goes light green and the tooltip says "in progress...".
To replicate the issue:
So I've been using this workaround for a while now and it's pretty much 100% safe. The issue has become avoidable for me (if I don't absent-mindedly hit save too soon). But obviously it's a pretty critical error for using Oxygen with eXist. I hope this helps to get it resolved.
I am finally on 6.2.0 and can confirm that this is still an issue.
The steps to replicate are as above:
I am very happy that this was mentioned. It's so annoying all the time.
I spent some time, per @joewiz's request, to address this issue. Can anyone confirm whether it still happens in eXist-db HEAD?
Thanks @adamretter. I could test eXist-db HEAD if I had an exist-installer.jar.
@84000-admin would a docker container work for you?
Describe the bug
It is unclear what the exact circumstances are, but when you hit Save on updated XQuery code from oXygenXML or (less commonly for me) in eXide, you may bring the server into a coma-like state where it runs at 0.3/0.4% CPU but will not respond to anything otherwise. This does not happen on every Save but is most common on libraries with much reuse.
The result is that only a force quit will stop the server, which in turn leads to a reindex, and reindexing takes up to 24 hours on our larger databases. Both the cause and the consequence (the reindex) should really be fixed.
Expected behavior
No crash/outage should occur from simply hitting Save from oXygen (XML-RPC) or eXide. A reindex should preferably only be triggered when there is actually an index problem, and if it happens it should not cause up to a day of downtime.
To Reproduce
Update code in libraries with lots of reuse.
Context (please always complete the following information):
Additional context
20210831-0847-eXist-db-5.3-after-save-oxygen.txt