eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
424 stars 179 forks source link

[BUG] Saving xquery code through oXygen or eXide may lead to server 'crash' #4022

Open ahenket opened 3 years ago

ahenket commented 3 years ago

Describe the bug It is unclear what the exact circumstances are but when you hit Save on updated xquery code from oXygenXML or (less common for me) in eXide you may bring the server to a coma like state where it runs at 0.3/0.4% CPU but will not respond to anything otherwise. This does not happen on every Save but is most common on libraries with much reuse.

The result is that only a force quit will stop the server which in turn leads to reindex and reindex takes up to 24 hours on our larger databases. Both the cause and the consequence of reindex should really be fixed

Expected behavior No crash/outage should occur from simply hitting Save from oXygen (xmlrpc) or eXide. Reindex should preferably only be triggered when it's actually an index problem and if it happens it should not cause up a 1 day of downtime.

To Reproduce Update code in libraries with lots of reuse.

Screenshots If applicable, add screenshots to help explain your problem.

Context (please always complete the following information):

Additional context

20210831-0847-eXist-db-5.3-after-save-oxygen.txt

ahenket commented 3 years ago

Update:

mgraauw commented 3 years ago

Some more info. I am able to reproduce this on my local system. Yes, trouble seems to come from 3 xqm's where each one imports the other 2. Saving from either Oxygen or eXide will hang exist, but saving from Vscode will not. What I see in Monex: image

joewiz commented 3 years ago

@ahenket @mgraauw Would it be possible to prepare a zip file containing a minimal example of the modules that lead to this situation?

Other helpful details:

mgraauw commented 3 years ago

On my local db I do have fixes that allow me to save w/o the problems. zip is possible, but will take some figuring out (lots of interdependent code)

mgraauw commented 3 years ago

I have traced and solved the issue on my local machine. It happens in some cases where there is a triangular import: A imports module B and C B imports module A and C C imports module B and A Resolution in our case was to refactor the code to prevent this situation from happening.

I've tried to make a small example, did not succeed. Since the db hangs after most, but not all, saves, I suspect the condition is dependent on larger modules which take longer to compile, and then lock each other.

I can be reproduced: load our xar in some eXist 5.3.0: https://decor.nictiz.nl/exist/apps/public-repo-dev3/public/ART-2.1.36.xar These 3:

import each other. Change one (add space in comment), save. On our machines, not on every, but once in every few saves the db will hang. Same thing from eXide. Oxygen 23.1, eXide eXide version 3.0.0. Saving from Vscode 1.59.1 (with existdb-vscode v1.5.3) will not hang the db.

Problem looks solved for us, no need on our side for action, just documenting in case follow up from your side is wanted.

raducoravu commented 2 years ago

When using the "Save" action Oxygen uses the API "org.xmldb.api.base.Resource.setContent(Object)" to store content in a resource and then the API "toStore.getParentCollection().storeResource(toStore)" to store the resource in the parent collection. But as you can reproduce the same problem from eXide.... it does not seem a problem specific to Oxygen. Anyway if you give a precise set of steps to reproduce the problem and consider the problem stems from Oxygen I can try to reproduce it on my side and add some logging on the Oxygen side.

mgraauw commented 2 years ago

@raducoravu I do not think it's an Oxygen issue, nor that it can be solved from Oxygen

mgraauw commented 2 years ago

BTW, I wrote "Problem looks solved for us" in a previous comment, that is certainly not the case, this was a bit premature

domlatham commented 2 years ago

I also have this problem with my eXist 5.3.0 / Oyxgen 23.1 setup. I can also replicate it with Oxygen 22. I can add and remove the same space character from the same xquery file and it kills eXist maybe 1 out of 20 times. The script contains triangular references as documented above.

joewiz commented 2 years ago

@mgraauw @domlatham Could you confirm if you are encountering this error even with RESTXQ disabled? RESTXQ has a known issue with circular module imports (https://github.com/eXist-db/exist/issues/3448). In the original post @ahenket stated that he had thoroughly disabled RESTXQ on his system at the conf.xml level, and that the problem persists. As discussed just now on the Community Call, eXist doesn't parse the contents of modules when they're being saved when RESTXQ is disabled, so why circular module imports should contribute to this is a mystery. Is it possible the problem affects save operations on other resources too? Or is it truly limited to these modules involved in circular module imports?

domlatham commented 2 years ago

@joewiz Yes, I have had RESTXQ trigger disabled in the following way for a long time now:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <triggers>
        <!--<trigger class="org.exist.extensions.exquery.restxq.impl.RestXqTrigger"/>-->
    </triggers>
</collection>
mgraauw commented 2 years ago

I have/had it disabled on my local machine. I think @ahenket experienced problems with other saves as well, I had it (while testing) only on the triangular import xqm's, never on others.

domlatham commented 2 years ago

I have a strong sense that this issue is related to https://github.com/eXist-db/exist/issues/3917. For me they started at the same time and are equally sporadic. I often get validation errors for scripts saying that a dependency file is invalid. These then spontaneously resolve themselves on a save or a reload. @mgraauw have you experienced this too? @raducoravu what do you think?

mgraauw commented 2 years ago

No, I haven't experienced what's mentioned in 3917, though I do use Oxygen and Ox/compare often. Did you @ahenket ? I did notice as mentioned above that saving from eXide or Oxygen can crash eXist, but not when saving from Vscode.

raducoravu commented 2 years ago

@domlatham if somebody includes some simple precise steps I can try on my side to reproduce the problem with a stock Exist server installation I could try to investigate this a bit on my side.

ahenket commented 2 years ago

@raducoravu There are no exact steps that I know of. Today I made eXist-db hang itself when I installed a second package immediately after the first finished. I've managed to do that before. I normally wait 10-20 seconds between packages to let the background aching of queries finish to workaround that, but forgot this time. The issue is by no means specifically related to oXygenXML. It's just one of the ways to trigger the machine into a state where all threads are happily waiting for each other with no master thread to make a decision which goes first.

84000-admin commented 2 years ago

@raducoravu Unfortunately I can confirm that this is still occasionally a problem in 24.0, build 2021121518. I'm not working so much on those modules at the moment, so I can't be sure if it's less frequent, but it just happened to me now.

dariok commented 2 years ago

As mentioned in the last community call, this still is an issue in eXist 6.0.1

I just encountered it (again) when trying to store a file while some other store operation was still working; sadly, lock logging does not work, even though enabled, hence no new info can be added.

domlatham commented 2 years ago

@dariok, try Preferences > Data Sources > Connection Wizards > Create exist-db XML connection to create a new connection with the latest drivers fro xmlrpc. Since I did this the issue seems to be resolved (having suffered from this for a long time).

dariok commented 2 years ago

@domlatham Thank you, but here, I was not using oXygen but eXide.

This problem may occur any time there are concurrent store operations, independent from the where the store operation originated. During Monday’s call we concluded that this is very likely an issue with lock acquisition, which we want to dig further into – hence the remark about lock logging.

dariok commented 2 years ago

Managed to get a long trace in locks.log as two store operations hang. While attempting/acquiring/releasing read locks does work back an forth between these two, their last entries are

2022-08-01 00:32:13,754 TRACE - Attempt COLLECTION#1706452315579118(WRITE_LOCK) of /db by qtp767794003-40 at 1706452315584521
2022-08-01 00:32:13,754 TRACE - Attempt COLLECTION#1706452316044567(WRITE_LOCK) of /db by qtp767794003-30 at 1706452316049020
dariok commented 2 years ago

[The data shown previously were obtained when profiling was switched on; these data were collected after a restart]

2022-08-01 19:00:12,207 INFO  -
Acquired Locks
------------------------------------
/db/apps
        DOCUMENT
                INTENTION_READ  qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)
/db/apps/eXide
        DOCUMENT
                INTENTION_READ  qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)
/db/apps/eXide/modules/store.xq
        DOCUMENT
                READ_LOCK       qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)
/db
        DOCUMENT
                INTENTION_READ  qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)
/db/apps/eXide/modules
        DOCUMENT
                INTENTION_READ  qtp1675471168-32 (count=1), qtp1675471168-33 (count=1), qtp1675471168-35 (count=1)

Attempting Locks
------------------------------------
/db
        COLLECTION
                WRITE_LOCK      qtp1675471168-35
                WRITE_LOCK      qtp1675471168-33
                WRITE_LOCK      qtp1675471168-32

The output of jstack -l: lock-stack-long.txt

domlatham commented 1 year ago

This may help replicate the issue. When Oxygen is validating a file there's a small green square in the top right corner of the editor, next to the scrollbar. The tooltip for this says "Analysis completed: no problems found". When editing an xquery file it momentarily goes light-green and the tooltip says "in progress...".

To replicate the issue:

So I've been using this workaround for a while now and it's pretty much 100% safe. So the issue has become avoidable for me (if I don't absent-mindedly hit save too soon). But obviously it's a pretty critical error for using Oxygen with exist. I hope this helps to get it resolved.

84000-admin commented 2 months ago

I am finally on 6.2.0 and can confirm that this is still an issue.

The steps to replicate are as above:

StephanMa commented 2 months ago

I am very happy that this was mentioned. It’s so annoying all the time

adamretter commented 2 months ago

I spent some time @joewiz request to address this issue. Can anyone confirm if it still happens in eXist-db HEAD?

84000-admin commented 2 months ago

Thanks @adamretter. I could test eXist-db HEAD if I had a exist-installer.jar.

dizzzz commented 2 months ago

@84000-admin would a docker container work for you?