Open laudrain opened 5 years ago
Thanks for the report @laudrain. @danielweck does it ring any bell? Would you have time to look into this?
I see an executable hierarchy: prowin32.exe
> java.exe
> node.exe
. What does that represent?
The official Ace 1.1.1
release is based on the master
branch, which uses old versions of package dependencies (such as the Winston logger). It would be interesting to see if the more up-to-date Ace "core" lib stack used by AceApp (i.e. ace-gui
repository) causes the same problems. See this Pull Request:
https://github.com/daisy/ace/pull/236/files
Unfortunately, at the moment there is no unofficial build of the Ace "core" (more specifically, Ace CLI) which ships in AceApp. This would allow advanced users / early adopters to run tests and compare.
Well, in addition AceApp 1.0.0-rc.1
, there are also "nightly" builds (compiled directly from the master
branch), so this could in principle be used, but in practice running the Ace CLI from this bundle is really convoluted:
https://github.com/daisy/ace-gui/releases
So, I think "we" (i.e. the Ace dev team at large) should put into place a "next" NPM release tag of Ace "core" (e.g. 1.2.0-alpha.1
, based on the aforementioned Pull Request) alongside the "latest" NPM tag (i.e. 1.1.1
, based off the master
branch), so that people can try the updated Ace CLI.
Romain, what do you think?
So, I think "we" (i.e. the Ace dev team at large) should put into place a "next" NPM release tag of Ace "core" (e.g.
1.2.0-alpha.1
, based on the aforementioned Pull Request) alongside the "latest" NPM tag (i.e.1.1.1
, based off themaster
branch), so that people can try the updated Ace CLI.
Yes, sounds good to me. "Beta", even.
...there are recurrent locks of the node.exe process, at least once a day. Our team has to kill the process... Attached is the screenshot of node.exe process stopped.
Interesting. So the process just waits indefinitely ... sounds like the culprit could be Winston (the file / console logger) awaiting the termination of a write stream. Perhaps this is caused by concurrent file access. Windows 2008 R2
is a pretty specific environment, but generally-speaking (from development experience) the Windows filesystem(s) can cause such problems.
Once again, note that the updated Ace branch (used by Ace App / ace-gui
) utilizes an updated strategy to wait for file stream closure (in order to work around a Winston bug):
...perhaps this could improve things.
Related (identical?) issue: https://github.com/daisy/ace/issues/284
The error massage in #284 was produced by an interactive attempt to run Ace from a terminal command line. It showed the that the ace.log file was locked by the node.exe process still running (locked). But this doesn't explain why this node.exe process was locked.
@danielweck your proposal for improvement means that we should wait for a new release?
It showed the that the ace.log file was locked by the node.exe process still running (locked). But this doesn't explain why this node.exe process was locked.
Concurrent access to the log file can explain why the Winston logger fails to flush its internal data (string messages), in order to subsequently allow the CLI app to terminate gracefully.
Another cause (instead of concurrent file access) might be hiccups in the Windows filesystem (although in your case, I suspect NTFS? ... or is it a shared network drive, or a virtual machine of sorts?).
@danielweck your proposal for improvement means that we should wait for a new release?
The current design of Ace "core" (CLI, as well as the modules used in Ace App / ace-gui
) is not compatible with concurrent launches / parallel accessibility evaluations. But I also think that sequential launches can be affected by edge-case bugs related to the logging mechanism (for example, file handle released too soon, or as per your reports: not released at all / indefinite wait).
In fact, I disabled file logging in the Windows unit-tests, because of similar problems! https://github.com/daisy/ace/blob/72047a2423c17bf05f6621d418e1dc0668eb0e83/packages/ace-logger/src/index.js#L42
So, I think that the top priority should be to refactor the file logger / Winston integration. I think a LRU (Least Recently Used) "log rotate" strategy would work well (to avoid the excessive proliferation of log files in a given output folder). Each Ace launch would "own" its dedicated log file.
...and yes, I am taking on this development task :)
@danielweck as we are still experiencing the issue, i have to make it clear that the main problem is node.js frizzing! it happens regularly (circa once per day) and the process has to be killed and relaunched, files to be put back in the queue.
The screenshot in the first comment of this issue illustrated exactly what happens:
@danielweck the other issue#284 about ace.log file locked comes from wrong process killing when tentatively trying to stop this node.js freeze. I'll close #284.
any idea about this node.js frizzing ?
Hello Luc, could you please try npm install @daisy/ace@next
(which is currently equivalent to npm install @daisy/ace@1.2.0-beta.2
).
I imagine you install Ace globally (using the --global
/ -g
NPM command line argument)? If so, you might need to uninstall your current version 1.1.1
(npm -g uninstall @daisy/ace
and npm -g uninstall @daisy/ace-cli
).
Version 1.2.0-beta.2
corresponds to this Pull Request: https://github.com/daisy/ace/pull/236
the process has to be killed and relaunched, files to be put back in the queue.
Could you please explain how you use Ace? Do you launch a Node process per Ace CLI evaluation, or do you have a smarter wrapper (NodeJS script) that launches multiple evaluations in a sequence inside the same process?
@danielweck Ace is launched through a batch command invoking the CLI version a)of Ace (see in the screenshot the highlighted cmd.exe).
So, this batch script invokes a single Node process per Ace evaluation (i.e. a single Ace CLI launch), and the evaluations are sequenced, right? (no parallel concurrent executions)
This is a very important distinction, because in my test on Windows 10 Pro, I was able to reproduce bugs in PowerShell, GitBash, etc. consoles, where launching multiple Ace CLI instances / processes seems to be problematic (despite the completely isolated / individual log files, in the beta version)
There are no parallel concurrent executions. Even in that case, we have seen a node.js freeze. For instance during the test last Sunday, node.js did freeze, and the server was doing only EPUB checking.
Any news / positive developments @laudrain ?
Note that Ace 1.2.0-beta.3
has been published (npm install @daisy/ace@next
):
https://www.npmjs.com/package/@daisy/ace/v/1.2.0-beta.3
Notable changes: updated NPM package dependencies, including Axe 3.4.0
(but no new features or bug fixes in this intermediary release)
Source code: https://github.com/daisy/ace/pull/236 https://github.com/daisy/ace/tree/feature/axe-runner-electron
@danielweck the production server has been upgraded in CPU and memory 10 days ago. We did not change Ace version and stayed in 1.1.1. We have been since monitoring freezing node.js events and none occurred. We think this issue is fixed, it was a configuration issue.
On the Windows 2008 server, memory has been upgraded to 32Go. It is a 64bits system. There are less node.js lock but still some, with disturbing impact on our control chain. Do you know any way to report issues to node.js developer team ?
Hello Luc, thank you for keeping us updated. What version of Ace CLI are you using on the Windows 2008 server where such problems occur?
@danielweck Ace 1.1.1 Where should look to find clues about these freezes ?
Hello, are you able to reproduce these problems with Ace CLI version 1.2.0-beta.5
?
https://www.npmjs.com/package/@daisy/ace/v/1.2.0-beta.5
(npm install @daisy/ace@next
)
We haven't checked thoroughly this beta on our test server. And on that test server, we never experienced such freeze issues...
@danielweck I try to get from our internal integrators any system information (log, ...) for you when Ace freezes. What would be useful and where to find it in Ace installation?
@rdeltour @danielweck still another freeze tonight ! We have deactivate Ace checks... Here attached the files we have gathered on the context.
The json file seems to be complete and it ends with a "pass": 9782246819325.json.zip
The launch command:
node "D:\logiciels\daisy-1.1.1\node_modules\@daisy\ace\bin\ace.js" -s "O:\vtar\Prod\controles\Attente_Verif\gsoft_office\9782246819325\9782246819325-E\EPUBREF\9782246819325.epub"
The ace.log file (see last timestamp "timestamp":"2019-12-04T02:14:23.248Z" ace.log
And a screenshot of the server processes taken today around 11:00, showing the node.js process started at 03:14 and freezed.
Would you mind reopen the issue?
Hello Luc, yes we can of course re-open this issue, but please bear in mind that if the bug only occurs with Ace CLI version 1.1.1
- and not version 1.2.x
(currently still published on NPM as a pre-release) - then it is likely that the fix basically consists in upgrading to the latest Ace CLI, which includes many updates of library dependencies, such as the logging framework, the Puppeteer Axe runner, etc. (which may be the reason why 1.1.1
suffers from issues).
This is why it would be great if you could try Ace CLI 1.2.x
on the Windows server that is causing issues with 1.1.1
. Otherwise - just looking at the Ace CLI logs - I do not think we will be able to diagnose the cause of process freezing.
Personally, I suspect this has something to do with the logging framework and the filesystem, which is why I strongly recommend upgrading to Ace 1.2.x
as there have been improvements in this area.
Is NodeJS still version 8.8.1.0
, and Windows 2008 R2
?
Is there a possibility to upgrade Node to LTS 12.13.1
?
https://nodejs.org
@danielweck thanks, I did forward your advice.
Please provide the following details if possible or relevant:
ace -v
command).node -v
command).--verbose
option).Using Ace from the command line on the production server, there are recurrent locks of the node.exe process, at least once a day. Our team has to kill the process for next validations to run on following EPUBs. Attached is the screenshot of node.exe process stopped.
Any clue or checks we should try to figure out what's going wrong ?