arXiv / arxiv-submission-ui

User interface of NG submit system.
MIT License

Submission UI fails to recognize compilation failure and enters endless 'processing underway' loop. #123

Open DavidLFielding opened 5 years ago

DavidLFielding commented 5 years ago

Jim originally reported a problem with a malformed compilation summary for a known bad submission. The compilation summary was basically blank (no indication of TeX runs).

I have not been able to reproduce Jim's results. When I submit the problem source, the compilation step enters an endless loop: it never returns and never times out.

I'm able to see a compilation error in the logs, but it does not seem to propagate back to the submission UI properly.

At a minimum, the submission UI should time out on the compilation process. My compilation at development.arxiv.org has been running for 15 minutes.
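To illustrate the kind of UI-side timeout meant here, a minimal sketch of a status poll with a deadline follows. It is not the submission UI's actual code: `wait_for_compilation`, `CompilationTimeout`, the `get_status` callable, and the status strings are all hypothetical names chosen for the example.

```python
import time
from typing import Callable

# Hypothetical status values; the real compilation service may use others.
TERMINAL_STATES = ("succeeded", "failed")


class CompilationTimeout(Exception):
    """Raised when compilation is still pending once the deadline passes."""


def wait_for_compilation(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a terminal state or time runs out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            # A failure is a terminal state too, so it ends the loop
            # instead of leaving the user on "processing underway".
            return status
        if clock() >= deadline:
            raise CompilationTimeout(
                f"compilation still '{status}' after {timeout} seconds"
            )
        sleep(interval)
```

The caller can then show an error page when `CompilationTimeout` is raised, rather than refreshing indefinitely. The `clock` and `sleep` parameters exist only so the loop can be tested without waiting.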

In my opinion, there should be a reasonable timeout on all calls to external services. When a timeout occurs, appropriate action should be taken (retry, or report the error and give up).
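A rough sketch of that policy, assuming each service call accepts a timeout in seconds and raises `TimeoutError` or `ConnectionError` when it fails. The names `call_with_retry` and `ServiceUnavailable` are made up for this example and do not come from the arXiv codebase.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Raised after every attempt has timed out or failed to connect."""


def call_with_retry(
    call: Callable[[float], T],
    timeout: float = 10.0,
    attempts: int = 3,
    backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(timeout), retrying a bounded number of times on failure."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return call(timeout)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts:
                # Exponential backoff between attempts: 1x, 2x, 4x, ...
                sleep(backoff * 2 ** (attempt - 1))
    # Give up and report the error instead of waiting forever.
    raise ServiceUnavailable(
        f"gave up after {attempts} attempts"
    ) from last_error
```

With an HTTP client that takes a timeout argument, `call` could be a small lambda that forwards the timeout to the request. The point is that the number of attempts is bounded, so the caller always gets either a result or an error it can show to the user.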

Something else is going on here, likely related to exception handling in the compilation service. More testing should be performed on exception cases.

The failed exception handling seen here may also be present in other services that follow the same communication model.

The two files are cesar.tex and cesar.bbl (uploaded to Slack by Jim).

erickpeirson commented 5 years ago

Getting close to a final pass on this. Two more PRs open:

Final steps to resolve:

erickpeirson commented 5 years ago

Briefly looked into task cancellation. This is not a terribly straightforward thing to do. Celery does have revocation, which will drop the task before it is picked up by a worker. Revocation can also generate a kill/term signal, but this will take the whole worker down (and may impact other tasks). Let's start with timeouts and recompilation for now.