Open erickpeirson opened 5 years ago
@DavidLFielding I am noticing that filemanager is restarting a lot; seeing things like:
xxx.xxx.xxx.xxx - -|- [-] [kube-probe/1.10] "GET /filemanager/api/status HTTP/1.1" 200 107 97187 -
DEBUG:filemanager.controllers.upload:Handling upload request for 13
ERROR:filemanager.controllers.upload:Upload 'archive' not specified. Oversize calculation will use default values.
INFO:filemanager.controllers.upload:13: Upload files to existing workspace: file='2288326.tar.gz'
...brutally killing workers...
worker 1 buried after 1 seconds
binary reloading uWSGI...
chdir() to /opt/arxiv
closing all non-uwsgi socket fds > 2 (max_fd = 1048576)...
Doesn't look like it's starved for resources:
Via external API, seeing responses in the 500-700ms range on uploads to /
.
Responses from getUploadStatus
endpoint are in the 150-350ms range.
Ok, so one thing that I am seeing is that, by default, the filemanager service integration will follow 2xx-series redirects: https://github.com/arXiv/arxiv-base/blob/master/arxiv/integration/api/service.py#L162
So when filemanager says 201 Created
, submission UI will issue a GET
request to the endpoint in the Location
header. This isn't needed, since filemanager will immediately return the disposition of the upload when responding to the POST
request.
Removed the redirect in 34ae26357800d7225bafcd15e9237742dbd4dc92, but still slow. Seeing really long response times...this was an 18 byte upload that took 8,182,050 microseconds (8.2 seconds):
DEBUG:filemanager.controllers.upload:Handling upload request for 4
ERROR:filemanager.controllers.upload:Upload 'archive' not specified. Oversize calculation will use default values.
INFO:filemanager.controllers.upload:4: Upload files to existing workspace: file='head.tmp'
INFO:filemanager.controllers.upload:4: Processed upload. Saved to DB. Preparing upload summary.
INFO:filemanager.controllers.upload:4: Generating upload summary.
DEBUG:filemanager.controllers.upload:Response data: {'upload_id': 4, 'upload_total_size': 7437610, 'upload_compressed_size': 1125161, 'created_datetime': datetime.datetime(2019, 6, 4, 15, 18, 31, tzinfo=<UTC>), 'modified_datetime': datetime.datetime(2019, 6, 4, 18, 49, 48, tzinfo=<UTC>), 'start_datetime': datetime.datetime(2019, 6, 5, 20, 15, 5, 44505, tzinfo=<UTC>), 'completion_datetime': datetime.datetime(2019, 6, 5, 20, 15, 10, 528678, tzinfo=<UTC>), 'files': [{'name': '19.eps', 'public_filepath': '19.eps', 'size': 817320, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:49.270000+00:00'}, {'name': '10a.eps', 'public_filepath': '10a.eps', 'size': 790044, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:47.980000+00:00'}, {'name': '16.ps', 'public_filepath': '16.ps', 'size': 29147, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:42.745000+00:00'}, {'name': '7.ps', 'public_filepath': '7.ps', 'size': 32013, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:51.930000+00:00'}, {'name': '3.ps', 'public_filepath': '3.ps', 'size': 35930, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:43.173000+00:00'}, {'name': '13b.eps', 'public_filepath': '13b.eps', 'size': 504832, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:48.765000+00:00'}, {'name': '1.ps', 'public_filepath': '1.ps', 'size': 43078, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:43.749000+00:00'}, {'name': '11b.eps', 'public_filepath': '11b.eps', 'size': 646615, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:48.340000+00:00'}, {'name': '12a.eps', 'public_filepath': '12a.eps', 'size': 984316, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:48.471000+00:00'}, {'name': '9.ps', 'public_filepath': '9.ps', 'size': 28286, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:44.711000+00:00'}, {'name': '14.ps', 'public_filepath': '14.ps', 'size': 32196, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:44.945000+00:00'}, {'name': '5.ps', 'public_filepath': '5.ps', 'size': 27561, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:45.189000+00:00'}, {'name': '17.eps', 'public_filepath': '17.eps', 'size': 717506, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:49.079000+00:00'}, {'name': '8.ps', 'public_filepath': '8.ps', 'size': 31371, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:54.178000+00:00'}, {'name': '11a.eps', 'public_filepath': '11a.eps', 'size': 726267, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:48.207000+00:00'}, {'name': '10b.eps', 'public_filepath': '10b.eps', 'size': 331105, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:48.072000+00:00'}, {'name': '4.ps', 'public_filepath': '4.ps', 'size': 36238, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:46.525000+00:00'}, {'name': '2.ps', 'public_filepath': '2.ps', 'size': 57263, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:46.806000+00:00'}, {'name': '13a.eps', 'public_filepath': '13a.eps', 'size': 681822, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:48.670000+00:00'}, {'name': '12b.eps', 'public_filepath': '12b.eps', 'size': 330845, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:48.558000+00:00'}, {'name': '6.ps', 'public_filepath': '6.ps', 'size': 28651, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:47.503000+00:00'}, {'name': '15.ps', 'public_filepath': '15.ps', 'size': 33300, 'type': 'Postscript', 'modified_datetime': '2019-06-04T18:49:47.765000+00:00'}, {'name': '18.eps', 'public_filepath': '18.eps', 'size': 491886, 'type': 'Postscript', 'modified_datetime': '2019-06-04T15:18:49.172000+00:00'}, {'name': 'head.tmp', 'public_filepath': 'anc/head.tmp', 'size': 18, 'type': 'Always ignore', 'modified_datetime': '2019-06-05T20:15:05.367000+00:00'}], 'errors': [], 'upload_status': 'READY', 'workspace_state': 'ACTIVE', 'lock_state': 'UNLOCKED', 'source_format': 'ps', 'checksum': '4NcQR6urVfQQKlYUL5zJQQ=='}
"[removed] [removed] - -|- [-] [python-requests/2.21.0] \"POST /filemanager/api/4 HTTP/1.1\" 201 3772 8182050 -"
(from filemanager service log)
@DavidLFielding so I think we know so far:
After the refactor (https://github.com/arXiv/arxiv-filemanager/pull/43), I isolated the remaining slowness to packing the source package tar-ball. I was able to get about a 10x speedup switching from tarfile
package to using subprocess
to use the system tar -czf
. Still seeing about 2 seconds on the pack step:
upload workspace exists at 8.344650268554688e-06
upload workspace retrieved at 0.0023500919342041016
workspace finished processing upload at 0.020180702209472656
workspace persisted at 0.1458911895751953
source packed at 2.3734874725341797
db updated at 2.3939309120178223
workspace serialized at 2.473506450653076
done at 2.494154453277588
Noticing that even small uploads can take a few seconds, and occasionally time out. We should take a closer look at uWSGI tuning for filemanager, and see if we can pin down the bottleneck. ARXIVNG-2211