Open chfoo opened 10 years ago
Ah... it might be because I did CTRL+Z, touch STOP, fg && git pull && run-pipeline several times by accident. Possibly related to #45.
Also this was running when FOS went down and then back up after maintenance.
Possibly related to #39
Probably a duplicate of #27 .
Not fixed. Still occurring on seesaw 0.1.5.
Output:
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73eW8:73eWk
sending incremental file list
rsync: change_dir "/home/chris/puush-grab/data/1392894892055977b6cec8154f-98035/" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at flist.c(2027) [sender=3.0.9]
Process RsyncUpload returned exit code 3 for Item 73eW8:73eWk
Failed RsyncUpload for Item 73eW8:73eWk
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73eW8:73eWk
sending incremental file list
rsync: change_dir "/home/chris/puush-grab/data/1392894892055977b6cec8154f-98035/" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at flist.c(2027) [sender=3.0.9]
Process RsyncUpload returned exit code 3 for Item 73eW8:73eWk
Failed RsyncUpload for Item 73eW8:73eWk
Retrying after 30 seconds...
chris@droplet1:~/puush-grab$ ls data -hal
total 20K
drwxrwxr-x 5 chris chris 4.0K Feb 20 11:17 .
drwxr-xr-x 4 chris chris 4.0K Jan 9 08:34 ..
drwxrwxr-x 2 chris chris 4.0K Jan 10 14:04 1389362657d03e827ff0d2a8c7-6362
drwxrwxr-x 2 chris chris 4.0K Jan 25 20:16 1390680986e70321fa713b3b1d-57050
drwxrwxr-x 2 chris chris 4.0K Feb 20 11:17 1392895047b731b43fde416e73-98036
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
Failed LimitConcurrent(<shared:rsync_threads:1> x Conditional(Upload2)) for Item 73MTB:73MTN
Traceback (most recent call last):
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
ret = fn(*args, **kwargs)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/tracker.py", line 55, in handle_response
self.process_body(response.body, item)
File "<string>", line 542, in process_body
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 83, in enqueue
self.process(item)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 102, in process
p.stdin.write(self.stdin_data(item))
IOError: [Errno 32] Broken pipe
Waiting 10 seconds...
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
Failed LimitConcurrent(<shared:rsync_threads:1> x Conditional(Upload2)) for Item 73MTB:73MTN
Traceback (most recent call last):
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
ret = fn(*args, **kwargs)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/tracker.py", line 55, in handle_response
self.process_body(response.body, item)
File "<string>", line 542, in process_body
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 83, in enqueue
self.process(item)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 102, in process
p.stdin.write(self.stdin_data(item))
IOError: [Errno 32] Broken pipe
Waiting 10 seconds...
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Starting GetItemFromTracker for Item
Received item '73Yef:73Yer' from tracker
Starting ExtraItemParams for Item 73Yef:73Yer
Sub items: 73Yeq, 73Yep, 73Yer, 73Yei, 73Yeh, 73Yek, 73Yej, 73Yem, 73Yel, 73Yeo, 73Yen, 73Yeg, 73Yef
Finished ExtraItemParams for Item 73Yef:73Yer
Starting PrepareDirectories for Item 73Yef:73Yer
Finished PrepareDirectories for Item 73Yef:73Yer
Starting WgetDownloadMany for Item 73Yef:73Yer
Start downloading URL http://puu.sh/73Yeq
Server returned status 200
Start downloading URL http://puu.sh/73Yep
Server returned status 200
Start downloading URL http://puu.sh/73Yer
Server returned status 200
Start downloading URL http://puu.sh/73Yei
Server returned status 200
Start downloading URL http://puu.sh/73Yeh
Server returned status 200
Start downloading URL http://puu.sh/73Yek
Server returned status 200
Start downloading URL http://puu.sh/73Yej
Server returned status 200
Start downloading URL http://puu.sh/73Yem
Server returned status 200
Start downloading URL http://puu.sh/73Yel
Server returned status 200
Start downloading URL http://puu.sh/73Yeo
Server returned status 200
Start downloading URL http://puu.sh/73Yen
Server returned status 200
Start downloading URL http://puu.sh/73Yeg
Server returned status 200
Start downloading URL http://puu.sh/73Yef
Server returned status 200
Finished WgetDownloadMany for Item 73Yef:73Yer
Starting MoveFiles for Item 73Yef:73Yer
Finished MoveFiles for Item 73Yef:73Yer
Starting PrepareStatsForTracker2 for Item 73Yef:73Yer
Finished PrepareStatsForTracker2 for Item 73Yef:73Yer
Starting CleanUpItemDir for Item 73Yef:73Yer
Finished CleanUpItemDir for Item 73Yef:73Yer
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
Failed LimitConcurrent(<shared:rsync_threads:1> x Conditional(Upload2)) for Item 73MTB:73MTN
Traceback (most recent call last):
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
ret = fn(*args, **kwargs)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/tracker.py", line 55, in handle_response
self.process_body(response.body, item)
File "<string>", line 542, in process_body
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 83, in enqueue
self.process(item)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 102, in process
p.stdin.write(self.stdin_data(item))
IOError: [Errno 32] Broken pipe
Warning: Ignoring extra fail event.
File "/home/chris/.local/bin/run-pipeline", line 135, in <module>
main()
File "/home/chris/.local/bin/run-pipeline", line 131, in main
runner.start()
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/runner.py", line 157, in start
ioloop.IOLoop.instance().start()
File "/home/chris/.local/lib/python2.7/site-packages/tornado/ioloop.py", line 607, in start
self._run_callback(callback)
File "/home/chris/.local/lib/python2.7/site-packages/tornado/ioloop.py", line 458, in _run_callback
callback()
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 309, in wrapped
exc = _handle_exception(top, exc)
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 343, in _handle_exception
if tail.exit(*exc):
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 186, in exit
return self.exception_handler(type, value, traceback)
File "/usr/lib/python2.7/contextlib.py", line 84, in helper
return GeneratorContextManager(func(*args, **kwds))
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 51, in handle_item_exception
task.fail_item(item)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/task.py", line 28, in fail_item
self.on_fail_item(self, item)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/event.py", line 30, in fire
handler(*args, **kargs)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 65, in _task_fail_item
self._fail_item(item)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 99, in _fail_item
''.join(traceback.format_stack()))
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
Failed LimitConcurrent(<shared:rsync_threads:1> x Conditional(Upload2)) for Item 73MTB:73MTN
Traceback (most recent call last):
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
ret = fn(*args, **kwargs)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/tracker.py", line 55, in handle_response
self.process_body(response.body, item)
File "<string>", line 542, in process_body
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py
", line 83, in enqueue
self.process(item)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 102, in process
p.stdin.write(self.stdin_data(item))
IOError: [Errno 32] Broken pipe
Warning: Ignoring extra fail event.
File "/home/chris/.local/bin/run-pipeline", line 135, in <module>
main()
File "/home/chris/.local/bin/run-pipeline", line 131, in main
runner.start()
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/runner.py", line 157, in start
ioloop.IOLoop.instance().start()
File "/home/chris/.local/lib/python2.7/site-packages/tornado/ioloop.py", line 607, in start
self._run_callback(callback)
File "/home/chris/.local/lib/python2.7/site-packages/tornado/ioloop.py", line 458, in _run_callback
callback()
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 309, in wrapped
exc = _handle_exception(top, exc)
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 343, in _handle_exception
if tail.exit(*exc):
File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 186, in exit
return self.exception_handler(type, value, traceback)
File "/usr/lib/python2.7/contextlib.py", line 84, in helper
return GeneratorContextManager(func(*args, **kwds))
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 51, in handle_item_exception
task.fail_item(item)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/task.py", line 28, in fail_item
self.on_fail_item(self, item)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/event.py", line 30, in fire
handler(*args, **kargs)
File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 65, in _task_fail_item
''.join(traceback.format_stack()))
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
LimitConcurrent and UploadWithTracker both have inner_task
and it's not clear to me how nested task exceptions should be handled.
It's still happening...... : https://i.imgur.com/unHlTlC.png
The user has set up their VM using Debian on Sept. 2, 2014.
A user was running 50 concurrent on twitpic-grab but all of them were stuck on MoveFiles().
674=200 https://d3j5vwomefv46c.cloudfront.net/photos/mini/85989779.jpg?1271116930.
Finished WgetDownload for Item image:1f724
Starting PrepareStatsForTracker for Item image:1f724
Finished PrepareStatsForTracker for Item image:1f724
Starting MoveFiles for Item image:1f724
Finished MoveFiles for Item image:1f724
It is suspected a deadlock caused by a race condition in LimitConcurrent() with its inner task.
I have observed this on the bayimg-grab pipeline script (running on Debian Jessie with a 32-bit kernel (version 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) i686 GNU/Linux), Debian GLIBC 2.19-18+deb8u4, and Python 2.7.9). This occured when the custom "rsync" step failed. In the web UI the original task disappeared and other tasks continued to process until they reached the "Upload" step of the pipeline script, where they stalled. Additionally, the web UI reported that there was 1 task on the "Upload" step (when none were) and there were -1 tasks on the "rsync" step (note that this pipeline was running with concurrent=2).
In the console the following output was observed for the job:
Finished WgetDownload for Item 16pictures:nagieaaf
Starting PrepareStatsForTracker for Item 16pictures:nagieaaf
Finished PrepareStatsForTracker for Item 16pictures:nagieaaf
Starting MoveFiles for Item 16pictures:nagieaaf
Finished MoveFiles for Item 16pictures:nagieaaf
Starting rsync for Item 16pictures:nagieaaf
sending incremental file list
bayimg-16pictures_nagieaaf-20160428-161119_data.txt
10 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/1)
Failed rsync for Item 16pictures:nagieaaf
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/seesaw/externalprocess.py", line 155, in _handle_subprocess_stdout
self.on_output(data)
File "/usr/local/lib/python2.7/dist-packages/seesaw/event.py", line 31, in fire
handler(*args, **kargs)
File "/usr/local/lib/python2.7/dist-packages/seesaw/externalprocess.py", line 219, in on_subprocess_stdout
item.log_output(data, full_line=False)
File "/usr/local/lib/python2.7/dist-packages/seesaw/item.py", line 81, in log_output
self.on_output(self, data)
File "/usr/local/lib/python2.7/dist-packages/seesaw/event.py", line 31, in fire
handler(*args, **kargs)
File "/usr/local/lib/python2.7/dist-packages/seesaw/runner.py", line 176, in _handle_item_output
sys.stdout.write(data)
IOError: [Errno 4] Interrupted system call
Waiting 10 seconds...
Finished rsync for Item 16pictures:nagieaaf
Starting Upload for Item 16pictures:nagieaaf
Uploading with Rsync to rsync://fos.textfiles.com/chfoo/warrior/bayimg/MrRadar/
Starting RsyncUpload for Item 16pictures:nagieaaf
sending incremental file list
rsync: change_dir "/home/tyler/bayimg-grab/data/1461877879fddd48305f4eeafa-54/" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at flist.c(2118) [sender=3.1.1]
Process RsyncUpload returned exit code 3 for Item 16pictures:nagieaaf
Failed RsyncUpload for Item 16pictures:nagieaaf
Retrying after 60 seconds...
At that point it appears the job was aborted and a new job started.
Seesaw version: 0.1.4
This is occurring on: ArchiveTeam/wretch-grab/commit/63f87fbf
Alright, let's see the data dir...
Uh, that's not right.
shrugs