ArchiveTeam / seesaw-kit

Making a reusable toolkit for writing seesaw scripts
Other
69 stars 30 forks source link

[LimitConcurrent] Rsync uploading with missing data directory #48

Open chfoo opened 10 years ago

chfoo commented 10 years ago

Seesaw version: 0.1.4

This is occurring on: ArchiveTeam/wretch-grab/commit/63f87fbf

[...lot's of scrolling...]
Uploading with Rsync to rsync://fos.textfiles.com/chfoo/warrior/wretch/chfoo/
Starting RsyncUpload for Item loveohead
sending incremental file list
rsync: change_dir "/home/chris/wretch-grab/data/1387476854ecdf61f810a51cb9-11/" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at flist.c(2027) [sender=3.0.9]
Process RsyncUpload returned exit code 3 for Item loveohead
Failed RsyncUpload for Item loveohead
Retrying after 30 seconds...
Uploading with Rsync to rsync://fos.textfiles.com/chfoo/warrior/wretch/chfoo/
Starting RsyncUpload for Item loveohead
sending incremental file list
rsync: change_dir "/home/chris/wretch-grab/data/1387476854ecdf61f810a51cb9-11/" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at flist.c(2027) [sender=3.0.9]
Process RsyncUpload returned exit code 3 for Item loveohead
Failed RsyncUpload for Item loveohead
Retrying after 30 seconds...

Alright, let's see the data dir...

chris@droplet1:~/wretch-grab$ ls -halR data
data:
total 12K
drwxrwxr-x 3 chris chris 4.0K Dec 19 20:10 .
drwxr-xr-x 4 chris chris 4.0K Dec 19 10:10 ..
drwxrwxr-x 2 chris chris 4.0K Dec 19 20:13 1387483825d015c6f36e3d5489-12

data/1387483825d015c6f36e3d5489-12:
total 376K
drwxrwxr-x 2 chris chris 4.0K Dec 19 20:13 .
drwxrwxr-x 3 chris chris 4.0K Dec 19 20:10 ..
-rw-rw-r-- 1 chris chris 368K Dec 19 20:13 wretch-loveasy4me-20131219-201026.warc.gz

Uh, that's not right.

shrugs

chfoo commented 10 years ago

Ah... it might be because I did CTRL+Z, touch STOP, fg && git pull && run-pipeline several times by accident. Possibly related to #45.

chfoo commented 10 years ago

Also this was running when FOS went down and then back up after maintenance.

chfoo commented 10 years ago

Possibly related to #39

chfoo commented 10 years ago

Probably a duplicate of #27 .

chfoo commented 10 years ago

Not fixed. Still occurring on seesaw 0.1.5.

Output:

Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73eW8:73eWk
sending incremental file list
rsync: change_dir "/home/chris/puush-grab/data/1392894892055977b6cec8154f-98035/" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at flist.c(2027) [sender=3.0.9]
Process RsyncUpload returned exit code 3 for Item 73eW8:73eWk
Failed RsyncUpload for Item 73eW8:73eWk
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73eW8:73eWk
sending incremental file list
rsync: change_dir "/home/chris/puush-grab/data/1392894892055977b6cec8154f-98035/" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at flist.c(2027) [sender=3.0.9]
Process RsyncUpload returned exit code 3 for Item 73eW8:73eWk
Failed RsyncUpload for Item 73eW8:73eWk
Retrying after 30 seconds...
chris@droplet1:~/puush-grab$ ls data -hal
total 20K
drwxrwxr-x 5 chris chris 4.0K Feb 20 11:17 .
drwxr-xr-x 4 chris chris 4.0K Jan  9 08:34 ..
drwxrwxr-x 2 chris chris 4.0K Jan 10 14:04 1389362657d03e827ff0d2a8c7-6362
drwxrwxr-x 2 chris chris 4.0K Jan 25 20:16 1390680986e70321fa713b3b1d-57050
drwxrwxr-x 2 chris chris 4.0K Feb 20 11:17 1392895047b731b43fde416e73-98036
chfoo commented 10 years ago
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
Failed LimitConcurrent(<shared:rsync_threads:1> x Conditional(Upload2)) for Item 73MTB:73MTN
Traceback (most recent call last):
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/tracker.py", line 55, in handle_response
    self.process_body(response.body, item)
  File "<string>", line 542, in process_body
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 83, in enqueue
    self.process(item)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 102, in process
    p.stdin.write(self.stdin_data(item))
IOError: [Errno 32] Broken pipe

Waiting 10 seconds...
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
Failed LimitConcurrent(<shared:rsync_threads:1> x Conditional(Upload2)) for Item 73MTB:73MTN
Traceback (most recent call last):
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/tracker.py", line 55, in handle_response
    self.process_body(response.body, item)
  File "<string>", line 542, in process_body
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 83, in enqueue
    self.process(item)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 102, in process
    p.stdin.write(self.stdin_data(item))
IOError: [Errno 32] Broken pipe

Waiting 10 seconds...
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds...
Starting GetItemFromTracker for Item
Received item '73Yef:73Yer' from tracker
Starting ExtraItemParams for Item 73Yef:73Yer
Sub items: 73Yeq, 73Yep, 73Yer, 73Yei, 73Yeh, 73Yek, 73Yej, 73Yem, 73Yel, 73Yeo, 73Yen, 73Yeg, 73Yef
Finished ExtraItemParams for Item 73Yef:73Yer
Starting PrepareDirectories for Item 73Yef:73Yer
Finished PrepareDirectories for Item 73Yef:73Yer
Starting WgetDownloadMany for Item 73Yef:73Yer
Start downloading URL http://puu.sh/73Yeq
  Server returned status 200 
Start downloading URL http://puu.sh/73Yep
  Server returned status 200 
Start downloading URL http://puu.sh/73Yer
  Server returned status 200 
Start downloading URL http://puu.sh/73Yei
  Server returned status 200 
Start downloading URL http://puu.sh/73Yeh
  Server returned status 200 
Start downloading URL http://puu.sh/73Yek
  Server returned status 200 
Start downloading URL http://puu.sh/73Yej
  Server returned status 200 
Start downloading URL http://puu.sh/73Yem
  Server returned status 200 
Start downloading URL http://puu.sh/73Yel
  Server returned status 200 
Start downloading URL http://puu.sh/73Yeo
  Server returned status 200 
Start downloading URL http://puu.sh/73Yen
  Server returned status 200 
Start downloading URL http://puu.sh/73Yeg
  Server returned status 200 
Start downloading URL http://puu.sh/73Yef
  Server returned status 200 
Finished WgetDownloadMany for Item 73Yef:73Yer
Starting MoveFiles for Item 73Yef:73Yer
Finished MoveFiles for Item 73Yef:73Yer
Starting PrepareStatsForTracker2 for Item 73Yef:73Yer
Finished PrepareStatsForTracker2 for Item 73Yef:73Yer
Starting CleanUpItemDir for Item 73Yef:73Yer
Finished CleanUpItemDir for Item 73Yef:73Yer
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds... 
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds... 
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds... 
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
Failed LimitConcurrent(<shared:rsync_threads:1> x Conditional(Upload2)) for Item 73MTB:73MTN
Traceback (most recent call last):
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/tracker.py", line 55, in handle_response
    self.process_body(response.body, item)
  File "<string>", line 542, in process_body
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 83, in enqueue
    self.process(item)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 102, in process
    p.stdin.write(self.stdin_data(item))
IOError: [Errno 32] Broken pipe

Warning: Ignoring extra fail event.
  File "/home/chris/.local/bin/run-pipeline", line 135, in <module>
    main()
  File "/home/chris/.local/bin/run-pipeline", line 131, in main
    runner.start()
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/runner.py", line 157, in start
    ioloop.IOLoop.instance().start()
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/ioloop.py", line 607, in start
    self._run_callback(callback)
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/ioloop.py", line 458, in _run_callback
    callback()
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 309, in wrapped
    exc = _handle_exception(top, exc)
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 343, in _handle_exception
    if tail.exit(*exc):
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 186, in exit
    return self.exception_handler(type, value, traceback)
  File "/usr/lib/python2.7/contextlib.py", line 84, in helper
    return GeneratorContextManager(func(*args, **kwds))
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 51, in handle_item_exception
    task.fail_item(item)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/task.py", line 28, in fail_item
    self.on_fail_item(self, item)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/event.py", line 30, in fire
    handler(*args, **kargs)  
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 65, in _task_fail_item
    self._fail_item(item)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 99, in _fail_item
    ''.join(traceback.format_stack()))
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds... 
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
Failed LimitConcurrent(<shared:rsync_threads:1> x Conditional(Upload2)) for Item 73MTB:73MTN
Traceback (most recent call last):
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/tracker.py", line 55, in handle_response
    self.process_body(response.body, item)
  File "<string>", line 542, in process_body
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py
", line 83, in enqueue
    self.process(item)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/externalprocess.py", line 102, in process
    p.stdin.write(self.stdin_data(item))
IOError: [Errno 32] Broken pipe

Warning: Ignoring extra fail event.
  File "/home/chris/.local/bin/run-pipeline", line 135, in <module>
    main()
  File "/home/chris/.local/bin/run-pipeline", line 131, in main
    runner.start()
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/runner.py", line 157, in start
    ioloop.IOLoop.instance().start()
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/ioloop.py", line 607, in start
    self._run_callback(callback)
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/ioloop.py", line 458, in _run_callback
    callback()
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 309, in wrapped
    exc = _handle_exception(top, exc)
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 343, in _handle_exception
    if tail.exit(*exc):
  File "/home/chris/.local/lib/python2.7/site-packages/tornado/stack_context.py", line 186, in exit
    return self.exception_handler(type, value, traceback)
  File "/usr/lib/python2.7/contextlib.py", line 84, in helper
    return GeneratorContextManager(func(*args, **kwds))
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 51, in handle_item_exception
    task.fail_item(item)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/task.py", line 28, in fail_item
    self.on_fail_item(self, item)
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/event.py", line 30, in fire
    handler(*args, **kargs)  
  File "/home/chris/.local/lib/python2.7/site-packages/seesaw/pipeline.py", line 65, in _task_fail_item
    ''.join(traceback.format_stack()))
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds... 
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds... 
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds... 
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds... 
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]
Process RsyncUpload returned exit code 10 for Item 73MTB:73MTN
Failed RsyncUpload for Item 73MTB:73MTN
Retrying after 30 seconds... 
Uploading with Rsync to rsync://192.241.206.171/archiveteam-puush/chfoo/
Starting RsyncUpload for Item 73MTB:73MTN
rsync: failed to connect to 192.241.206.171 (192.241.206.171): Connection refused (111)
chfoo commented 10 years ago

LimitConcurrent and UploadWithTracker both have inner_task and it's not clear to me how nested task exceptions should be handled.

chfoo commented 10 years ago

It's still happening...... : https://i.imgur.com/unHlTlC.png

The user has set up their VM using Debian on Sept. 2, 2014.

chfoo commented 10 years ago

A user was running 50 concurrent on twitpic-grab but all of them were stuck on MoveFiles().

674=200 https://d3j5vwomefv46c.cloudfront.net/photos/mini/85989779.jpg?1271116930.
Finished WgetDownload for Item image:1f724
Starting PrepareStatsForTracker for Item image:1f724
Finished PrepareStatsForTracker for Item image:1f724
Starting MoveFiles for Item image:1f724
Finished MoveFiles for Item image:1f724

It is suspected a deadlock caused by a race condition in LimitConcurrent() with its inner task.

tylerknott commented 8 years ago

I have observed this on the bayimg-grab pipeline script (running on Debian Jessie with a 32-bit kernel (version 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) i686 GNU/Linux), Debian GLIBC 2.19-18+deb8u4, and Python 2.7.9). This occured when the custom "rsync" step failed. In the web UI the original task disappeared and other tasks continued to process until they reached the "Upload" step of the pipeline script, where they stalled. Additionally, the web UI reported that there was 1 task on the "Upload" step (when none were) and there were -1 tasks on the "rsync" step (note that this pipeline was running with concurrent=2).

bayimg-fail

In the console the following output was observed for the job:

Finished WgetDownload for Item 16pictures:nagieaaf
Starting PrepareStatsForTracker for Item 16pictures:nagieaaf
Finished PrepareStatsForTracker for Item 16pictures:nagieaaf
Starting MoveFiles for Item 16pictures:nagieaaf
Finished MoveFiles for Item 16pictures:nagieaaf
Starting rsync for Item 16pictures:nagieaaf
sending incremental file list
bayimg-16pictures_nagieaaf-20160428-161119_data.txt
             10 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=0/1)

Failed rsync for Item 16pictures:nagieaaf
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 314, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/seesaw/externalprocess.py", line 155, in _handle_subprocess_stdout
    self.on_output(data)
  File "/usr/local/lib/python2.7/dist-packages/seesaw/event.py", line 31, in fire
    handler(*args, **kargs)
  File "/usr/local/lib/python2.7/dist-packages/seesaw/externalprocess.py", line 219, in on_subprocess_stdout
    item.log_output(data, full_line=False)
  File "/usr/local/lib/python2.7/dist-packages/seesaw/item.py", line 81, in log_output
    self.on_output(self, data)
  File "/usr/local/lib/python2.7/dist-packages/seesaw/event.py", line 31, in fire
    handler(*args, **kargs)
  File "/usr/local/lib/python2.7/dist-packages/seesaw/runner.py", line 176, in _handle_item_output
    sys.stdout.write(data)
IOError: [Errno 4] Interrupted system call
Waiting 10 seconds...
Finished rsync for Item 16pictures:nagieaaf
Starting Upload for Item 16pictures:nagieaaf
Uploading with Rsync to rsync://fos.textfiles.com/chfoo/warrior/bayimg/MrRadar/
Starting RsyncUpload for Item 16pictures:nagieaaf
sending incremental file list
rsync: change_dir "/home/tyler/bayimg-grab/data/1461877879fddd48305f4eeafa-54/" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at flist.c(2118) [sender=3.1.1]
Process RsyncUpload returned exit code 3 for Item 16pictures:nagieaaf
Failed RsyncUpload for Item 16pictures:nagieaaf
Retrying after 60 seconds...

At that point it appears the job was aborted and a new job started.