daitss / core

DAITSS: Dark Archive In The Sunshine State
GNU General Public License v3.0
9 stars 2 forks source link

DZA SNAFU: undefined method `to_str' for nil:NilClass #180

Closed lydiam closed 13 years ago

lydiam commented 13 years ago

On of my DZA benchmark SIPs SNAFU-ed on tarchive:

SIP name: FDA6151502 IEID: EQM639H14_EAM248

 snafu
 time   Wed Jan 19 16:09:33 -0500 2011
 message    

 undefined method `to_str' for nil:NilClass

 trace  

 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:1388:in `fu_list'
 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:1388:in `map'
 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:1388:in `fu_list'
 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:605:in `rm_r'
 /opt/daitss2/service/core/lib/daitss/proc/datafile.rb:61:in `nuke!'
 /opt/daitss2/service/core/lib/daitss/proc/datafile/transform.rb:85:in `transform'
 /opt/daitss2/service/core/lib/daitss/proc/datafile/transform.rb:20:in `normalize!'
 /opt/daitss2/service/core/lib/daitss/proc/wip/preserve.rb:20:in `preserve'
 /opt/daitss2/service/core/lib/daitss/proc/wip/journal.rb:14:in `step'
 /opt/daitss2/service/core/lib/daitss/proc/wip/preserve.rb:17:in `preserve'
 /opt/daitss2/service/core/lib/daitss/proc/wip/preserve.rb:15:in `each'
 /opt/daitss2/service/core/lib/daitss/proc/wip/preserve.rb:15:in `preserve'

It's my impression that tarchive was straining under the load of archiving the 11 DZA benchmark SIPs, with a load average of ~9.6. Is this resource-related or a true SNAFU? I can re-try this SIP individually, if necessary.

grf commented 13 years ago

tarchive/darchive can easily sustain load averages like that - many cores. It's not doing much for our electricity bill, though.

lydiam commented 13 years ago

In that case it would appear to be a "true" SNAFU and not related to tarchive resources.

flazz commented 13 years ago

this looks like a deterministic bug, i'm on it.

flazz commented 13 years ago

23e1430c4d9e4f602b7e0e29e18598e62493927f will better show the cause of this issue.

please retry this as it is too big for me to try on my machine.

lydiam commented 13 years ago

I re-tried and got the following error:

 snafu
 time   Mon Jan 31 17:19:29 -0500 2011
 message    

 datafile 1-norm-0 already exists in files/normalized

 trace  

 /opt/daitss2/service/core/lib/daitss/proc/wip.rb:198:in `new_datafile'
 /opt/daitss2/service/core/lib/daitss/proc/wip.rb:148:in `new_normalized_datafile'
 /opt/daitss2/service/core/lib/daitss/proc/datafile/transform.rb:60:in `transform'
 /opt/daitss2/service/core/lib/daitss/proc/datafile/transform.rb:20:in `normalize!'
 /opt/daitss2/service/core/lib/daitss/proc/wip/preserve.rb:20:in `preserve'

It looks like this one might require some manual intervention in order to give it a good retry. I can re-submit, but we probably want to determine how we'd deal with problems like this as well.

lydiam commented 13 years ago

I submitted this SIP again on tarchive as IEID EDR5O7896_CDD6C8, and it's now at the "make tarball" step, so I'm assuming that it will archive. In a real-life situation we wouldn't want to request a new copy of a SIP in a situation like this, so I'd recommend trying to fix the prior version manually so that ingest could continue.

flazz commented 13 years ago

on the retry did you just restart the wip? and the second was a total resubmit? if so i'm guessing the problem was that it couldn't property talk to one of servers and tried to back out. the original error message was due to a bug in the backing out procedure. the patch i submitted, 23e1430c4d9e4f602b7e0e29e18598e62493927f, fixes that.

unless this error can be reproduced, i recommend we close this issue.

lydiam commented 13 years ago

The first time I retried and got the new error. The second time I did a new submission. How can we test the backing out procedure? It seems like the problem the first time was during normalization of the WAV file: can we submit the same package and kill description service during normalization, before the step is completed?

flazz commented 13 years ago

i believe that if you run this again with transformation service down it should reproduce the original situation but properly raise the error instead of complain about undefined method to_s and it can be verified by doing the same on the pre-patched codebase

childree commented 13 years ago

I've experienced this "undefined method `to_str' for nil:NilClass" SNAFU error on my entire set of test packages. When I try to UNSNAFU and START one package I get the following error:

Errno::ENOMEM at /workspace/E2FJ5573R_CYD116
Cannot allocate memory - fork(2)
* file: process.rb
* location: fork
* line: 55

These are the sizes of the packages in MB: 1584.15 1871.53 1338.92 255.91 1338.92 748.61 1290.09 1288.14

Current memory stats on ripple:

             total       used       free     shared    buffers     cached
Mem:          3775       3714         61          0        110        405
-/+ buffers/cache:       3198        577
Swap:         4091       3986        105
lydiam commented 13 years ago

Were you ingesting the above 8 packages concurrently on ripple? In my experience with ripple it can't seem to handle more than about 5-7 packages concurrently without SNAFU-ing.

childree commented 13 years ago

I was just sharing that this error that you encountered was caused by lack of memory when I encountered it.

flazz commented 13 years ago

undefined method even with respect to to_s can be very common. what would relate two instances is them similarity of the stack traces. also feel free to change the title of issues when appropriate.

so do we have any progress on this issue? did the package go through or a more informative massage come through?

lydiam commented 13 years ago

The package originally referenced in this issue, FDA6151502, did successfully archive when resubmitted and processed without concurrent WIPs being archived. I would agree with Jen that the "to_str" error starts happening when the server is running out of memory. I think that it can be fairly easily reproduced on either server by attempting to concurrently ingest multiple DZA project packages that contain WAVs.

I will open a new issue for the "datafile 1-norm-0 already exists in files/normalized" error, which has to do with backing cleanly out of failed transformation.

lydiam commented 13 years ago

This happened again when testing DZAs on Ripple with a throttle of 2. I'll set the throttle to 1 and retest. This may happen when we test the benchmarks on darchive.

lydiam commented 13 years ago

I'm making this a prod.0 issue because it may be a problem on darchive as well. I set Ripple's throttle to 1 and a SIP still snafu-ed with the following current error: snafu time Fri Apr 15 16:46:45 -0400 2011 message

 undefined method `to_str' for nil:NilClass

 trace  

 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:1388:in `fu_list'
 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:1388:in `map'
 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:1388:in `fu_list'
 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:605:in `rm_r'
 ./lib/daitss/proc/datafile.rb:61:in `nuke!'
 ./lib/daitss/proc/datafile/transform.rb:85:in `transform'
 ./lib/daitss/proc/datafile/transform.rb:20:in `normalize!'
 ./lib/daitss/proc/wip/preserve.rb:20:in `preserve'
 ./lib/daitss/proc/wip/journal.rb:14:in `step'
 ./lib/daitss/proc/wip/preserve.rb:17:in `preserve'
 ./lib/daitss/proc/wip/preserve.rb:15:in `each'
 ./lib/daitss/proc/wip/preserve.rb:15:in `preserve'
 ./lib/daitss/proc/wip/ingest.rb:29:in `ingest'
 ./lib/daitss/proc/wip/process.rb:71:in `send'
 ./lib/daitss/proc/wip/process.rb:71:in `spawn'
 ./lib/daitss/proc/wip/process.rb:55:in `fork'
 ./lib/daitss/proc/wip/process.rb:55:in `spawn'
 ./app.rb:574:in `POST /workspace/:id'
 /opt/web-services/sites/core/shared/bundle/ruby/1.8/gems/sinatra-1.1.2/lib/sinatra/base.rb:1057:in `call'
 /opt/web-services/sites/core/shared/bundle/ruby/1.8/gems/sinatra-1.1.2/lib/sinatra/base.rb:1057:in `compile!'
 /opt/web-services/sites/core/shared/bundle/ruby/1.8/gems/sinatra-1.1.2/lib/sinatra/base.rb:643:in `instance_eval'
 /opt/web-services/sites/core/shared/bundle/ruby/1.8/gems/sinatra-1.1.2/lib/sinatra/base.rb:643:in `route_eval'
avatar382 commented 13 years ago

is this still an issue?

childree commented 13 years ago

I haven't run in to this in production so I'm thinking it is a non-issue now.

lydiam commented 13 years ago

This may continue to be a ripple issue, however. I'm concerned that we can't use ripple for testing all types of packages because of resource limitations that cause errors such as this. I'm going to retry the DZAs on ripple to determine if we still get this error. It had been reproducible on ripple.

lydiam commented 13 years ago

The DZA packages still cause this error on ripple. Are there any recommendations for modifying the ripple server or its D2 configuration so that this error is eliminated? I'm assuming that we want D2 to scale to be used with smaller servers.

The current error:

  snafu
  time  Tue May 31 17:08:40 -0400 2011
  message   

  undefined method `to_str' for nil:NilClass

 trace  

 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:1388:in `fu_list'
 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:1388:in `map'
 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:1388:in `fu_list'
 /opt/ruby-1.8.7/lib/ruby/1.8/fileutils.rb:605:in `rm_r'
 /opt/web-services/sites/core/current/lib/daitss/proc/datafile.rb:61:in `nuke!'
 /opt/web-services/sites/core/current/lib/daitss/proc/datafile/transform.rb:85:in `transform'
 /opt/web-services/sites/core/current/lib/daitss/proc/datafile/transform.rb:20:in `normalize!'
 /opt/web-services/sites/core/current/lib/daitss/proc/wip/preserve.rb:20:in `preserve'
 /opt/web-services/sites/core/current/lib/daitss/proc/wip/journal.rb:14:in `step'
 /opt/web-services/sites/core/current/lib/daitss/proc/wip/preserve.rb:17:in `preserve'
 /opt/web-services/sites/core/current/lib/daitss/proc/wip/preserve.rb:15:in `each'
 /opt/web-services/sites/core/current/lib/daitss/proc/wip/preserve.rb:15:in `preserve'
lydiam commented 13 years ago

I should note that ripple's throttle was set to '1', and all of the DZA benchmark packages did SNAFU again. This is currently not a high priority, but I wonder if Ilana will experience this as well. I intend to send her copies of our benchmark packages, so we'll find out.

lydiam commented 13 years ago

Correction: it appears that one did archive successfully. EE8YI9TUI_JARXMQ (FDA6180701). The original SIPs can be found on ripple at /var/daitss/FDA_benchmarks/DZA. I wonder why this one archived this time.