Open lydiam opened 8 years ago
After stopping and restarting DAITSS, ruby is still spawning all sorts of processes:
[lydiam@fclnx30 submit-direct]$ ps -ef | grep daitss
postgres 5364 9871 96 04:22 ? 10:23:00 postgres: daitss2 daitss_db 127.0.0.1(60080) SELECT
root 6159 24970 0 14:29 pts/3 00:00:00 sudo su - daitss
root 6163 6159 0 14:30 pts/3 00:00:00 su - daitss
daitss 6164 6163 0 14:30 pts/3 00:00:00 -bash
daitss 14663 1 0 15:00 ? 00:00:01 thin server (0.0.0.0:7002) [describe]
daitss 14675 1 0 15:00 ? 00:00:01 thin server (0.0.0.0:7003) [describe]
daitss 14681 1 4 15:00 ? 00:00:15 ruby /opt/web-services/sites/core/current/bin/pulse
daitss 14686 1 0 15:00 ? 00:00:03 ruby ./bin/mailer-daemon
postgres 14752 9871 0 15:00 ? 00:00:00 postgres: daitss2 daitss_db 127.0.0.1(56442) idle
daitss 14753 14681 7 15:00 ? 00:00:27 E87C3IH70_T30OFO.ingest
postgres 14762 9871 2 15:01 ? 00:00:08 postgres: daitss2 daitss_db 127.0.0.1(56444) idle
daitss 14770 14640 0 15:01 ? 00:00:00 Passenger ApplicationSpawner: /opt/web-services/sites/viruscheck/current
daitss 14779 1 18 15:01 ? 00:01:01 Rack: /opt/web-services/sites/viruscheck/current
daitss 14786 1 0 15:01 ? 00:00:00 Rack: /opt/web-services/sites/viruscheck/current
daitss 14792 1 0 15:01 ? 00:00:00 Rack: /opt/web-services/sites/viruscheck/current
daitss 14798 1 0 15:01 ? 00:00:00 Rack: /opt/web-services/sites/viruscheck/current
daitss 14895 14640 1 15:01 ? 00:00:04 Passenger ApplicationSpawner: /opt/web-services/sites/core/current
daitss 14908 1 2 15:01 ? 00:00:07 Rack: /opt/web-services/sites/core/current
daitss 14916 1 0 15:01 ? 00:00:00 Rack: /opt/web-services/sites/core/current
daitss 14925 1 0 15:01 ? 00:00:00 Rack: /opt/web-services/sites/core/current
daitss 14931 1 0 15:01 ? 00:00:00 Rack: /opt/web-services/sites/core/current
daitss 15871 14640 0 15:02 ? 00:00:01 Passenger ApplicationSpawner: /opt/web-services/sites/actionplan/current
daitss 16243 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/actionplan/current
daitss 16250 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/actionplan/current
daitss 16258 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/actionplan/current
daitss 16265 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/actionplan/current
daitss 16273 1 1 15:02 ? 00:00:02 Rack: /opt/web-services/sites/storage-master/current
daitss 16441 1 0 15:02 ? 00:00:02 Rack: /opt/web-services/sites/storage-master/current
daitss 16460 14640 0 15:02 ? 00:00:00 Passenger ApplicationSpawner: /opt/web-services/sites/transform/current
daitss 16466 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/transform/current
daitss 16473 1 0 15:02 ? 00:00:02 Rack: /opt/web-services/sites/storage-master/current
daitss 16485 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/transform/current
daitss 16491 1 0 15:02 ? 00:00:02 Rack: /opt/web-services/sites/silos/current
daitss 16501 1 0 15:02 ? 00:00:02 Rack: /opt/web-services/sites/silos/current
daitss 16520 1 0 15:02 ? 00:00:02 Rack: /opt/web-services/sites/storage-master/current
daitss 16528 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/transform/current
daitss 16531 14640 0 15:02 ? 00:00:00 Passenger ApplicationSpawner: /opt/web-services/sites/xmlresolution/current
daitss 16541 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/xmlresolution/current
daitss 16549 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/xmlresolution/current
daitss 16558 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/transform/current
daitss 16564 1 0 15:02 ? 00:00:02 Rack: /opt/web-services/sites/silos/current
daitss 16592 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/xmlresolution/current
daitss 16598 1 1 15:02 ? 00:00:02 Rack: /opt/web-services/sites/silos/current
daitss 16629 1 0 15:02 ? 00:00:00 Rack: /opt/web-services/sites/xmlresolution/current
lydiam 18520 21654 0 15:06 pts/0 00:00:00 grep daitss
daitss 32406 10421 0 04:04 ? 00:00:00 crond
daitss 32407 32406 0 04:04 ? 00:00:00 /bin/bash /usr/bin/run-parts /opt/fda/etc/daitss.daily
daitss 32411 32406 0 04:04 ? 00:00:00 /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
daitss 32412 32411 0 04:04 ? 00:00:00 /usr/sbin/postdrop -r
daitss 32416 32407 0 04:04 ? 00:00:00 /bin/sh /opt/fda/etc/daitss.daily/run-collect-fixities
daitss 32417 32407 0 04:04 ? 00:00:00 awk -v progname=/opt/fda/etc/daitss.daily/run-collect-fixities progname {????? print progname ":\n"????? progname="";???? }???? { print; }
daitss 32418 32416 4 04:04 ? 00:31:51 collect-fixities
[lydiam@fclnx30 submit-direct]$
Ingest seems to be processing. Once all submitted packages are ingested I will attempt to submit one package at a time.
Now the AIP descriptor validation failed with the error we were getting on Friday:
descriptor fails daitss aip xml validation (1 errors)
unknown,: #
I reset the package and started manually. That worked on Friday.
Failed again with the same error. I will restart daitss again.
I restarted daitss and stopped pulse.
Restarted another package manually and it appears to be succeeding. Archived E30MZ6Y4F_63VOVM . Another package archived.
Other packages failed repeatedly at aip descriptor validation.
Gerald suggested that collect-fixities be killed. I did that and issued a stop-cron command: sudo -u daitss touch /opt/fda/etc/stop-cron.
I stopped and started daitss again, and the rack processes still show up.
Packages are failing repeatedly with multiple errors such as:
2015-12-14 15:55:14 -0500 message
descriptor fails daitss aip xml validation (1 errors) unknown,: #
trace
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/template/descriptor.rb:49:in validate_aip_descriptor' /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/ingest.rb:41:in
block in ingest'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/journal.rb:16:in step' /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/ingest.rb:41:in
ingest'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:82:in block in spawn' /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in
fork'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in spawn' /opt/web-services/sites/core/releases/20141117161651/app.rb:949:in
block in '
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1602:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1602:in
block in compile!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in []' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in
block (3 levels) in route!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:985:in route_eval' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in
block (2 levels) in route!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1006:in block in process_route' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in
catch'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in process_route' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:964:in
block in route!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in each' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in
route!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1076:in block in dispatch!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in
block in invoke'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in catch' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in
invoke'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1073:in dispatch!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in
block in call!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in block in invoke' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in
catch'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in invoke' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in
call!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:886:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:217:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:945:in
forward'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1019:in route_missing' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:980:in
route!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:976:in route!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1076:in
block in dispatch!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in block in invoke' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in
catch'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in invoke' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1073:in
dispatch!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in block in call!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in
block in invoke'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in catch' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in
invoke'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in call!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:886:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/head.rb:11:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:180:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:2014:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/session/abstract/id.rb:225:in context' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/session/abstract/id.rb:220:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/head.rb:11:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:180:in
call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:2014:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:in
block in call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1788:in synchronize' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:in
call'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/request_handler.rb:97:in process_request' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_request_handler.rb:520:in
accept_and_process_next_request'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_request_handler.rb:274:in main_loop' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/application_spawner.rb:206:in
start_request_handler'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/application_spawner.rb:171:in block in handle_spawn_application' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/utils.rb:470:in
safe_fork'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/application_spawner.rb:166:in handle_spawn_application' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:357:in
server_main_loop'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:206:in start_synchronously' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:180:in
start'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/application_spawner.rb:129:in start' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:253:in
block (2 levels) in spawn_rack_application'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server_collection.rb:132:in lookup_or_add' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:246:in
block in spawn_rack_application'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server_collection.rb:82:in block in synchronize' :10:in
synchronize'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server_collection.rb:79:in synchronize' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:244:in
spawn_rack_application'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:137:in spawn_application' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:275:in
handle_spawn_application'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:357:in server_main_loop' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:206:in
start_synchronously'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/helper-scripts/passenger-spawn-server:99:in `
'
Reset and restarted another package and it seems to be working past the aip descriptor validation point.
Since packages aren't ingesting I tried to submit SIPs one at a time and got the java runtime error and a reject message: [lydiam@fclnx30 FTPDL_20151212T090803]$ sudo -b -u daitss submit --username lydiam --password fclad00d2 --source FTPDL --package /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803/AA00025584_00046 --batch FTPDL_20151212T090803 [lydiam@fclnx30 FTPDL_20151212T090803]$ #
# pthread_getattr_np
# An error report file with more information is saved as:
# /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803/hs_err_pid6107.log
It has been a while but I seem to remember we setup daitss to spawn multiple occurrences of specific web services to help with load. Such as more than one description service.
Some of the other processes listed above are spawned by daitss such as a mailer daemon and connection to the database. Also, anytime a user logs into daitss Apache will spawn a core process for that user. Not saying that there isn't something wrong with the number of daitss processes but its too hard for me to tell.
I would focus on the JVM error. One of two things must have occured for that error to generate. The system either ran out of Swap memory having gone through the other forms of memory first. Or the JVM settings need to be tweaked in the daitss config file.
Daitss relies on the JRE(Java Runtime Environment) for ruby gem RJB https://rubygems.org/gems/rjb/versions/1.5.4 to do xml_validation.
We use it here: https://github.com/daitss/core/blob/master/lib/daitss/proc/xmlvalidation.rb#L54
In the daitss config you will need to look at this line: https://github.com/daitss/core/blob/master/daitss-config.example.yml#L78
Xms2G - the starting heap size is 2GB Xmx2G - the max heap size is 2GB
Basically, every JVM spawned will take up 2GB of memory regardless of what is needed. You may want to decrease the starting heap size to Xms256m and increase the max heap size to Xmx4G. This could help with memory allocation. It will allow a single JVM process to grow larger than 2GB if needed but keep other JVM processes smaller. Keep in mind this is specific to the use of the JRE in daitss and not other processes on the daitss server using a JVM.
Something to think about - if there are other memory errors not relating to the JRE that can indicate a host of other potential problems. You will need a baseline. Stop daitss and look at the memory usage. Does it look normal?
Fire things up and see what memory looks like. It may be a good idea to disable GUI submission or better yet disable the UI completely. Submit packages from the command line and see what's going on with memory.
Current settings in /opt/web-services/conf.d/daitss-config.ymljjj
jvm_options:
describe.fda.fcla.edu: log_syslog_facility: LOG_LOCAL0 max_pdf_bitstreams: 100 pdfa_validator: /opt/pdfapilot/pdfaPilot -a $INPUT_FILE$ --noprogress --nohits --level=$LEVEL$ --report=XML,ALWAYS,PATH=$REPORT_FILE$ jvm_options:
(no Xms setting at all)
NWRDC has just rebooted fclxn30, and daitss processes and pulse started up. Pulse started ingest on
daitss 10884 10398 33 09:04 ? 00:05:15 E38C3ZIKV_YDQ7FX.ingest
Thats a 7GB+ package, so it will take some time to process. I've stopped pulse. I will not attempt submission on any new packages right now. There are no new packages ftp-ed to eclipsep yesterday.
Actual memory use is 6% and 0% swap, with physical 44.9%
I'm guessing that it's just trying to store the package, because all other steps were completed yesterday:
make aip descriptor 2015-12-14T14:03:49-05:00 112.63
validate aip descriptor 2015-12-14T14:05:41-05:00 007.29
make tarball 2015-12-14T14:05:49-05:00 070.93
I'm seeing some Passenger processes that are unfamiliar to me:
root 10322 10307 0 09:04 ? 00:00:00 PassengerWatchdog
root 10325 10322 1 09:04 ? 00:00:20 PassengerHelperAgent
root 10327 10325 0 09:04 ? 00:00:00 Passenger spawn server
This package errored out when DAITSS was shut down while it was storing.
Xms setting is optional. Leaving it off will start the JVM with a heap size of 0. This is okay as long as there is memory to be allocated when the JVM needs it. It will only allocate up to the Xmx setting.
I would run some of the packages that failed before and monitor the memory. If they succeed then its not really a problem with the JVM. It propbably means the JVM cannot allocate memory because there is none to be had. I'd be willing to bet fclxn30 needs more memory. If that is not an option I would look at increasing the swap size to compensate. It still may be a good idea to disable the GUI for a time.
There doesn't appear to be any current option to stop only the daitss GUI, so we'd require a developer to provide that option.
There is an option for the operator to disable GUI submission from the UI. Affiliates may still try to download/ disseminate packages.
A sysadmin should be able to setup a type of redirect for anyone outside of your organization. That would keep people from being able to connect to the GUI.
All packages currently in the workspace were manually restarted and successfully processed.
Retried submission of a single package that had failed yesterday.
Failed again, with the same symptoms:
[lydiam@fclnx30 UF00028293_00564]$ sudo -b -u daitss submit --username lydiam --password fclad00d2 --source FTPDL --package /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803/AA00025584_00046
[sudo] password for lydiam:
[lydiam@fclnx30 UF00028293_00564]$ #
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 16 bytes for AllocateHeap
# An error report file with more information is saved as:
# /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803/UF00028293_00564/hs_err_pid6121.log
Xymon reports memory usage at 6% actual.
Attempted to submit one of the FDA_benchmarks packages with the same results:
[lydiam@fclnx30 ETDs]$ sudo -b -u daitss submit --username lydiam --password fclad00d2 --source benchmarks --package /var/daitss/ops/FDA_benchmarks/ETDs/FDAE001612
[lydiam@fclnx30 ETDs]$ #
# pthread_getattr_np
# An error report file with more information is saved as:
# /var/daitss/ops/FDA_benchmarks/ETDs/hs_err_pid7425.log
GUI submission for the above benchmark package succeeded: E4P2B1PWE_5W0LH1
Downloading SIPs and submitting them via the GUI, where possible, and starting them manually.
Package AA00025584_00046, rejected 3 times via command-line submit, was successfully manually submitted via the GUI and successfully archived after manually starting the package.
There appears to be some problem with the command-line submit and pulse processes. What's the difference between a package started manually and one started by pulse?
Bill K. experimented with submit, submitting as Stephen's user and starting pulse as Stephen's user and also removed the memory limit from my user. I resubmitted the batch of records and they appear to have submitted successfully and are now being successfully processed.
So my biggest problem appears to have been the memory limit on my user. Bill thinks that there are multiple problems, including a possibly memory leak. More details to follow.
Bill's assessment of the problem:
We actually have 3 different issues. One is the memory problems upon submission. I think we have resolved this memory issue. There was a parameter in Lydia’s .bash_profile that was limiting her memory usage. This seems to have resolved this issue, but want to wait a little while longer before declaring victory.
The second issue, the EOF error, has been an intermittent problem for a while now according to Lydia. There are no filesystem, CPU or memory constraints on the server during the time the error occurs. It does not seem to depend on who submitted the package to the app, who started the app, or how long the server has been up.
The third is another memory issue, in which all of the server memory is consumed by the daitss processes. This looks to be a memory leak in the application.
I do not think the second and third problems are hardware or OS related. I have exhausted my limited Ruby skills looking at the problem.
I’ve mostly cleaned up the mess created from multiple submissions/rejects of UF records and submitted a batch of FIU records that were FTP-ed on Friday and am getting the same submit memory error again. Only 2 packages submitted successfully, and 18 got the same error:
sip path: /var/daitss/ops/incoming/nonUF_UFAD/FIU/FI15042500;
Invalid SIP descriptor. XML validation errors:
line:unknown: msg:#
Actual memory is at 21% and physical at 79%
[lydiam@fclnx30 ops]$ sudo -b -u daitss submit --username lydiam --password fclad00d2 --source FTPDL --path /var/daitss/ops/incoming/nonUF_UFAD/FIU --batch FTPDL_20151212T090803_FIU [sudo] password for lydiam: [lydiam@fclnx30 ops]$ #
# Native memory allocation (malloc) failed to allocate 16 bytes for AllocateHeap
# An error report file with more information is saved as:
# /var/daitss/ops/hs_err_pid25083.log
#
# There is insufficient memory for the Java Runtime Environment to continue.
# pthread_getattr_np
# An error report file with more information is saved as:
# /var/daitss/ops/hs_err_pid25106.log
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 41 bytes for AllocateHeap
# An error report file with more information is saved as:
# /var/daitss/ops/hs_err_pid25127.log
Assuming the same packages fail again and again with the same Java error and it doesn't appear the physical memory capped out maybe its time to try increasing the JVM memory setting.
Instead of -Xmx4096m try -Xmx8G. That will allow the JVM to grab more memory if needed. If it has no impact then switch it back.
Also, is there something about these packages that are in common? Stephen seemed to indicate one of the affiliates is sending large/ unusual packages. Are there a lot of .mov files in these packages?
Retried the submit command with a new session. I likely used an old session to do the submission, one that didn't register the memory changes that Bill K. made for my user.
It may be a worthwhile attempt to find out what user Apache runs as and determine if it has any of those same memory restrictions. The user will typically be "apache" or "nobody" and is usually configured in an httpd.conf file. It could even be configured to run as the "daitss" user.
I remember the EOF error is what the xml parser returns whenever the parser has a problem parsing an XML file. We used get this error with those invalid characters in the descriptors. The work done in submit has fixed most of the EOF, however, there could be some additional scenarios which can cause XML parsing to fail.
I submitted the following command on fclnx30:
The result was that 6-8 of about 144 SIPs submitted successfully and the remainder rejected with the following error:
The submit log contained the "standard" log messages, with the majority of packages being rejected and some being submitted.:
At the same time I go memory errors in STDOUT and logs named hs_err_pid*.log, with the following contents: [lydiam@fclnx30 ftpdl]$ more hs_err_pid15104.log #
There is insufficient memory for the Java Runtime Environment to continue.
--------------- T H R E A D ---------------
Current thread (0x000000001a667000): JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=15130, stack(0x00002b75b8ba2000,0x00002b75b8ca3000)]
Stack: [0x00002b75b8ba2000,0x00002b75b8ca3000], sp=0x00002b75b8c9d440, free space=1005k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x9e3ece] V [libjvm.so+0x9e45fb] V [libjvm.so+0x4dcf85] V [libjvm.so+0x2fdea0] ....
These logs can be found in: /var/daitss/ops/incoming/ftpdl
There are also lots of defunct processes:
daitss 27706 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27727 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27745 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27766 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27787 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27809 27690 0 12:53 pts/0 00:00:00 [ruby]
And other daitss processes from Dec. 13:
daitss 23705 1 0 Dec13 ? 00:03:44 Rack: /opt/web-services/sites/core/current
daitss 23713 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/core/current
daitss 23719 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/core/current
daitss 23725 1 0 Dec13 ? 00:00:21 Rack: /opt/web-services/sites/core/current
daitss 23736 1 0 Dec13 ? 00:00:49 Rack: /opt/web-services/sites/actionplan/current
daitss 23743 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/actionplan/current
daitss 23749 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/actionplan/current
daitss 23755 1 0 Dec13 ? 00:00:17 Rack: /opt/web-services/sites/actionplan/current
daitss 23763 1 0 Dec13 ? 00:00:02 Rack: /opt/web-services/sites/storage-master/current
daitss 23773 1 0 Dec13 ? 00:00:02 Rack: /opt/web-services/sites/storage-master/current
daitss 23787 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/transform/current
I will kill the defunct processes and attempt to restart DAITSS.
Gerald suggested attempting submission of one package at a time.