galaxyproject / pulsar

Distributed job execution application built for Galaxy
https://pulsar.readthedocs.io
Apache License 2.0

memory_statement.log doesn't respect metadata directory #277

Closed jmchilton closed 3 years ago

jmchilton commented 3 years ago

Job metrics files are transferred from the job's metadata directory, but memory_statement.log, which is set up by the job script code and templates, is not written to that directory. It ends up in the working directory instead:

galaxy.jobs.runners.pulsar DEBUG 2021-07-20 23:38:50,731 [pN:main,p:3418,tN:PulsarJobRunner.monitor_thread] (413) Received status update: <class 'str'> postprocessing
pulsar.client.staging.down INFO 2021-07-20 23:38:50,740 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output None with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/objects/5/e/3/dataset_5e3c3904-3652-46e6-a251-df38c06cb387.dat,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,741 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file memory_statement.log
pulsar.client.staging.down INFO 2021-07-20 23:38:50,741 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output memory_statement.log with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/memory_statement.log,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,741 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 10.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,741 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 10.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/10.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,741 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 2.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,741 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 2.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/2.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,741 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 3.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,741 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 3.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/3.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,741 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 7.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,742 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 7.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/7.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,742 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 5.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 5.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/5.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 8.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 8.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/8.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 1.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 1.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/1.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 4.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 4.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/4.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 9.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 9.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/9.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,743 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_workdir file 6.txt
pulsar.client.staging.down INFO 2021-07-20 23:38:50,744 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output 6.txt with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/working/6.txt,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,744 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_metadata file __instrument_core_galaxy_slots
pulsar.client.staging.down INFO 2021-07-20 23:38:50,744 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output __instrument_core_galaxy_slots with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/metadata/__instrument_core_galaxy_slots,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,744 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_metadata file __instrument_core_epoch_start
pulsar.client.staging.down INFO 2021-07-20 23:38:50,745 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output __instrument_core_epoch_start with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/metadata/__instrument_core_epoch_start,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,745 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_metadata file __instrument_core_epoch_end
pulsar.client.staging.down INFO 2021-07-20 23:38:50,745 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output __instrument_core_epoch_end with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/metadata/__instrument_core_epoch_end,action_type=copy]
pulsar.client.staging.down DEBUG 2021-07-20 23:38:50,745 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting dynamic output_metadata file __instrument_core_galaxy_memory_mb
pulsar.client.staging.down INFO 2021-07-20 23:38:50,745 [pN:main,p:3418,tN:[manager=_default_]-[action=postprocess]-[job=413]] collecting output __instrument_core_galaxy_memory_mb with action FileAction[path=/tmp/tmpzkwlfog8/tmpr6e3fl_8/tmp01rpql55/database/job_working_directory_8dtz__td/000/413/metadata/__instrument_core_galaxy_memory_mb,action_type=copy]

This can cause dynamic dataset destination code that expects a clean working directory to fail.

My guess is that the job script code is called with wd set to the job directory in the Galaxy case and to the tool's clean working directory in the Pulsar case. Since the job metrics code is already instrumented to handle either, I think the memory statement code ought to be as well.
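
To make the suggestion concrete, here is a rough sketch of how the memory statement path could be resolved so that it works for either value of wd. This is not the actual Galaxy or Pulsar code. The helper names (resolve_metadata_directory, memory_statement_path) are hypothetical, and the sibling working/ and metadata/ layout is an assumption taken from the paths in the log above.

```python
import os

MEMORY_STATEMENT_NAME = "memory_statement.log"


def resolve_metadata_directory(wd: str) -> str:
    """Guess the job's metadata directory from the job script's wd.

    Hypothetical helper: assumes a job directory that contains sibling
    working/ and metadata/ subdirectories, as in the log paths above.
    """
    wd = os.path.abspath(wd)

    # Case 1 (assumed Galaxy case): wd is the job directory itself.
    candidate = os.path.join(wd, "metadata")
    if os.path.isdir(candidate):
        return candidate

    # Case 2 (assumed Pulsar case): wd is the tool's working directory,
    # so metadata/ sits next to it under the same job directory.
    sibling = os.path.join(os.path.dirname(wd), "metadata")
    if os.path.isdir(sibling):
        return sibling

    # Fallback: no metadata directory found, keep writing into wd.
    return wd


def memory_statement_path(wd: str) -> str:
    """Return the path where memory_statement.log would be written."""
    return os.path.join(resolve_metadata_directory(wd), MEMORY_STATEMENT_NAME)
```

With something along these lines, memory_statement.log would land next to the __instrument_core_* files in both cases, and the tool's working directory would stay clean. The real fix would more likely live in the job script template, which may already know the metadata directory and not need to guess it.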