Open pcm32 opened 6 years ago
I'm still seeing this, unfortunately...
@mvdbeek I remember you asked me to print the value of something. I thought you had written it here, but now I cannot find it. Sorry for only getting back to this now. Thanks.
I've found it:
@pcm32 any chance you could log url in https://github.com/galaxyproject/galaxy/blob/4eed32dc4c86da17a525419ba985d4d06d7a768b/lib/galaxy/jobs/runners/cli.py#L43 ? (when it fails)
There is a `log.debug` on that same method (`url_to_destination`) that is not reached in the location that you mention, as I never see the message `Converted URL ...`. However, this other part:
gets executed, as I see the log message `Converted job from a URL to a destination and recovered`. On that execution path, `self.dispatcher.url_to_destination` gets called (instead of maybe `self.runner`, which I guess would trigger the method that you are after).
Using the cli runner (LSF) with dynamic destinations, I notice that when the instance restarts, the destination parameters of jobs that were left running or queued get scrambled. This leads to the plugin parameter being lost and produces this error:
(line numbers might be a bit off, as I had some log statements here and there). This error brings down the entire check_watched_items loop of the runner, which in turn means that no new jobs are detected as running (they get submitted fine to the scheduler and they run, but for Galaxy they stay in the queued state).
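To illustrate why a single bad job can stall detection for all the others, here is a minimal sketch of a monitoring loop that isolates failures per job. It is not Galaxy's actual code: `check_watched_items` here is a simplified stand-in that only borrows the name, and `demo_check` and the job dictionaries are hypothetical. It assumes each watched job can be checked independently of the rest.

```python
import logging

log = logging.getLogger(__name__)


def check_watched_items(watched, check_one):
    """Check every watched job, isolating failures to the job that caused them.

    `watched` is a list of job states and `check_one` a callable returning
    True while the job should stay watched. Both are hypothetical stand-ins.
    """
    still_watched = []
    for job_state in watched:
        try:
            if check_one(job_state):
                still_watched.append(job_state)
        except Exception:
            # Log the traceback and keep watching this job, so one job with
            # scrambled parameters does not stop the others from being checked.
            log.exception("Could not check job %r; will retry next cycle", job_state)
            still_watched.append(job_state)
    return still_watched


def demo_check(job_state):
    """Hypothetical check that fails when the plugin parameter was lost."""
    plugin = job_state["params"]["plugin"]  # raises KeyError if missing
    log.debug("Checking job %s with plugin %s", job_state["id"], plugin)
    return job_state["running"]
```

With a loop shaped like this, a job whose parameters were lost would be logged and retried, while the remaining jobs would still have their state updated. Whether retrying or failing the broken job is the better policy is a separate design question.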
My current hypothesis is that on restart, the URL of the job is used to reconstruct it, and in that process the parameters of jobs with dynamic destinations get scrambled. After restart, you can see in the database that the `job.destination_params` field gets shortened to something like `\x7b7d` (the hex encoding of `{}`, i.e. an empty dictionary). Before restart, the same field in the DB, for the same job, is much longer. So I guess the parameters should somehow be rescued from the database before the URL-reconstructed destination is persisted to the database on restart.
The following error seems to be related to the url to destination issue: