galaxyproject / tools-iuc

Tool Shed repositories maintained by the Intergalactic Utilities Commission
https://galaxyproject.org/iuc
MIT License

gffread can create TB of stdout files #5028

Open bgruening opened 1 year ago

bgruening commented 1 year ago
galaxy@sn06:/data/jwd04/main/053/584/53584927$ ls -lh outputs/
total 15T
-rw-r--r-- 1 galaxy galaxy   7 Dec 26 19:30 COMMAND_VERSION
-rw-r--r-- 1 galaxy galaxy   0 Dec 26 19:30 tool_stderr
-rw-r--r-- 1 galaxy galaxy 22T Dec 29 14:46 tool_stdout

We should probably redirect stdout to /dev/null in this case. @natefoo, is Galaxy supposed to track those files and kill jobs that write TBs of stdout and stderr?

bernt-matthias commented 1 year ago

Any idea of the content?

Maybe report upstream if you think it's too verbose?

bgruening commented 1 year ago

It copies the entire input sequences into stdout over and over again.

bernt-matthias commented 1 year ago

OK. That seems useless. Let's report upstream and redirect with a comment mentioning the upstream issue?

natefoo commented 1 year ago

For the record, Galaxy does not track or act on stdout/stderr size, although it will limit what gets stored in the database, IIRC. Output size limiting has only ever worked on defined outputs, and only in Galaxy (not Pulsar). It'd be difficult from a performance standpoint to track anything else (du on the working dir, for example, could be incredibly expensive if it contains a large number of small files), and if we did do anything, I'd like to see it as something launched in the job script.
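
A minimal sketch of what a check launched from the job script could look like. This is an illustration only, not something Galaxy does today: the function name run_with_stdout_limit, the polling interval, and the byte limit are all assumptions. It polls a single file with wc -c (cheap for a regular file) rather than running du over the whole working directory.

```shell
# Hypothetical helper (not part of Galaxy): run a command with stdout written
# to a file, poll that file's size, and kill the command once it exceeds a limit.
# Usage: run_with_stdout_limit LIMIT_BYTES OUT_FILE COMMAND [ARGS...]
run_with_stdout_limit() {
    local limit_bytes="$1"
    local out_file="$2"
    shift 2

    # Start the command in the background so its output can be watched.
    "$@" > "$out_file" &
    local pid=$!
    local size

    # Poll while the command is still running.
    while kill -0 "$pid" 2>/dev/null; do
        size=$(( $(wc -c < "$out_file") ))
        if [ "$size" -gt "$limit_bytes" ]; then
            # Limit exceeded: stop the command and report failure.
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null || true
            return 1
        fi
        sleep 0.1   # illustrative interval; a real job script would poll far less often
    done

    # The command finished on its own: pass its exit status through.
    wait "$pid"
}
```

A call such as run_with_stdout_limit 1073741824 tool_stdout some_command would stop some_command shortly after tool_stdout passes 1 GiB. Because the size is only checked between polls, the file can overshoot the limit by whatever is written during one interval.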

bgruening commented 1 year ago

@natefoo do you think it would be worthwhile to track stdout and stderr? If so I can create an issue.

natefoo commented 1 year ago

Yeah, I think we should be able to do that fairly reliably.

bernt-matthias commented 1 year ago

@bgruening can you recover the (or a) command line? It seems I'm unable to reproduce it with our tests: https://github.com/galaxyproject/tools-iuc/pull/5293

I guess redirecting to /dev/null in an else branch here might be an option, but a test would be nice in the first place.
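
A rough sketch of the else-branch idea, written as plain shell rather than as the wrapper's actual command template. The function name run_tool and the convention that an empty first argument means "no output requested" are assumptions made for illustration.

```shell
# Hypothetical wrapper logic, not the actual gffread tool wrapper.
# Usage: run_tool OUT_FILE COMMAND [ARGS...]
# An empty OUT_FILE stands for "no output dataset was requested".
run_tool() {
    local out_file="$1"
    shift

    if [ -n "$out_file" ]; then
        # An output was requested: keep stdout in that file.
        "$@" > "$out_file"
    else
        # No output requested: discard stdout so it cannot fill the disk.
        "$@" > /dev/null
    fi
}
```

With this shape, stderr is left untouched in both branches, so error messages from the tool would still be captured.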

bgruening commented 1 year ago

@sanjaysrikakulam can you please try to recover the command line from the DB? You will find the job number in the issue above. Thanks.

sanjaysrikakulam commented 1 year ago

For this particular job, the command_line was

gffread '/data/dnb06/galaxy_db/files/9/f/e/dataset_9fef7676-4bda-427c-9f76-205777411e5a.dat' -m '/data/dnb07/galaxy_db/files/a/f/2/dataset_af251307-842b-40c5-8ab8-bdfc23461deb.dat

Tool id: toolshed.g2.bx.psu.edu/repos/devteam/gffread/gffread/2.2.1.3+galaxy0
Galaxy version: 22.05

bernt-matthias commented 1 year ago

Thanks @sanjaysrikakulam and @bgruening. The funny thing is that with this configuration Galaxy does not even produce an output.

What I absolutely cannot explain is the size of stdout.

Please check https://github.com/galaxyproject/tools-iuc/pull/5293