databio / bulker

Manager for multi-container computing environments
https://bulker.io
BSD 2-Clause "Simplified" License

Pipeline commands hanging in bulker environment #92

Open rcorces opened 9 months ago

rcorces commented 9 months ago

Our cluster recently updated its OS to Rocky 8, and this has broken a lot of our analytical pipelines. I can't be certain that the problem below is OS-related, but it could be. From the various GitHub repos, I can see that you are actively updating a lot of this ecosystem, so I'm OK with leaving this unanswered for now and trying again once all of those updates are complete.

When running PEPATAC through Bulker (old v1.0.9), multiple steps in the pipeline no longer work, and some commands just hang without doing anything. I've reloaded all of my Bulker crates, but this did not solve the problem.

The first obvious and unfortunate problem is that the PEPATAC_log.md file is not being written to while the pipeline is running. I assume this is actually a Bulker issue, caused by some command not working as expected.

As an example of one of the problems, grep commands would hang, but only when stdout was redirected to a file. If I activated the bulker crate and ran echo 123 | grep 1 >file.txt, it would hang endlessly, but the same command executed fine when the output wasn't redirected to a file. I fixed this by reloading the alpine/coreutils crates without grep and forcing them to use grep as a host command. This grep hang was reproducible from the command line after activating the crate (i.e., it didn't just happen from within the pipeline).
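For anyone trying to reproduce the grep hang without losing their terminal, here is a minimal sketch that runs the same redirect under a time limit. It assumes GNU coreutils timeout is available on PATH and that grep resolves to the crate's version when the crate is active; the 10-second limit and the variable names are arbitrary choices, not anything Bulker defines.

```shell
# Sketch only: bounded reproduction of the redirect hang described above.
# Assumptions: GNU `timeout` is on PATH; `grep` resolves to whatever the
# active crate provides. The 10-second limit is an arbitrary choice.
out="$(mktemp)"

# Run the same pipeline as in the report, redirecting stdout to a file.
rc=0
timeout 10 sh -c 'echo 123 | grep 1 > "$1"' sh "$out" || rc=$?

# GNU timeout exits with status 124 when the time limit was reached.
if [ "$rc" -eq 124 ]; then
    result="hung"
elif [ "$rc" -eq 0 ]; then
    result="ok"
else
    result="failed"
fi

printf 'result=%s exit=%s bytes_written=%s\n' \
    "$result" "$rc" "$(wc -c < "$out" | tr -d ' ')"
```

Running this once with the crate activated and once in a plain shell should show whether the hang is specific to the crate's grep.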

A second issue came up later in the pipeline, when generating the fragment length distribution (either when running an Rscript or when running a sort | uniq >file.txt command; it is hard to tell because the only information I have to go on is the PEPATAC_commands.sh file). In this case, I'm able to execute the corresponding commands manually from the command line after activating the bulker crate on a development node, but they don't seem to work from within the pipeline on the compute node.
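Since the commands behave differently on the development node and the compute node, one thing that may be worth comparing is what each command name actually resolves to in both places. The sketch below is only a diagnostic idea: the list of commands is taken from the steps mentioned above, and the report variable is a hypothetical name.

```shell
# Sketch only: record how each command resolves in the current shell, so the
# output from a development node and a compute node can be compared.
# The command list mirrors the steps mentioned above; adjust as needed.
report="$(
    for cmd in grep sort uniq Rscript; do
        if path="$(command -v "$cmd" 2>/dev/null)"; then
            printf '%s\t%s\n' "$cmd" "$path"
        else
            printf '%s\t%s\n' "$cmd" "NOT FOUND"
        fi
    done
)"

printf '%s\n' "$report"
```

If the two nodes print different paths for the same command, that would point at the environment rather than at the pipeline itself.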

Lastly, we are able to get the pipeline working just fine using the Singularity implementation instead of Bulker.

I know that this isn't enough information for you to solve the problem, but I'm hoping it is enough for you to offer an opinion on what might be causing it. I don't think it is an issue with volumes being mounted, because most of the pipeline works just fine. Thanks for any help you can provide - or just tell me to wait until the updates are released.