Snakemake-Profiles / lsf

Snakemake profile for running jobs on an LSF cluster
MIT License

🐛 Fix bug for legacy LSF system (v8.3) #35

Open haizi-zh opened 3 years ago

haizi-zh commented 3 years ago

In a legacy LSF system such as v8.3, the bjobs command does not support certain newer arguments such as -noheader or -o. Furthermore, it is usually not realistic for snakemake users to upgrade LSF, since the installed version is tightly controlled by cluster admins. Therefore, it is essential to make this project compatible with legacy LSF systems. We can achieve this by simply using an alternative way to monitor job status.

This pull request includes a simple patch, which uses a plain bjobs command for job status checking.

Note: I have tested the code under the following environment:

IBM Platform LSF 8.3.0.196409, May 10 2012

However, I haven't tested it for other LSF versions.
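For illustration only, here is a minimal sketch of how a status could be read from plain `bjobs <jobID>` output without `-o` or `-noheader`. It is not the patch from this PR: the function name `parse_plain_bjobs` and the sample text are hypothetical, and the sketch assumes the default header contains the `JOBID` and `STAT` columns ahead of any field that can contain spaces.

```python
from typing import Optional


def parse_plain_bjobs(output: str, job_id: str) -> Optional[str]:
    """Return the STAT value for job_id from default `bjobs <jobID>` output.

    Returns None when no header row or no matching job row is present.
    """
    rows = [line.split() for line in output.splitlines() if line.strip()]
    # Find the header row so the STAT column is located by name.
    for i, fields in enumerate(rows):
        if "JOBID" in fields and "STAT" in fields:
            id_col, stat_col = fields.index("JOBID"), fields.index("STAT")
            break
    else:
        return None  # no header found, e.g. bjobs printed only an error
    # Scan the data rows below the header for the requested job ID.
    for fields in rows[i + 1:]:
        if len(fields) > max(id_col, stat_col) and fields[id_col] == job_id:
            return fields[stat_col]
    return None


# Hypothetical sample in the default bjobs layout (values are made up).
sample = (
    "JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME\n"
    "1234    alice   RUN   normal     login01     node07      align      May 10 12:00\n"
)
print(parse_plain_bjobs(sample, "1234"))
```

Splitting on whitespace is only safe here because the columns of interest come before fields such as the job name or submit time, which may contain spaces.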

mbhall88 commented 3 years ago

Hi @haizi-zh , thanks for raising this issue and putting in a PR.

I am a little bit hesitant about trying to add support for legacy systems this old. It looks like the -o and -noheader features were introduced in v9.1.1 (released 2013), which itself is becoming legacy. This seems akin to supporting python2 in some respects (although v8.3 seems to have been out of support for longer than python2).

I was able to dig up some old documentation for v8.0 (not v8.3 though), but again, it becomes very hard to debug issues on such an old system. If we were going to change the status-checking command to the method you have outlined in this PR, it would require a lot more work. We would need to update all tests and also add in some error handling for the case where the status cannot be obtained from the plain bjobs <jobID> command.
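To make the error-handling concern concrete, here is a rough sketch of what it might involve. It is not code from this profile or from the PR. The names `run_bjobs`, `extract_stat`, `check_status`, and `STATE_MAP` are hypothetical, as are the retry count, the wait time, and the choice to raise an error once retries are exhausted. The sketch assumes STAT is the third column of the default output and that the status script should report one of `success`, `running`, or `failed`.

```python
import subprocess
import time
from typing import Callable, Optional

# Assumed mapping from LSF STAT values to the words a status script reports.
STATE_MAP = {"DONE": "success", "EXIT": "failed", "PEND": "running", "RUN": "running"}


def run_bjobs(job_id: str) -> str:
    """Run plain `bjobs <jobID>`; return stdout, or "" if the call fails."""
    try:
        proc = subprocess.run(
            ["bjobs", job_id], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return proc.stdout


def extract_stat(output: str, job_id: str) -> Optional[str]:
    """Return the third field of the row whose first field is job_id."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == job_id:
            return fields[2]
    return None


def check_status(
    job_id: str,
    run: Callable[[str], str] = run_bjobs,
    tries: int = 3,
    wait: float = 5.0,
) -> str:
    """Query the job up to `tries` times; raise if no status is ever found."""
    for attempt in range(tries):
        stat = extract_stat(run(job_id), job_id)
        if stat is not None:
            # Unlisted states (e.g. suspended) are treated as still running;
            # this is an assumption, not established behaviour.
            return STATE_MAP.get(stat, "running")
        if attempt < tries - 1:
            time.sleep(wait)
    raise RuntimeError(f"could not obtain status for job {job_id} from bjobs")
```

Passing the command runner in as `run` lets tests substitute canned output, which is part of the extra test work this change would need.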

I appreciate your hands are tied and you can't just upgrade.

I might ask @leoisl and @johanneskoester for their thoughts on this also as I don't think it is appropriate for me to make such a decision on my own.

haizi-zh commented 3 years ago

Hi @mbhall88 , thanks for the reply. I totally understand your concerns. If I were you I would definitely feel the same hesitation.

Unfortunately, my employer's HPC cluster is fairly old, with lots of legacy code running on it, so upgrading to a newer LSF version is not feasible. You don't have to merge the PR. I submitted it just in case others may need it.

Wish you a great day! 😃

leoisl commented 2 years ago

I think it is reasonable not to support legacy LSF versions. I am wondering whether this PR should remain permanently open so that legacy users can see it, or whether we should close it and add a section to README.md that points to this PR or to https://github.com/haizi-zh/lsf for users looking to run this profile on legacy LSF versions.