UCSF-Costello-Lab / LG3_Pipeline

The original LG3 pipeline
https://github.com/UCSF-Costello-Lab/LG3_Pipeline
0 stars 0 forks source link

BREAKING/BUG: $LG3_HOME/lg3.conf prevents env vars to be set in run scripts #128

Closed HenrikBengtsson closed 5 years ago

HenrikBengtsson commented 5 years ago

In the develop branch, $LG3_HOME/lg3.conf has been updated such that it sets variables like:

CONV=${LG3_HOME:?}/patient_ID_conversions.tsv

This means that when, say, runs_demo/_run_Pindel, tries to set this variable to:

CONV=${CONV:-patient_ID_conversions.tsv}

it already set. This means that it will not pick up ./patient_ID_conversions.tsv in the users working directory (=project folder) but the hard-coded one that comes with the pipeline.

This was caught when I was trying to run check the pipeline:

$ more _Pindel_Patient157t10.out
[...]
Input:
- PATIENT=Patient157t10
- PROJECT=LG3
- CONV=/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-devel/patient_ID_conversions.tsv
ERROR: No such file: '/cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-devel/patient_ID_conversions.tsv' (working directory '/cbc2/data2/henrik/re
positories/UCSF-CostelloLab/test-devel-20190419-t10-pindel_0.2.5b8')
Traceback:
1: assert_file_exists() on line #157 in /cbc2/data2/henrik/repositories/UCSF-CostelloLab/LG3_Pipeline-devel/scripts/utils.sh
2: main() on line #57 in /var/spool/torque/mom_priv/jobs/1334861.cclc01.som.ucsf.edu.SC
Exiting (exit 1)
HenrikBengtsson commented 5 years ago

@ivan108, I'm assigning this one along to you, especially since you've made those changes since the last release. Make sure to update so that Protection #129 is applied - that will help robustifying the development.

ivan108 commented 5 years ago

Yes, it is a bug, good catch!

The correct line in $LG3_HOME/lg3.conf should read:

CONV=patient_ID_conversions.tsv

Will do some more tests...

I also propose that we never pass the same parameter by lg3.conf files AND environment. Only runtime parameters which change with every run should be passed by environment, e.g. PATIENT..

PS: I am fine with working outside of $LG3_HOME, in that way I will have the same experience as other LG3 users.

HenrikBengtsson commented 5 years ago

I also propose that we never pass the same parameter by lg3.conf files AND environment.

I agree consistency is useful, but we cannot remove support for env vars until lg3.conf is fully implemented, e.g. Issue #127. Until then, lg3.conf should not set those if already set (as env var). Better to make small, conservative, backward compatible changes and then slowly deprecate support for old ways of calling the pipeline. You don't want to mess up end users existing script from one release to the other.

HenrikBengtsson commented 5 years ago

@ivan108, have you had a chance to fix/undo this?

ivan108 commented 5 years ago

I fixed lg3.conf file in current develop branch.

Meanwhile, I am working on refactoring the code to use lg3.conf files in all steps of the pipeline. I don't like hybrid setup when the same parameter sometimes is taken from config file and sometimes not. It is confusing, so I am not making intermediate commits, but do a lot of testing at every step. Then I will commit it all at once. In that case it would be easy to roll back if we don't like it..

HenrikBengtsson commented 5 years ago

Setting

CONV=patient_ID_conversions.tsv

in lg3.conf won't solve it either. As it stands now, we cannot really set it at all, because it prevents the user from setting env var CONV outside. This is because in the scripts etc we use:

source_lg3_conf   ## <= here CONV is set regardless what the use sets outside ...
[...]
CONV=${CONV:-patient_ID_conversions.tsv}   ## <= ... so, this will do nothing

To fix this, anything that should be allowed to be set via env vars needs to be set conditionally in the lg3.conf file, e.g.

CONV=${CONV:-patient_ID_conversions.tsv}

This won't override CONV is already set.

HenrikBengtsson commented 5 years ago

Meanwhile, I am working on refactoring the code to use lg3.conf files in all steps of the pipeline.

FYI, the best/safest way to do this, is to work in a separate Git branch, e.g.

$ git checkout develop
$ git checkout -b feature/lg3.conf

and do your commits to that new 'feature/lg3.conf' branch without risking messing up the 'develop' branch. You can even push it to GitHub. When you're happy with 'feature/lg3.conf' and it's been well tested, we can do:

$ git checkout develop
$ git merge feature/lg3.conf

to bring it all into the 'develop' branch. Then run another round of testing before releasing (to 'master' branch)

ivan108 commented 5 years ago

To continue discussion about best way to setup config files:

[...]

MOVED 2019-05-02: @HenrikBengtsson moved this comment to https://github.com/UCSF-Costello-Lab/LG3_Pipeline/issues/130, because it is likely to be a quote long discussion. We can then continue the Pindel-specific settings here.

HenrikBengtsson commented 5 years ago

@ivan108, can I take a stab at fixing this? I'll try to do a minimal fix - I hope I can do it with messing too much with your plans on lg3.conf.

@shuntsman-ucsf and @eladziv are ready & pretty keen to process their data with Pindel2.

ivan108 commented 5 years ago

Yes, go ahead with the fix, thanks!

I slowed down testing lg3.conf branch (origin/feature/lg3.conf), which already looks good, because I am in the middle of processing a large exome batch...

HenrikBengtsson commented 5 years ago

Fixed.