UCSF-Costello-Lab / LG3_Pipeline

The original LG3 pipeline
https://github.com/UCSF-Costello-Lab/LG3_Pipeline

Initial setup with new release #73

Closed ivan108 closed 5 years ago

ivan108 commented 5 years ago

Testing initial steps setting up pipeline in arbitrary directory:

cd /home/jocostello/shared
mkdir LG3_Pipeline_test2
cd LG3_Pipeline_test2
ln -s /data/jocostello/LG3_Pipeline_test output

module load CBC lg3

lg3 test setup
*** Setup
[OK] PROJECT: LG3
[OK] PATIENT:  (required for 'lg3 test validate')
[ERROR] CONV: patient_ID_conversions.tsv (no such file)
grep: patient_ID_conversions.tsv: No such file or directory
[OK]   => SAMPLES:  (required by '_run_Recal')
grep: patient_ID_conversions.tsv: No such file or directory
[OK]   => NORMAL: '' (required by '_run_Recal')
[OK] EMAIL: ivan.smirnov@ucsf.edu
[OK] LG3_HOME: /home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-08
[OK] LG3_OUTPUT_ROOT: /costellolab/data1/jocostello
[OK] Patient TSV file: patient_ID_conversions.tsv
[OK] Raw data folder: rawdata
[OK] Run scripts: _run_Align_gz
[OK] Run scripts: _run_Merge
[OK] Run scripts: _run_Merge_QC
[OK] Run scripts: _run_MutDet
[OK] Run scripts: _run_Pindel
[OK] Run scripts: _run_PostMut
[OK] Run scripts: _run_Recal
[OK] Run scripts: _run_Recal_pass2
[OK] Run scripts: _run_Trim
[OK] R packages: 'RColorBrewer
ls -1
output
patient_ID_conversions.tsv
rawdata
_run_Align_gz
_run_Merge
_run_Merge_QC
_run_MutDet
_run_Pindel
_run_PostMut
_run_Recal
_run_Recal_pass2
_run_Trim

Issues:

  1. Strange error message, the link to patient_ID_conversions.tsv is actually created.
  2. No default PATIENT, and the current instructions don't require exporting PATIENT.
  3. No default SAMPLES and NORMAL, this makes sense since PATIENT is not specified.
  4. LG3_OUTPUT_ROOT doesn't point to "output", which was created before.

Am I using the wrong instructions? I am using the develop branch on GitHub.

HenrikBengtsson commented 5 years ago

  1. Strange error message, the link to patient_ID_conversions.tsv is actually created.

Yes, I also noticed that. Though, if one runs lg3 test setup a second later, the error is no longer there. It seems to be related to a delay in the file system. I guess we could add some forgiveness to this.

  2. No default PATIENT, and the current instructions don't require exporting PATIENT.

It's not needed by lg3 test setup, but it reports on it/them if set. We should harmonize how to specify PATIENT, e.g. in most places we do it via env var PATIENT but for some scripts (e.g. lg3) we pass it as CLI arguments or options (--patient=...).

  3. No default SAMPLES and NORMAL, this makes sense since PATIENT is not specified.

Yes, I didn't spend too much time on that part. The plan is to get rid of SAMPLES and NORMAL and pull that information from (PATIENT, CONV), cf. Issue #56.

  4. LG3_OUTPUT_ROOT doesn't point to "output", which was created before.

This is on purpose. With the 2018-10-08 version, I no longer have module load lg3 set LG3_OUTPUT_ROOT. It should not be needed anymore; all the scripts will cause it to default to output/. I've created Issue #74 for this.
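
To illustrate the defaulting behaviour, here is a minimal sketch that assumes the scripts use ordinary shell parameter expansion. The real LG3 code may do it differently, and resolve_output_root is a hypothetical helper, not part of the pipeline:

```shell
# Hypothetical helper: print the output root a run script would use.
# If LG3_OUTPUT_ROOT is unset or empty, fall back to 'output'
# (relative to the current working directory).
resolve_output_root() {
    printf '%s\n' "${LG3_OUTPUT_ROOT:-output}"
}

# With the variable unset, the default applies
unset LG3_OUTPUT_ROOT
default_root=$(resolve_output_root)

# A value that is already set (e.g. inherited from the login shell) wins
LG3_OUTPUT_ROOT=/some/other/location
preset_root=$(resolve_output_root)
unset LG3_OUTPUT_ROOT

echo "default: ${default_root}"
echo "preset:  ${preset_root}"
```

The second case is the one to watch out for: any value already present in the environment silently takes precedence over the output/ default.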

[...] I am using the develop branch on GitHub.

You're actually using the most recent release (2018-10-08), because that is what module load lg3 loads, as confirmed by:

[OK] LG3_HOME: /home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-08

ivan108 commented 5 years ago

OK, so I am trying to continue: export PATIENT=Patient157t10

lg3 test setup
*** Setup
[OK] PROJECT: LG3
[OK] PATIENT: Patient157t10 (required for 'lg3 test validate')
[OK] CONV: patient_ID_conversions.tsv
[OK]   => SAMPLES: Z00599t10 Z00600t10 Z00601t10  (required by '_run_Recal')
[OK]   => NORMAL: 'Z00599t10' (required by '_run_Recal')
[OK] EMAIL: ivan.smirnov@ucsf.edu
[OK] LG3_HOME: /home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-08
[OK] LG3_OUTPUT_ROOT: /costellolab/data1/jocostello
[OK] Patient TSV file: patient_ID_conversions.tsv
[...]

Good, SAMPLES and NORMAL are now set.

_run_Trim
[2018-10-08 15:24:25 PDT] BEGIN: ./_run_Trim
Call: ./_run_Trim
Script: ./_run_Trim
Arguments:
Input:
- LG3_HOME=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-08
- PROJECT=LG3
- LG3_INPUT_ROOT=rawdata
- LG3_OUTPUT_ROOT=/costellolab/data1/jocostello
- EMAIL=ivan.smirnov@ucsf.edu
- TG=
- SAMPLES=Z00599t Z00600t Z00601t
Qsub extras:
- QSUB_OPTS= -d /home/jocostello/shared/LG3_Pipeline_test2 -M ivan.smirnov@ucsf.edu
- QSUB_ENVVARS=LG3_HOME=/home/shared/cbc/software_cbc/LG3_Pipeline-2018-10-08,LG3_OUTPUT_ROOT=/costellolab/data1/jocostello,EMAIL=ivan.smirnov@ucsf.edu
ETA ~4h
Submitting Z00599t: quality 20 ...
1225294.cclc01.som.ucsf.edu
Submitting Z00600t: quality 20 ...
1225295.cclc01.som.ucsf.edu
Submitting Z00601t: quality 20 ...
1225296.cclc01.som.ucsf.edu
[2018-10-08 15:24:25 PDT] END: ./_run_Trim

It uses the wrong [default] SAMPLES!!!

HenrikBengtsson commented 5 years ago

They're actually not set; lg3 test setup is only reporting the pairing of (PATIENT, CONV). So, yes, you need to set SAMPLES (until we've resolved Issue #56).

PS. It's not possible for a program (here lg3 or lg3-test) to set env vars in the calling shell (thankfully; imagine how many poorly written tools would modify PATH etc). For that to happen, we would have to define lg3 as a Bash function (that's how module does it), or "source" it, e.g. source /path/to/lg3 test setup (or an alias around the latter).
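
A small demonstration of the difference, as a sketch only: the temporary script stands in for any external tool (it is not lg3), and set_samples is a hypothetical function name.

```shell
# A throwaway script standing in for an external tool (hypothetical)
tool=$(mktemp)
cat > "$tool" <<'EOF'
SAMPLES="Z00599t10 Z00600t10 Z00601t10"
export SAMPLES
EOF

unset SAMPLES

# 1. Run it as a child process: the assignment dies with the child
sh "$tool"
after_exec="${SAMPLES:-<unset>}"

# 2. Source it: the assignment happens in the current shell
. "$tool"
after_source="${SAMPLES:-<unset>}"

# 3. A shell function also runs in the current shell
unset SAMPLES
set_samples() {
    SAMPLES="Z00599t10 Z00600t10 Z00601t10"
    export SAMPLES
}
set_samples
after_function="${SAMPLES:-<unset>}"

rm -f "$tool"
echo "after running:  ${after_exec}"
echo "after sourcing: ${after_source}"
echo "after function: ${after_function}"
```

Only the sourced script and the function change the caller's environment; the child process cannot.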

ivan108 commented 5 years ago

I see now that SAMPLES/NORMAL are not set, but PATIENT is, so I guess we need to set default SAMPLES/NORMAL from conversion file everywhere, as we already started doing (#56) ...

HenrikBengtsson commented 5 years ago

... I guess we need to set default SAMPLES/NORMAL from conversion file everywhere ...

Yes, I think that's the sanest way going forward. Any other attempts using external env vars will just end up in convoluted code.
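
As a rough sketch of what pulling SAMPLES and NORMAL from (PATIENT, CONV) could look like: the three-column layout below (lib_ID, patient_ID, sample_type), the "Normal"/"Tumor" labels, and the extra PatientOther row are assumptions made up for this example, not the real format of patient_ID_conversions.tsv.

```shell
# Demo conversion table; the column layout is an assumption for illustration
conv=$(mktemp)
printf 'lib_ID\tpatient_ID\tsample_type\n'   > "$conv"
printf 'Z00599t10\tPatient157t10\tNormal\n' >> "$conv"
printf 'Z00600t10\tPatient157t10\tTumor\n'  >> "$conv"
printf 'Z00601t10\tPatient157t10\tTumor\n'  >> "$conv"
printf 'X00001\tPatientOther\tNormal\n'     >> "$conv"

PATIENT=Patient157t10

# All library IDs listed for the patient, joined by single spaces
SAMPLES=$(awk -F '\t' -v p="$PATIENT" \
    'NR > 1 && $2 == p { printf "%s%s", sep, $1; sep = " " }' "$conv")

# The library ID flagged as the normal sample for the patient
NORMAL=$(awk -F '\t' -v p="$PATIENT" \
    'NR > 1 && $2 == p && $3 == "Normal" { print $1 }' "$conv")

rm -f "$conv"
echo "SAMPLES=${SAMPLES}"
echo "NORMAL=${NORMAL}"
```

Matching on an exact column value (rather than a plain grep for the patient name) avoids picking up Patient157t rows when Patient157t10 is requested, or the other way around.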

But if you wish, you could start out by switching the current run scripts' defaults to the test patient Patient157t10 instead of Patient157t (Issue #68, I assume). That shouldn't take too much work and would probably allow us to move on Issue #56 faster.

ivan108 commented 5 years ago

OK, I finished _run_Trim without errors, but all output ended up in the wrong old location. It ignored my "output" link....

ivan108 commented 5 years ago

Yes, we should use the quickest run as default, and optionally the user may run a longer test later.

HenrikBengtsson commented 5 years ago

... output ended up in the wrong old location.

ambiguous

ivan108 commented 5 years ago

When _run_Trim was launched, it displayed the right output location, corresponding to "output" link:

[...]
Input:
[...]
- LG3_OUTPUT_ROOT=/data/jocostello/LG3_Pipeline_test
[...] 

but in the post-run logs (_Trim_Z00599t10.out) it shows a different location (our old default):

[...]
Settings:
[...]
LG3_OUTPUT_ROOT=/costellolab/data1/jocostello
[...]

and that is where the output files ended up!! How is that possible??

HenrikBengtsson commented 5 years ago

How is that possible??

You must be setting it somewhere in your shell startup scripts, because that path is nowhere in the code:

$ git checkout master
$ grep -F /costellolab/data1/jocostello *.pbs scripts/*.sh
$ 

As a reference, see the output of the tests I ran yesterday where LG3_OUTPUT_ROOT is unset:

$ grep -F LG3_OUTPUT_ROOT /home/cbctest2/repositories/testing-devel-20181007a-t10/_Trim*.out
/home/cbctest2/repositories/testing-devel-20181007a-t10/_Trim_Z00599t10.out:- LG3_OUTPUT_ROOT=/cbc/cbctest2/testing-devel_20181007a-t10/output
/home/cbctest2/repositories/testing-devel-20181007a-t10/_Trim_Z00599t10.out:- LG3_OUTPUT_ROOT=/cbc/cbctest2/testing-devel_20181007a-t10/output
/home/cbctest2/repositories/testing-devel-20181007a-t10/_Trim_Z00600t10.out:- LG3_OUTPUT_ROOT=/cbc/cbctest2/testing-devel_20181007a-t10/output
/home/cbctest2/repositories/testing-devel-20181007a-t10/_Trim_Z00600t10.out:- LG3_OUTPUT_ROOT=/cbc/cbctest2/testing-devel_20181007a-t10/output
/home/cbctest2/repositories/testing-devel-20181007a-t10/_Trim_Z00601t10.out:- LG3_OUTPUT_ROOT=/cbc/cbctest2/testing-devel_20181007a-t10/output
/home/cbctest2/repositories/testing-devel-20181007a-t10/_Trim_Z00601t10.out:- LG3_OUTPUT_ROOT=/cbc/cbctest2/testing-devel_20181007a-t10/output

ivan108 commented 5 years ago

Brilliant!! It was in .bashrc, and I totally forgot about it...
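
For anyone else chasing a stray setting like this, a minimal sketch of how to search the usual Bash startup files. find_var_in_startup is a hypothetical helper, and the demo runs against a scratch directory with made-up file contents rather than a real home directory:

```shell
# Hypothetical helper: list startup files in a directory that mention a variable.
# Usage: find_var_in_startup VARNAME [DIR]   (DIR defaults to $HOME)
find_var_in_startup() {
    var=$1
    dir=${2:-$HOME}
    for f in .bashrc .bash_profile .bash_login .profile; do
        if [ -f "$dir/$f" ] && grep -q -F -e "$var" "$dir/$f"; then
            printf '%s\n' "$dir/$f"
        fi
    done
}

# Demo against a scratch directory (file contents are illustrative)
demo_home=$(mktemp -d)
echo 'export LG3_OUTPUT_ROOT=/costellolab/data1/jocostello' > "$demo_home/.bashrc"
echo 'export EDITOR=vi' > "$demo_home/.profile"

hits=$(find_var_in_startup LG3_OUTPUT_ROOT "$demo_home")
echo "$hits"

rm -rf "$demo_home"
```

Against a real account this would be called as find_var_in_startup LG3_OUTPUT_ROOT, with no second argument.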

HenrikBengtsson commented 5 years ago

The proposed warning in Issue #74 intends to cover exactly this mistake.
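
One possible shape for such a check, purely as a sketch: this is not the wording or design from Issue #74, and warn_if_output_root_preset is a hypothetical function name.

```shell
# Hypothetical check: print a warning on stderr if LG3_OUTPUT_ROOT is already
# set when a run script starts, since it overrides the 'output' default.
warn_if_output_root_preset() {
    if [ -n "${LG3_OUTPUT_ROOT:-}" ]; then
        echo "WARNING: LG3_OUTPUT_ROOT is preset to '${LG3_OUTPUT_ROOT}'; output will be written there instead of ./output" >&2
    fi
}

# Case 1: the variable is preset (e.g. exported from ~/.bashrc)
LG3_OUTPUT_ROOT=/costellolab/data1/jocostello
msg_preset=$(warn_if_output_root_preset 2>&1)

# Case 2: the variable is unset, so nothing is printed
unset LG3_OUTPUT_ROOT
msg_unset=$(warn_if_output_root_preset 2>&1)

echo "preset: ${msg_preset}"
echo "unset:  ${msg_unset:-<no warning>}"
```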

HenrikBengtsson commented 5 years ago

Did you successfully complete the tests you ran here? Can we close this issue?

ivan108 commented 5 years ago

Yes

HenrikBengtsson commented 5 years ago

Minor comment on:

Strange error message, the link to patient_ID_conversions.tsv is actually created.

Yes, I also noticed that. Though, if one runs lg3 test setup a second later, the error is no longer there. It seems to be related to a delay in the file system. I guess we could add some forgiveness to this.

I identified the problem: lg3 test setup was trying to parse the file before linking to the test template file. I've fixed this (but have yet to commit and push).