Closed sebastian-luna-valero closed 6 years ago
Hi @sebastian-luna-valero, I will try and have a look at this tomorrow and get back to you
Thanks Nick.
Please let me know if you need help installing the new code.
Hi Sebastian,
What's the best way to install the new code without interfering with my current CGATOxford installation?
Thanks
Nick
Hi Nick,
Sure. First of all, clean up your environment:
# deactivate conda and check with "which conda"
source deactivate
# get rid of loaded modules and check with "module list"
module purge
# empty PYTHONPATH and check with "printenv PYTHONPATH"
unset PYTHONPATH
I plan to automate the checks above within the installer at some point.
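Until the installer does this itself, the three checks above could be scripted roughly as follows. This is only a sketch: `check_clean_env` is a hypothetical helper, not part of install-CGAT-tools.sh, and it assumes that loaded modules show up in the `LOADEDMODULES` variable, as they normally do with Environment Modules and Lmod.

```shell
#!/usr/bin/env bash
# Hypothetical pre-install check; NOT part of install-CGAT-tools.sh.
# Returns the number of problems found (0 means the environment looks clean).
check_clean_env() {
    local problems=0

    # conda should no longer be reachable from this shell
    if command -v conda >/dev/null 2>&1; then
        echo "conda is still available: $(command -v conda)" >&2
        problems=$((problems + 1))
    fi

    # Assumption: the module system records loaded modules in LOADEDMODULES
    if [ -n "${LOADEDMODULES:-}" ]; then
        echo "modules still loaded: ${LOADEDMODULES}" >&2
        problems=$((problems + 1))
    fi

    # PYTHONPATH should be unset or empty
    if [ -n "${PYTHONPATH:-}" ]; then
        echo "PYTHONPATH is set: ${PYTHONPATH}" >&2
        problems=$((problems + 1))
    fi

    return "$problems"
}
```

Calling `check_clean_env && echo "environment looks clean"` before starting the installer would flag a leftover conda, module, or PYTHONPATH setting.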
Then run the installer:
# get the installer
curl -O https://raw.githubusercontent.com/cgat-developers/cgat-flow/master/install-CGAT-tools.sh
# install everything
bash install-CGAT-tools.sh --devel --git-ssh --location <your-path>/cgat-developers-v0
Please make sure that you have at least 15GB of disk available in <your-path>
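The disk space requirement can be checked before starting the installer. The sketch below assumes a POSIX `df`; `has_free_space` is a hypothetical helper name, and the 15 GB threshold is the figure quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR has at least MIN_GB gigabytes available.
has_free_space() {
    local dir="$1" min_gb="$2" avail_kb
    # df -P: portable single-line output; -k: sizes in 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example (replace the placeholder with the real installation path):
# has_free_space "<your-path>" 15 || echo "less than 15GB available" >&2
```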
If everything goes smoothly, you should get:
The code successfully installed!
To activate the CGAT environment type:
$ source <your-path>/cgat-developers-v0/conda-install/etc/profile.d/conda.sh
$ conda activate base
$ conda activate cgat-f
To deactivate the environment, use:
$ conda deactivate
If that's the case, then please go to this branch by doing:
# go to repo
cd <your-path>/cgat-flow
# checkout branch
git checkout --track origin/migration-418
Finally, please run the pipeline with:
cgatflow rnaseqdiffexpression make full
Best regards, Sebastian
Hi Sebastian,
I got the following error:
An error occurred in:
command: cd $CGAT_HOME/cgat-flow
The script will abort now. User input was:
install-CGAT-tools.sh --devel --git-ssh --location cgat-developers-v0
Do I have to specify --location . if installing into the current directory?
Thanks!
Nick
Hi Nick,
I recommend using --location /full/path/to/installation/folder
Also, the installer should have printed a list of environment variables at the bottom, after "Debugging". If so, could you please paste those as well?
Best regards, Sebastian
Will give it a go with the full path. I had to move my computer, so I don't have a copy of the environment variables. Will paste them if it fails this time.
Thanks
Nick
Hi Sebastian,
The installation failed with the following:
##########################################################
An error occurred in:
command: $_CONDA_EXE "$cmd" "$@"
The script will abort now. User input was:
install-CGAT-tools.sh --devel --git-ssh --location /gfs/devel/nilott/cgat-developers-v0
Please copy and paste this error and report it via Git Hub: https://github.com/cgat-developers/cgat-flow/issues
Debugging:
CFLAGS:
CPATH:
C_INCLUDE_PATH: /usr/include:/usr/X11R6/include:/usr/local/include
CPLUS_INCLUDE_PATH: /usr/include:/usr/X11R6/include:/usr/local/include
LIBRARY_PATH: /lib:/usr/lib:/usr/X11R6/lib:/usr/local/lib
LD_LIBRARY_PATH: /lib:/usr/lib:/usr/X11R6/lib:/usr/local/lib
CGAT_HOME: /gfs/devel/nilott/cgat-developers-v0
CONDA_INSTALL_DIR: /gfs/devel/nilott/cgat-developers-v0/conda-install
CONDA_INSTALL_TYPE_CORE: core-devel.yml
CONDA_INSTALL_TYPE_APPS: apps-devel.yml
CONDA_INSTALL_TYPE_PIPELINES: pipelines-devel.yml
CONDA_INSTALL_ENV: cgat-f
PYTHONPATH:
PIPELINES_BRANCH: master
APPS_BRANCH: master
CORE_BRANCH: master
RELEASE:
CODE_DOWNLOAD_TYPE: 2
INSTALL_IDE: 0
CLUSTER: 1
Thanks for the help
Nick
Hi Sebastian,
This was also in the output:
CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://repo.anaconda.com/pkgs/main/linux-64/mkl-2018.0.3-1.tar.bz2 Elapsed: -
An HTTP error occurred when trying to retrieve this URL. HTTP errors are often intermittent, and a simple retry will get you on your way.
Thanks, Nick.
Please see:
HTTP errors are often intermittent, and a simple retry will get you on your way.
That's annoying, but the only solution is to remove the folder /gfs/devel/nilott/cgat-developers-v0 and run the installer again.
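Since the download errors are intermittent, the remove-and-retry cycle can be wrapped in a small loop. This is a sketch only: `retry_clean` is a hypothetical helper, not something the installer provides, and it deletes the given folder after every failed attempt, so it should only ever be pointed at the installation folder.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command up to MAX times, removing DIR after
# each failed attempt so the next attempt starts from a clean folder.
retry_clean() {
    local max="$1" dir="$2" attempt=1
    shift 2
    # Refuse to run with an empty folder name
    [ -n "$dir" ] || return 2
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed; removing $dir" >&2
        rm -rf -- "$dir"
        attempt=$((attempt + 1))
    done
    return 1
}

# Example, using the path from this thread:
# retry_clean 3 /gfs/devel/nilott/cgat-developers-v0 \
#     bash install-CGAT-tools.sh --devel --git-ssh \
#     --location /gfs/devel/nilott/cgat-developers-v0
```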
Hi Sebastian,
So the code installed! I have attempted to run: cgatflow rnaseqdiffexpression make full
with the following error:
Task = def pipeline_rnaseqdiffexpression.runFeatureCounts(...):
Job = [[biopsy-HEALTHY-R1.bam, geneset_all.gtf.gz] -> [featurecounts.dir/biopsy-HEALTHY-R1/transcripts.tsv.gz, featurecounts.dir/biopsy-HEALTHY-R1/genes.tsv.gz]]

Traceback (most recent call last):
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
    register_cleanup, touch_files_only)
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 567, in job_wrapper_io_files
    ret_val = user_defined_work_func(*params)
  File "/gfs/devel/nilott/cgat-developers-v0/cgat-flow/CGATPipelines/pipeline_rnaseqdiffexpression.py", line 690, in runFeatureCounts
    Quantifier.run_all()
  File "/gfs/devel/nilott/cgat-developers-v0/cgat-flow/CGATPipelines/PipelineRnaseq.py", line 241, in run_all
    self.run_transcript()
  File "/gfs/devel/nilott/cgat-developers-v0/cgat-flow/CGATPipelines/PipelineRnaseq.py", line 314, in run_transcript
    self.run_featurecounts(level="transcript_id")
  File "/gfs/devel/nilott/cgat-developers-v0/cgat-flow/CGATPipelines/PipelineRnaseq.py", line 306, in run_featurecounts
    P.run(statement)
  File "/gfs/devel/nilott/cgat-developers-v0/cgat-core/CGATCore/Pipeline/Execution.py", line 1328, in run
    benchmark_data = r.run(statement_list)
  File "/gfs/devel/nilott/cgat-developers-v0/cgat-core/CGATCore/Pipeline/Execution.py", line 932, in run
    job_path)
  File "/gfs/devel/nilott/cgat-developers-v0/cgat-core/CGATCore/Pipeline/Execution.py", line 859, in collect_single_job_from_cluster
    job_id, retval.exitStatus, "".join(stderr), statement))
OSError: ---------------------------------------
Job 1043406 exited with error code 1:
The stderr was:
/etc/bashrc: line 12: PS1: unbound variable
/gfs/work/nilott/proj018/analysis/biopsies/test_cgatflow/ctmpr11n_p1a.sh: line 20: /scratch/slurm_1043406/tmp.omfAohZKRg/geneset.gtf: Not a directory

zcat geneset_all.gtf.gz > $TMPDIR/geneset.gtf; featureCounts -Q 10 -T 4 -s 0 -a $TMPDIR/geneset.gtf -o featurecounts.dir/biopsy-HEALTHY-R1/transcripts.tsv.raw -g transcript_id biopsy-HEALTHY-R1.bam >& featurecounts.dir/biopsy-HEALTHY-R1/transcripts.tsv.gz.log; gzip -f featurecounts.dir/biopsy-HEALTHY-R1/transcripts.tsv.raw
-----------------------------------------
Hi Sebastian, From what I gather, and correct me if I'm wrong, the way the code creates $TMPDIR is:
TMPDIR=`mktemp -p $SCRATCHDIR`
export $TMPDIR
Since this already creates a temporary file, $TMPDIR on its own is enough in the run_featurecounts function to hold the temporary geneset file, i.e. the statement should be changed from:
zcat genesetfile.gtf.gz > $TMPDIR/geneset.gtf; featurecounts ...
to
zcat genesetfile.gtf.gz > $TMPDIR; featurecounts ...
Removing the /geneset.gtf from the statement gets rid of the error and the output is as expected (although I do not have a test set for paired-end data and suspect the creation of bam_tmp will suffer from the same issue). I have to admit to finding the naming "TMPDIR" a little confusing in the code if indeed it is creating a file in SCRATCHDIR...
Does that all make sense?
P.S. Sorry it's taken so long to get around to this.
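The "Not a directory" error can be reproduced outside the pipeline. The sketch below assumes a `mktemp` that supports `-p` (as used above) and uses a throwaway directory as a stand-in for $SCRATCHDIR, whose real value is site-specific. Without `-d`, `mktemp` creates a regular file, so a path below it cannot exist; with `-d` it creates a directory.

```shell
#!/usr/bin/env bash
# Stand-in for $SCRATCHDIR (assumption for this demo only)
scratch=$(mktemp -d)

# Without -d, mktemp creates a regular FILE
as_file=$(mktemp -p "$scratch")
# With -d, mktemp creates a DIRECTORY
as_dir=$(mktemp -d -p "$scratch")

# Writing below a regular file fails; the shell reports "Not a directory"
if echo test > "$as_file/geneset.gtf"; then
    file_result=ok
else
    file_result=failed
fi

# Writing below a directory works
if echo test > "$as_dir/geneset.gtf"; then
    dir_result=ok
else
    dir_result=failed
fi

echo "file-based TMPDIR: $file_result; directory-based TMPDIR: $dir_result"

# Clean up the demo scratch area
rm -rf -- "$scratch"
```

Only the second write succeeds, which is why `$TMPDIR/geneset.gtf` needs $TMPDIR to be a directory rather than a file.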
Hi Nick,
Many thanks for your help with this issue!
I think you need a special configuration for this to work properly on your cluster. Please see: https://github.com/cgat-developers/cgat-core/pull/30
@snsansom might be able to help you get the cluster tmpdir configured correctly.
If so, could you please share the relevant section of a working pipeline.yml so I can add it to the docs?
Best regards, Sebastian
Thanks Sebastian, I will talk to Steve about this
Nick
Hi Sebastian,
Steve and I found a bug when cluster_tmpdir is set. I have modified the code. Shall I submit a PR on this branch?
We have a global configuration file, /etc/cgat/pipeline.yml, that sets cluster_tmpdir:
cluster:
    tempdir: $SCRATCH_DIR
The pipeline runs without error now at my end.
Nick
Thanks, Nick and Steve.
Nick, I think we could use this PR, so you just need to push the changes to this branch and I will test it at my end as well.
Best regards, Sebastian
Actually, Sebastian, the bug is in cgat-core and not cgat-flow. Should I create a new branch and push the changes to that?
Sure, please do!
Also, it should have been:
cluster:
    tmpdir: $SCRATCH_DIR
sorry!
Thanks, Nick.
Could you please confirm which of the following is working well for you after the bug fix in cgat-core:
zcat genesetfile.gtf.gz > $TMPDIR/geneset.gtf; featurecounts ...
or
zcat genesetfile.gtf.gz > $TMPDIR; featurecounts ...
Best regards, Sebastian
Hi Sebastian,
zcat genesetfile.gtf.gz > $TMPDIR/geneset.gtf; featurecounts ...
is working now, so it should be the same as for the CGAT setup.
That's great!
Thank you very much!
Pipelines using cgat-core can now rely on TMPDIR being correctly set per job, so it can be used directly to work with temporary folders. xref: https://github.com/cgat-developers/cgat-core/pull/30

ping @nickilott this PR is adding the changes commented in https://github.com/CGATOxford/CGATPipelines/pull/418 . I would be grateful if you could have a look. Would it be possible to run a test on your end to verify that it works as expected?
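For illustration, a per-job temporary directory could be set up along these lines. This is a hedged sketch and not cgat-core's actual implementation: `run_with_job_tmpdir` is a hypothetical wrapper, and its first argument stands for the cluster scratch location configured under `cluster: tmpdir`.

```shell
#!/usr/bin/env bash
# Hypothetical job wrapper; cgat-core's real implementation may differ.
# Usage: run_with_job_tmpdir BASE_DIR command [args...]
run_with_job_tmpdir() {
    local base="$1"
    shift
    (
        # Create a per-job DIRECTORY (note the -d) below the scratch location
        TMPDIR=$(mktemp -d -p "$base") || exit 1
        export TMPDIR
        # Remove the directory when the subshell exits, even if the job fails
        trap 'rm -rf -- "$TMPDIR"' EXIT
        "$@"
    )
}
```

With TMPDIR set up this way, a statement such as `zcat geneset_all.gtf.gz > $TMPDIR/geneset.gtf` writes into a directory that belongs to that job only and disappears when the job finishes.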