Closed cgirardot closed 6 years ago
Hello @cgirardot ! Thank you so much for this. I was actually looking at it lately but didn't finish it! This is amazing! Now the pipeline should be snakemake "workflow" approved!
Sadly the travis build failed. I'm not sure why though. It can't find the bash script's path. I see that you changed the path to the wrapper once already. I guess that the problem is linked to this. I'll also test it locally and try to figure it out.
Again, I am very grateful for your help on this!
this is weird.
I initially had the reference to the script as
dropseq_wrapper='../scripts/drop-seq-tools-wrapper.sh'
but this gave me the error we now see on failed build.
I therefore changed it to dropseq_wrapper='scripts/drop-seq-tools-wrapper.sh'
and the pipeline then worked like a charm (it is actually currently running) :-/
For example, I had :
Activating conda environment: /scratch/girardot/dropSeqPipe/dropSeqPipeTest/.snakemake/conda/8a6a88f2
/usr/bin/bash: ../scripts/drop-seq-tools-wrapper.sh: No such file or directory
(...) skipped for clarity
After changing ../scripts/drop-seq-tools-wrapper.sh
to scripts/drop-seq-tools-wrapper.sh
:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 100
Job counts:
count jobs
1 create_intervals
1 meta
2
localrule create_intervals:
input: /scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.dict, /scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.reduced.gtf
output: /scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.rRNA.intervals
jobid: 5
Activating conda environment: /scratch/girardot/dropSeqPipe/dropSeqPipeTest/.snakemake/conda/8a6a88f2
+ export '_JAVA_OPTIONS=-Djava.io.tmpdir=/tmp '
+ _JAVA_OPTIONS='-Djava.io.tmpdir=/tmp '
+ CreateIntervalsFiles REDUCED_GTF=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.reduced.gtf SEQUENCE_DICTIONARY=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.dict O=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen PREFIX=HUMAN_ZsGreen
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
[Thu Aug 02 14:46:27 CEST 2018] org.broadinstitute.dropseqrna.annotation.CreateIntervalsFiles SEQUENCE_DICTIONARY=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.dict REDUCED_GTF=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.reduced.gtf OUTPUT=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen PREFIX=HUMAN_ZsGreen VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Thu Aug 02 14:46:27 CEST 2018] Executing as girardot@spinoza.embl.de on Linux 3.10.0-693.5.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b15; Picard version: 1.13(7bed8f4_1513008033)
[Thu Aug 02 14:47:44 CEST 2018] org.broadinstitute.dropseqrna.annotation.CreateIntervalsFiles done. Elapsed time: 1.31 minutes.
I am a bit confused with the use of ../
or omitting it. I see it is always present in your plot rules where a R script located in the ''script'' dir is called (which is why I first thought I'd need it) or when rules refer to conda envs. But it is not present when defining input/output files . I am not such a snakemake expert; if you could enlighten me as when to use it or not, maybe I could fix the issue.
Best
The paths in snakemake can be a bit confusing. Inputs and outputs are always relative to the main Snakefile.
The other paths, such as shell/scripts calls are relative to the snakefile
they are called from. Since all the rules are located in the rules
folder, all the calls have to come up one folder, hence the ../
for the
calls/envs.
I just tried your branch locally and I get the same error whether with or
without the ../
On Thu, Aug 2, 2018 at 10:37 AM, cgirardot notifications@github.com wrote:
For example, I had :
Activating conda environment: /scratch/girardot/dropSeqPipe/dropSeqPipeTest/.snakemake/conda/8a6a88f2 /usr/bin/bash: ../scripts/drop-seq-tools-wrapper.sh: No such file or directory (...) skipped for clarity
After changing ../scripts/drop-seq-tools-wrapper.sh to scripts/drop-seq-tools-wrapper.sh :
Building DAG of jobs... Using shell: /usr/bin/bash Provided cluster nodes: 100 Job counts: count jobs 1 create_intervals 1 meta 2
localrule create_intervals: input: /scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.dict, /scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.reduced.gtf output: /scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.rRNA.intervals jobid: 5
Activating conda environment: /scratch/girardot/dropSeqPipe/dropSeqPipeTest/.snakemake/conda/8a6a88f2
- export '_JAVA_OPTIONS=-Djava.io.tmpdir=/tmp '
- _JAVA_OPTIONS='-Djava.io.tmpdir=/tmp '
- CreateIntervalsFiles REDUCED_GTF=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.reduced.gtf SEQUENCE_DICTIONARY=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.dict O=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen PREFIX=HUMAN_ZsGreen Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp [Thu Aug 02 14:46:27 CEST 2018] org.broadinstitute.dropseqrna.annotation.CreateIntervalsFiles SEQUENCE_DICTIONARY=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.dict REDUCED_GTF=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen/HUMAN_ZsGreen.reduced.gtf OUTPUT=/scratch/girardot/dropSeqPipe/genome/Homo_sapiens.GRCh38.90.ZsGreen PREFIX=HUMAN_ZsGreen VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json [Thu Aug 02 14:46:27 CEST 2018] Executing as girardot@spinoza.embl.de on Linux 3.10.0-693.5.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b15; Picard version: 1.13(7bed8f4_1513008033) [Thu Aug 02 14:47:44 CEST 2018] org.broadinstitute.dropseqrna.annotation.CreateIntervalsFiles done. Elapsed time: 1.31 minutes.
I am a bit confused with the use of ../ or omitting it. I see it is always present in your plot rules where a R script located in the ''script'' dir is called (which is why I first thought I'd need it) or when rules refer to conda envs. But it is not present when defining input/output files . I am not such a snakemake expert; if you could enlighten me as when to use it or not, maybe I could fix the issue.
Best
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Hoohm/dropSeqPipe/pull/41#issuecomment-409949058, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNXaO60AGy36oLk1T5q00VFcEmaX3kVks5uMw64gaJpZM4VsRD_ .
--
Roelli Patrick Division of Animal Physiology and Immunology TUM School of Life Sciences Weihenstephan Technische Universität München Weihenstephaner Berg 3 85354 Freising GermanyBCF, Swiss Institute of BioinformaticsBâtiment Génopode, Quartier SorgeUniversity of Lausanne1015 LausanneSwitzerland
Could you run the same thing but using -p
? This would output the command line that is actually called. It might help
I think I understand what's happening.
Using shell
to run the script will run from the "working dir".
This means that if you clone the repo, add the data to this same folder and run it without anything else, it will find the path properly.
If you use the --directory
argument and point towards another folder containing only the data, it won't find the script because they are not there. (This is what's happening on travis-ci with the test data)
One "easy" fix would be to add the bash script to the environment's paths. I guess this could be done in the build.sh from the conda repo?
If I well understand, you are suggesting to pack your script in the dropseq-tools conda package. Correct? If so, I would not go down this path as the conda recipe uses dropseq tools as packaged from authors and one does not want to maintain another packaging process. I have another idea. I can get rid of your script altogether. This should be quite easy. I actually did this first but changed to the current solution. I 'll do this tomorrow and submit another PR.
I understand what you mean. The only reason for the bash wrapper was to pass down memory and temp paths. As long as this can be done differently, the wrapper is not required.
memory adjustment is already supported by the dropseq wrappers and tmp path can be injected using the _JAVA_OPTION. It should be straightforward. Another approach is to use absolute path to your wrapper script if snakemake defines some variable but I am not snakemake fluent enough.
It crashed but it's because of a conflict between some R packages. I'm fixing it right now.
my pipeline now successfully completed using this code. I did not see the crash. I actually did not know every commit would be picked up directly !
You would not have encountered the conflict because the env for the plotting didn't change in the envs/plots.yaml file. Here it crashed because of some update on the conda package side. I haven't fixed all the packages versions yet. That's on me.
You didn't rerun the meta_generation, so you missed a few temp parameters. I've fixed those as well.
is there a way to run the pipeline with test data locally ? It is indeed taking ages the way I do it and I therefore avoid re-running it all. I also did not realise in the first place that my commits would be picked up directly : I develop on my laptop and push so I can pull from the server and run the tests (but in the meantime travis seems to wake up). Sorry I need to understand the PR lifecycle better.
You can definitely use the test data from this https://github.com/Hoohm/scngs-test-data/tree/fa6f142da9da740b68f27fceb2dde942ef43d696 git repo to test.
I'm gonna update this repo as well since I don't need the wrapper anymore.
The test data is on a git submodule in the .test folder.
The merged version passed also! Thanks a lot for that. This will help a lot in the future. If you let me know if you have any other general comments about dropSeqPipe that you would want to talk about.
thanks! It is a great pipeline!
On 3. Aug 2018, at 19:31, Patrick Roelli notifications@github.com wrote:
The merged version passed also! Thanks a lot for that. This will help a lot in the future. If you let me know if you have any other general comments about dropSeqPipe that you would want to talk about.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Did you take the time to announce the conda package availability to James Nemesh?
If not, I think it would be a good idea. I can do it if you wish. There is a dropseq mailing list as well that might be interested.
no I haven’t. This is indeed a good idea. I can t find James Nemesh email on the Mc Carroll lab. If you could let him know and also the ML you r mentioning, this woudl be great. Thx
On 6. Aug 2018, at 21:37, Patrick Roelli notifications@github.com wrote:
Did you take the time to announce the conda package availability toJames Nemesh?
If not, I think it would be a good idea. I can do it if you wish. There is a dropseq mailing list as well that might be interested.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Hi @Hoohm , I am using your nice pipeline to process dropseq and decided to contribute on one of your TODO. I have added drop-seq-tools in conda and adapted your pipeline accordingly. The modifications are as follows:
_JAVA_OPTIONS
I also changed the following :
I hope you'll like the changes