jscaber closed this issue 5 years ago
PS: When reverting back to the CGATOxford code, the task runs fine.
I have rewritten the code so that it now joins statements with && instead of ; (#52), but this does not work either (I cannot change the ";" before and after the tempdir commands):
# 2018-09-12 11:21:57,245 INFO {"task": "pipeline_rnaseqqc.subsetSequenceData", "statement": "mkdir -p /ifs/projects/proj096/rnaseqqc/ctmpm5qc31vq; zcat SMA-0303-3.fastq.1.gz | awk 'NR > 4000000 {exit} {print}' | gzip > fastq.dir/SMA-0303-3.fastq.1.gz && zcat SMA-0303-3.fastq.2.gz | awk 'NR > 4000000 {exit} {print}' | gzip > fastq.dir/SMA-0303-3.fastq.2.gz ; rm -rf /ifs/projects/proj096/rnaseqqc/ctmpm5qc31vq", "hostname": "cgat014.anat.ox.ac.uk", "job_id": "0", "engine": "GridExecutor", "submit_time": 1536747675.1106207, "start_time": 1536747675.1106207, "end_time": 1536747717.2442586, "slots": 1.0, "exit_status": 0.0, "total_t": 42.13363790512085, "cpu_t": 0.0, "wall_t": 0.0, "user_t": 0.0, "sys_t": 0.0, "child_user_t": 0.0, "child_sys_t": 0.0, "shared_data": 0.0, "io_input": 0.0, "io_output": 0.0, "average_memory_total": 0.0, "percent_cpu": 0.0, "average_rss": 0.0, "max_rss": 0.0, "max_vmem": 0.0, "minor_page_faults": 0.0, "swapped": 0.0, "context_switches_involuntarily": 0.0, "context_switches_voluntarily": 0.0, "average_uss": 0.0, "signal": 0.0, "socket_received": 0.0, "socket_sent": 0.0, "major_page_faults": 0.0, "unshared_data": 0.0}
Thanks, I will check.
Hi Jakub, when I look in the test pipelines, both files are created:
-rw-rw-r-- 1 andreas usersfgu 21776276 Sep 7 21:33 Brain-F1-R1.fastq.gz
-rw-rw-r-- 1 andreas usersfgu 0 Sep 7 21:33 Brain-F1-R1.subset
-rw-rw-r-- 1 andreas usersfgu 21846402 Sep 7 21:33 Brain-F1-R2.fastq.gz
-rw-rw-r-- 1 andreas usersfgu 0 Sep 7 21:33 Brain-F1-R2.subset
-rw-rw-r-- 1 andreas usersfgu 13760136 Sep 7 21:33 Brain-F2-R1.fastq.gz
-rw-rw-r-- 1 andreas usersfgu 0 Sep 7 21:33 Brain-F2-R1.subset
-rw-rw-r-- 1 andreas usersfgu 17543563 Sep 7 21:33 Brain-F2-R2.fastq.gz
-rw-rw-r-- 1 andreas usersfgu 0 Sep 7 21:33 Brain-F2-R2.subset
Which directory/project are you working in?
Never mind, I see it in the logs. Let me try.
I can reproduce it, thanks!
The reason is likely the interrupted pipe to gzip. This causes the whole statement to fail. It passes in the tests, because the number of reads is below the subsetting threshold.
... though an error code is not generated.
and ignore_pipe_errors was set to True.
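To illustrate the suspected mechanism, here is a minimal sketch. It assumes the statement is run by bash with `set -o pipefail` enabled (an assumption about the executor, not something confirmed in this thread). `yes` stands in for `zcat` on a large file and `head -n 1` for the `awk` that exits early; `run_bash` is a hypothetical helper. With pipefail, the interrupted writer makes the first pipeline fail, so the part after `&&` is skipped, while the trailing `; echo cleanup` (standing in for the `rm -rf` of the tempdir) still runs and leaves the exit status at 0.

```python
import subprocess


def run_bash(statement, pipefail):
    """Run a statement in bash, optionally with `set -o pipefail`.

    Hypothetical helper for illustration only; it is not part of the pipeline code.
    Returns (exit status, list of words printed to stdout).
    """
    prefix = "set -o pipefail; " if pipefail else ""
    proc = subprocess.run(
        ["bash", "-c", prefix + statement],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide any "Broken pipe" message from `yes`
        text=True,
    )
    return proc.returncode, proc.stdout.split()


# Same shape as the logged statement: <pipeline 1> && <pipeline 2> ; <cleanup>
STATEMENT = "yes | head -n 1 > /dev/null && echo second; echo cleanup"

strict = run_bash(STATEMENT, pipefail=True)    # writer's failure counts
lenient = run_bash(STATEMENT, pipefail=False)  # only the last command's status counts

print("pipefail on: ", strict)
print("pipefail off:", lenient)
```

In this sketch the "second" step only runs when pipefail is off, and in both cases the overall exit status is 0 because the final cleanup command succeeds. That would be consistent with a missing second output file and no error code.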
... actually, it traces to a typo in pipeline_rnaseqqc.py:
ignore_pipe_erors = True
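A misspelled option like this fails silently if options are picked up by name from a task's local variables, since the misspelled name is just another unused variable. The sketch below shows one way such near-misses could be flagged. The option list and the helper `find_misspelled_options` are hypothetical and only for illustration; this is not how the pipeline code currently behaves.

```python
import difflib

# Hypothetical list of option names a run helper might recognise.
KNOWN_OPTIONS = ["ignore_pipe_errors", "job_memory", "job_threads"]


def find_misspelled_options(local_vars, known=KNOWN_OPTIONS, cutoff=0.8):
    """Return {name: suggestion} for names that nearly match a known option."""
    suspects = {}
    for name in local_vars:
        if name in known:
            continue  # spelled correctly, nothing to report
        close = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
        if close:
            suspects[name] = close[0]
    return suspects


def task():
    ignore_pipe_erors = True  # the typo from the thread
    statement = "echo hello"
    return find_misspelled_options(locals())


result = task()
print(result)
```

A check of this kind could log a warning before the statement is submitted, so a typo in an option name shows up in the log rather than silently falling back to the default.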
@AndreasHeger
Hi Andreas,
This bug was discovered in rnaseqqc but may have wider implications. When presented with two fastq files, the function subsetSequenceData generates only one subset: File1.fastq.1.gz File1.subset etc.
The command that is sent to the cluster is correct. It can be executed on the command line, where it runs fine. We have tested this extensively with Sebastian on both my set-up and his and have not found the cause.
The code after the second semi-colon does not appear to be run. There are no error messages.
Best wishes, Jakub