wsuplantpathology opened this issue 7 years ago
Hi @wsuplantpathology
Can you post the contents of:
/data/chen/PacBio/PST/assembly/falcon/raw_all/3-unzip/reads/task.json
It looks like the task finished successfully.
Does 3-unzip/reads/track_reads_done exist, and is 3-unzip/reads/ctg_list non-empty?
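For example, a quick check from the top of your run directory could be something like:
[ -e 3-unzip/reads/track_reads_done ] && echo "track_reads_done exists" || echo "track_reads_done missing"
[ -s 3-unzip/reads/ctg_list ] && echo "ctg_list is non-empty" || echo "ctg_list missing or empty"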
If so, you may be able to proceed by touching the sentinels and re-invoking the pipeline:
touch /data/chen/PacBio/PST/assembly/falcon/raw_all/3-unzip/reads/track_reads_done.exit
touch /data/chen/PacBio/PST/assembly/falcon/raw_all/3-unzip/reads/run.sh.done
Thanks @gconcepcion
1) Here are the contents of /data/chen/PacBio/PST/assembly/falcon/raw_all/3-unzip/reads/task.json:
{
  "inputs": {
    "falcon_asm_done": "/data/chen/PacBio/PST/assembly/falcon/raw_all/2-asm-falcon/falcon_asm_done"
  },
  "outputs": {
    "ctg_list_file": "ctg_list",
    "job_done": "track_reads_done"
  },
  "parameters": {
    "config": {
      "job_queue": "default",
      "job_type": "local",
      "pwatcher_type": "fs_based",
      "sge_blasr_aln": "-pe smp 24 -q your_sge_queue",
      "sge_hasm": "-pe smp 48 -q your_sge_queue",
      "sge_phasing": "-pe smp 12 -q your_sge_queue",
      "sge_track_reads": "-pe smp 12 -q your_sge_queue",
      "smrt_bin": "/data/chen/software/GenomicConsensus-smrtanalysis-3.0.2/bin",
      "unzip_blasr_concurrent_jobs": 8,
      "unzip_phasing_concurrent_jobs": 8
    },
    "sge_option": "-pe smp 12 -q your_sge_queue",
    "wd": "/data/chen/PacBio/PST/assembly/falcon/raw_all/3-unzip/reads"
  },
  "python_function": "falcon_unzip.unzip.task_track_reads"
}
2) I do have a non-empty 3-unzip/reads/ctg_list, but 3-unzip/reads/track_reads_done does not exist. However, I do have 3-unzip/reads/track_reads_done.exit, and that file is empty.
3) I ran your last two commands; nothing happened except that two empty files were generated: run.sh.done and track_reads_done.exit.
Besides, in the 3-unzip/reads/ directory there are many *F_reads.fa and *F_ref.fa files, and they are non-empty. However, I notice that the first file, 000000F_reads.fa, is 42 MB, and the sizes of the following *F_reads.fa files keep decreasing, with the last one, 000544F_reads.fa, only 354 KB. Is this normal? The *F names are all listed in ctg_list. My new questions are:
1) How do you know the task looks like it finished successfully?
2) What is this critical error: CRITICAL - Error in /home/chongjing.xia/FALCON-integrate/pypeFLOW/pypeflow/do_task.py with args="{'json_fn': '/data/chen/PacBio/PST/assembly/falcon/raw_all/3-unzip/reads/task.json',\n 'timeout': 600000000000000,\n 'tmpdir': None}"
Thanks very much for your time and effort; I appreciate it.
Is your SGE queue really called "your_sge_queue", or did you change it just to paste it here? I would say look into this as the obvious culprit. Not sure if this is the case - just a suggestion to investigate.
So the run.sh script exited due to an import failure prior to touching the track_reads_done file. That's interesting, because python -m falcon_kit.mains.rr_ctg_track seems to have completed successfully, but then the script chokes on python -m falcon_kit.mains.pr_ctg_track with a Python environment error. Both of those scripts import the same Python modules, so I'm not sure what's happening.
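One way to narrow it down would be to re-run that step by hand and watch where it stops. A rough sketch, assuming the step's run.sh wrapper is still sitting in the task directory (the run.sh.done sentinel suggests it is):
cd /data/chen/PacBio/PST/assembly/falcon/raw_all/3-unzip/reads
bash -x run.sh 2> rerun.stderr    # -x echoes each command, so the one that fails is visible
tail -n 40 rerun.stderr           # the traceback should show which import is failing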
Besides, in the 3-unzip/reads/ directory there are many *F_reads.fa and *F_ref.fa files, and they are non-empty. However, I notice that the first file, 000000F_reads.fa, is 42 MB, and the sizes of the following *F_reads.fa files keep decreasing, with the last one, 000544F_reads.fa, only 354 KB. Is this normal? The *F names are all listed in ctg_list.
Yes, this is normal. In general, the lowest-numbered contig, 000000F, is the longest, and thus it has the most reads mapping to it. The number of reads mapping to a particular contig should decrease as the contig number increases - unless, of course, the contig is a highly repetitive section of DNA.
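If you want to see the trend directly, a quick way is to count the fasta records in each *F_reads.fa (a small sketch, run from 3-unzip/reads/):
# count reads per contig; the counts should shrink as the contig number grows
for f in 0*F_reads.fa; do
    printf "%s\t%d\n" "$f" "$(grep -c '^>' "$f")"
done | head -n 20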
How do you know the task looks like it finished successfully?
Sorry, I was mistaken - I missed the env error /usr/bin/python: No module named ext_falcon
I'll need to take a look and see if I can figure out what's going on here.
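In the meantime, a quick sanity check of which interpreter the wrapper is picking up might look like this (the module names come straight from the error message and from falcon_kit; nothing else is assumed):
which python                                                # is this /usr/bin/python or the FALCON-integrate install?
python -c "import ext_falcon; print(ext_falcon.__file__)"   # the module the error says is missing
python -c "import falcon_kit; print(falcon_kit.__file__)"   # should point at your FALCON-integrate checkout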
What is this critical error: CRITICAL - Error in /home/chongjing.xia/FALCON-integrate/pypeFLOW/pypeflow/do_task.py with args="{'json_fn': '/data/chen/PacBio/PST/assembly/falcon/raw_all/3-unzip/reads/task.json',\n 'timeout': 600000000000000,\n 'tmpdir': None}"
That's just saying that the read tracking task exited without completing successfully, due to the issues mentioned above.
Thanks @gconcepcion for your explanations, very helpful.
Is your SGE queue really called "your_sge_queue", or did you change it just to paste it here? I would say look into this as the obvious culprit. Not sure if this is the case - just a suggestion to investigate.
No, I just left those settings at their defaults. I set job_type = local, so I didn't change the SGE settings in the example .cfg. Actually, we are using SLURM, and our partition is called "kamiak". Do you think I should set job_queue = kamiak, even though I have job_type = local?
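For reference, the settings the pipeline actually resolved are recorded in the task.json I pasted above; for example, from 3-unzip/reads/:
# prints the job_type / job_queue recorded for the track_reads step ("local default" here)
python -c "import json; cfg = json.load(open('task.json'))['parameters']['config']; print(cfg['job_type'], cfg['job_queue'])"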
Some update information:
Since you mentioned the track-reads step looks successful, I proceeded with the following steps, and it turns out everything looks good: I got the phasing step done, and even Quiver consensus. Basically, I have all_p_ctg.fa (85 MB, which is a little smaller than the previously estimated 100 MB) and all_h_ctg.fa (54 MB; I think this size is reasonable since my fungus has high heterozygosity). But these are only my first impressions, and I'll look into the details of the run. Any comments on these results are welcome, since I am not so confident about how to interpret these data.
Thanks very much.
Hi @pb-jchin @pb-cdunn ,
Please help me with this issue. I was running falcon_unzip 0.4.0, and it was successful with the E. coli example data after the FALCON run. However, it failed when I ran my own data, even though I was using exactly the same environment, the same dependencies, and the same falcon-kit. Here is fc_unzip.cfg: fc_unzip.txt. Here is what I got from my own data run, in 3-unzip/reads/pwatcher.dir/stderr:
For comparison, here is what I got from the successful E. coli example, from 3-unzip/reads/pwatcher.dir/stderr: