PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

error: raise PypeError, "Add different objects with the same URL %s" % obj.URL #214

Closed yingzhang121 closed 8 years ago

yingzhang121 commented 9 years ago

Hi, any idea what causes this object URL error?

Best, Ying

pb-jchin commented 9 years ago

What is your execution environment?

yingzhang121 commented 9 years ago

I requested one entire node with 24 cores from our grid (not SGE). And this is my cfg file:

[General]
input_fofn = 70x.input.fofn
input_type = raw

# The length cutoff used for seed reads used for initial mapping
length_cutoff = 3000

# The length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 10000

job_type = local
jobqueue =
sge_option_da =
sge_option_la =
sge_option_pda =
sge_option_pla =
sge_option_fc =
sge_option_cns =

pa_concurrent_jobs = 4
cns_concurrent_jobs = 4
ovlp_concurrent_jobs = 4

pa_HPCdaligner_option =  -v -dal4 -t6 -e.70 -l1000 -s1000
ovlp_HPCdaligner_option = -v -dal4 -t6 -h60 -e.96 -l500 -s1000

pa_DBsplit_option = -x500 -s50
ovlp_DBsplit_option = -x500 -s50

falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --local_match_count_threshold 2 --max_n_read 200 --n_core 6 --output_dformat

overlap_filtering_setting = --max_diff 100 --max_cov 100 --min_cov 20 --bestn 10 --n_core 24

yingzhang121 commented 9 years ago

And this is the first 10 lines of my 70x.input.fofn

$ head 70x.input.fofn
data/m130831_100442_00125_c100565612550000001823094512221364_s1_p0.1.subreads.fasta
data/m130831_100442_00125_c100565612550000001823094512221364_s1_p0.2.subreads.fasta
data/m130831_100442_00125_c100565612550000001823094512221364_s1_p0.3.subreads.fasta
data/m130906_004047_00125_c100563852550000001823088712221310_s1_p0.1.subreads.fasta
data/m130906_004047_00125_c100563852550000001823088712221310_s1_p0.2.subreads.fasta
data/m130906_004047_00125_c100563852550000001823088712221310_s1_p0.3.subreads.fasta
data/m130917_124511_00125_c100564032550000001823088712221365_s1_p0.1.subreads.fasta
data/m130917_124511_00125_c100564032550000001823088712221365_s1_p0.2.subreads.fasta
data/m130917_124511_00125_c100564032550000001823088712221365_s1_p0.3.subreads.fasta
data/m130917_150710_00125_c100564032550000001823088712221366_s1_p0.1.subreads.fasta

yingzhang121 commented 9 years ago

The full error message is:

Traceback (most recent call last):
  File "/soft/pacificbiosciences-falcon/FALCON-integrate/fc_env/bin/fc_run.py", line 4, in <module>
    __import__('pkg_resources').run_script('falcon-kit==0.3.0', 'fc_run.py')
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 517, in run_script
    """Return a string containing the contents of `resource_name`
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1436, in run_script
    def _markerlib_evaluate(cls, text):
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.3.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/fc_run.py", line 5, in <module>
    main(*sys.argv)
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.3.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 869, in main
    main1(*argv)
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.3.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 704, in main1
    wf.addTasks(daligner_tasks)
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 523, in addTasks
    PypeWorkflow.addTasks(self, taskObjs)
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 257, in addTasks
    self.addObjects([dObj])
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 228, in addObjects
    raise PypeError, "Add different objects with the same URL %s" % obj.URL
pypeflow.common.PypeError: 'Add different objects with the same URL file://localhost/panfs/roc/scratch/yzhang/medicago_test/0-rawreads/job_fdb0464e/job_fdb0464e_done'
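
For readers unfamiliar with the message, the traceback shows the workflow controller refusing to register a second, different object under a URL that is already in use. The following is a minimal sketch of that kind of check, written in Python 3 for illustration. It is not the pypeflow source: the `Workflow` class, the `add_object` method, and the locally defined `PypeError` are hypothetical stand-ins, and the URL is shortened.

```python
class PypeError(Exception):
    """Hypothetical stand-in for pypeflow.common.PypeError."""


class Workflow:
    """Toy registry keyed by URL; an assumption, not the real controller."""

    def __init__(self):
        self._objects = {}  # maps URL -> the object registered under it

    def add_object(self, url, obj):
        existing = self._objects.get(url)
        # Re-adding the very same object is harmless; a different object
        # under the same URL is treated as an error.
        if existing is not None and existing is not obj:
            raise PypeError("Add different objects with the same URL %s" % url)
        self._objects[url] = obj


wf = Workflow()
done_a = object()  # placeholder for one task's "done" file object
done_b = object()  # placeholder for another task's "done" file object
url = "file://localhost/0-rawreads/job_fdb0464e/job_fdb0464e_done"  # shortened, illustrative

wf.add_object(url, done_a)
wf.add_object(url, done_a)  # same object again: accepted
try:
    wf.add_object(url, done_b)  # different object, same URL: rejected
    collided = False
except PypeError as exc:
    collided = True
    print(exc)
```

Under this reading, the error means two distinct daligner tasks ended up with the same `job_fdb0464e` directory name, so their "done" files mapped to one URL.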

pb-jchin commented 9 years ago

I am not sure whether this is a rare case of hash collision. If it is, it should be addressed by https://github.com/PacificBiosciences/FALCON/issues/205. If you can update to the latest master head, you might be able to get past this. (If you are not a developer, you may need help from someone on your side who is comfortable hacking Python code.)
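
To get a feel for how rare such a collision would be, here is a rough birthday-problem estimate. It assumes the job ID is an 8-hex-digit (32-bit) value drawn uniformly at random, which is only a guess based on the directory name `job_fdb0464e`; the actual ID scheme in FALCON is not confirmed in this thread, and the job counts below are illustrative. The function name `collision_probability` is made up for this sketch.

```python
import math


def collision_probability(n_jobs, id_bits=32):
    """Chance that at least two of n_jobs uniformly random IDs coincide."""
    space = 2 ** id_bits
    if n_jobs > space:
        return 1.0  # pigeonhole: more jobs than possible IDs
    # P(no collision) = prod_{i=0}^{n-1} (1 - i/space); sum logs for stability.
    log_no_collision = sum(math.log1p(-i / space) for i in range(n_jobs))
    return -math.expm1(log_no_collision)


# Illustrative job counts only; the real number of daligner jobs is not given.
for n in (1000, 10000, 100000):
    print(n, collision_probability(n))
```

Under these assumptions the probability grows roughly with the square of the number of jobs, so a collision that is very unlikely for a small run becomes plausible once the job count reaches the tens of thousands.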

yingzhang121 commented 9 years ago

Thanks, I will look into #205 to see whether I have the same case. I should say that in total I have more than 200 FASTA files under the data directory, and this is the first time I have gotten such an error, so it does look like a hash collision.

pb-cdunn commented 8 years ago

Please re-open if the latest code does not solve the problem.