bcgsc / mavis

Merging, Annotation, Validation, and Illustration of Structural variants
http://mavis.bcgsc.ca
GNU General Public License v3.0

Stuck writing log #154

Closed dmacmillan closed 5 years ago

dmacmillan commented 6 years ago

MAVIS version: 2.1.1

Python version: 3.6.2

OS: CentOS 6.7

Expected Behaviour

MAVIS runs to completion.

Actual Behaviour

MAVIS does not finish; it hangs during the cluster step.

Steps to Reproduce the Behaviour

MAVIS appears to get stuck after writing to a log file. Output up to the point where I interrupted it with Ctrl-C:

[2018-10-08 09:12:18] writing: /projects/wtss_scratch/mavis/POG/gd-A10978_A10979_gn-A10980_rd-A10982/mavis_v2.1.1/1389/A10980_normal_genome/cluster/MC_A10980_batch-q3zzV69ntwX6zvXEJgNp7j.log
^CTraceback (most recent call last):
  File "/gsc/pipelines/mavis/v2.1.1/venv/bin/mavis", line 11, in <module>
    load_entry_point('mavis==2.1.1', 'console_scripts', 'mavis')()
  File "/gsc/pipelines/mavis/v2.1.1/venv/lib/python3.6/site-packages/mavis-2.1.1-py3.6.egg/mavis/main.py", line 397, in main
    pipeline = _pipeline.Pipeline.build(config)
  File "/gsc/pipelines/mavis/v2.1.1/venv/lib/python3.6/site-packages/mavis-2.1.1-py3.6.egg/mavis/schedule/pipeline.py", line 323, in build
    clustered_files = _main(cls.format_args(SUBCOMMAND.CLUSTER, args))
  File "/gsc/pipelines/mavis/v2.1.1/venv/lib/python3.6/site-packages/mavis-2.1.1-py3.6.egg/mavis/main.py", line 369, in main
    ret_val = cluster_main.main(**args, start_time=start_time)
  File "/gsc/pipelines/mavis/v2.1.1/venv/lib/python3.6/site-packages/mavis-2.1.1-py3.6.egg/mavis/cluster/main.py", line 155, in main
    breakpoint_pairs, cluster_radius=cluster_radius, cluster_initial_size_limit=cluster_initial_size_limit)
  File "/gsc/pipelines/mavis/v2.1.1/venv/lib/python3.6/site-packages/mavis-2.1.1-py3.6.egg/mavis/cluster/cluster.py", line 244, in merge_breakpoint_pairs
    distances = sorted([(pair_center_distance(pair, node), node) for node in nodes], key=lambda x: x[0])
  File "/gsc/pipelines/mavis/v2.1.1/venv/lib/python3.6/site-packages/mavis-2.1.1-py3.6.egg/mavis/cluster/cluster.py", line 244, in <listcomp>
    distances = sorted([(pair_center_distance(pair, node), node) for node in nodes], key=lambda x: x[0])
  File "/gsc/pipelines/mavis/v2.1.1/venv/lib/python3.6/site-packages/mavis-2.1.1-py3.6.egg/mavis/cluster/cluster.py", line 200, in pair_center_distance
    d = abs(pair1.break1.center - pair2.break1.center)
  File "/gsc/pipelines/mavis/v2.1.1/venv/lib/python3.6/site-packages/mavis-2.1.1-py3.6.egg/mavis/interval.py", line 174, in center
    return (self[1] + self[0]) / 2
  File "/gsc/pipelines/mavis/v2.1.1/venv/lib/python3.6/site-packages/mavis-2.1.1-py3.6.egg/mavis/interval.py", line 84, in __getitem__
    index = int(index)
KeyboardInterrupt
^C

It ran for 4 days before I decided to end its life. I think it may have to do with the list comprehension:

distances = sorted([(pair_center_distance(pair, node), node) for node in nodes], key=lambda x: x[0])

but that is just my guess. Apologies for the minimal information to go on; this is all I've got!
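To show why that line could dominate the runtime, here is a minimal sketch of the access pattern it suggests: every incoming pair is compared against every existing node, and the results are sorted. This is not the MAVIS implementation. `Pair`, `center_distance`, `naive_nearest`, and `worst_case_calls` are hypothetical stand-ins, and the sketch assumes the worst case where every pair becomes its own node.

```python
from typing import List, Tuple

# Hypothetical stand-in for a breakpoint pair: (break1 center, break2 center).
Pair = Tuple[int, int]


def center_distance(a: Pair, b: Pair) -> float:
    """Toy distance: mean absolute difference of the two breakpoint centers."""
    return (abs(a[0] - b[0]) + abs(a[1] - b[1])) / 2


def naive_nearest(pairs: List[Pair]) -> int:
    """Compare each pair against all nodes seen so far.

    Returns the total number of distance computations performed.
    Assumption: every pair is kept as a new node (worst case).
    """
    nodes: List[Pair] = []
    calls = 0
    for pair in pairs:
        # Same shape as the line in the traceback: all distances, then sort.
        distances = sorted(
            ((center_distance(pair, node), node) for node in nodes),
            key=lambda x: x[0],
        )
        calls += len(distances)
        nodes.append(pair)
    return calls


def worst_case_calls(n: int) -> int:
    """Closed form for the loop above: n * (n - 1) / 2 distance computations."""
    return n * (n - 1) // 2
```

Under these assumptions the work grows quadratically with the number of input pairs, before even counting the cost of each sort. `worst_case_calls(1_000_000)` is roughly 5e11 distance computations, which would be consistent with a run that does not finish in days.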

calchoo commented 6 years ago

Looking at the config file, it looks like your DELLY input file has over 1,000,000 SVs. That many events takes a very long time to cluster and will take even longer to actually process. I think you'll need to either take out the DELLY result or try rerunning DELLY with a more recent version and see if you get a more reasonable number of events.
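A quick way to check the size of an input before launching a run is to count its records. The sketch below is not part of MAVIS. It assumes the DELLY output is available as a plain or gzipped VCF text file (a BCF would need converting first), and `open_vcf` and `count_records` are hypothetical helper names. It reports the total number of records and how many have `PASS` in the FILTER column.

```python
import gzip
from typing import Iterable, TextIO, Tuple


def open_vcf(path: str) -> TextIO:
    """Open a VCF as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_records(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total records, records with FILTER == PASS).

    Header lines (starting with '#') and blank lines are skipped.
    FILTER is the 7th tab-separated column of a VCF record.
    """
    total = 0
    passed = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        total += 1
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] == "PASS":
            passed += 1
    return total, passed
```

Usage would look like `with open_vcf(path) as fh: total, passed = count_records(fh)`, where `path` is your own file. If the total is in the millions but the `PASS` count is much smaller, filtering the calls before handing them to MAVIS may be another option worth trying.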