chan-y-park / loom

Python program to generate, draw, and analyze spectral networks of class S theories

Strange failure when creating a large number of networks #28

Closed neitzke closed 9 years ago

neitzke commented 9 years ago

I have been trying to make movies with loom, of 100 frames, and ran into a difficulty: loom.api.generate_spectral_network mysteriously quits, returning no output, after almost all of the frames have been made.

More precisely: if I open an ipython session and run

import loom
config = loom.api.load_config("triangle-4.ini")
sn = loom.api.generate_spectral_network(config)

where triangle-4.ini contains the contents

[Seiberg-Witten data]
casimir_differentials = {2: -10*z, 3: 4, 4: 9*z^2}
root_system = A3
representation = 1
differential_parameters = {energy: 5}
ramification_point_finding_method = discriminant 
#ramification_point_finding_method = system_of_eqs 

[numerical parameters]
#default range as [[z.real.min, z.real.max], [z.imag.min, z.imag.max]]
plot_range = [[-5, 5], [-5, 5]]
num_of_steps = 5000
num_of_iterations = 5
size_of_small_step = 0.001
size_of_large_step = 0.02
size_of_neighborhood = 0.01
size_of_puncture_cutoff = 0.002
size_of_ramification_pt_cutoff = 0.001
size_of_bin = 0.06
accuracy = 1e-06
n_processes = 0

mass_limit = 50.0
phase_range = [0.00001, 3.14159, 100]

the code runs for a long time, generating a lot of console output, and finally ends up with

19352: Finished generating spectral network #94/100.
19369: Using CGAL to find intersections.
19369: CGAL not available; switch from get_new_joints_using_cgal() to get_new_joints_using_interpolation().
19369: Growing S-wall #17...
19374: Using CGAL to find intersections.
19374: CGAL not available; switch from get_new_joints_using_cgal() to get_new_joints_using_interpolation().
19374: Growing S-wall #16...
19369: Using CGAL to find intersections.
19369: CGAL not available; switch from get_new_joints_using_cgal() to get_new_joints_using_interpolation().
19369: No additional joint found: Stop growing this spectral network at iteration #1.
19369: Finished generating spectral network #95/100.
19374: Using CGAL to find intersections.
19374: CGAL not available; switch from get_new_joints_using_cgal() to get_new_joints_using_interpolation().
19374: Growing S-wall #17...
19374: Using CGAL to find intersections.
19374: CGAL not available; switch from get_new_joints_using_cgal() to get_new_joints_using_interpolation().
19374: No additional joint found: Stop growing this spectral network at iteration #1.
19374: Finished generating spectral network #96/100.

At this point I am actually back to the console -- if I hit enter, I get the ipython prompt. The variable "sn" which was supposed to contain the generated spectral networks is empty.

I think the trouble might be that one of the computations somewhere in the middle (say number 45) is failing with an exception, but somehow the rest of the processes keep running anyway, so the error output (stack trace) gets buried under all the other console output...?

plonghi commented 9 years ago

Hi Andy, have you tried searching the output for keywords such as 'Exception' (which would usually cause a break of the run) or simply 'Error'? I am planning to check this soon, but can't right now.

neitzke commented 9 years ago

> Hi Andy, have you tried searching the output for keywords such as 'Exception' (which would usually cause a break of the run) or simply 'Error'?

There is nothing like that in the logs; but maybe the logger doesn't dump that kind of output; I will try to figure out how to tell ipython to redirect the console output to a file, so that I can search it.
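Once the console output is in a file, a keyword scan along these lines would do the search. This is a minimal sketch: the helper name `find_error_lines` is made up for illustration, and the keyword list is only the two words suggested above plus `Traceback`.

```python
import re

# Keywords suggested in this thread, plus "Traceback"; matched case-insensitively.
ERROR_PATTERN = re.compile(r"traceback|error|exception", re.IGNORECASE)


def find_error_lines(lines):
    """Return (line_number, text) for every line that matches ERROR_PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it on a saved log, pass an open file object, e.g. `find_error_lines(open("loom.log"))`, where `loom.log` is a hypothetical file name.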

chan-y-park commented 9 years ago

Hi Andy,

When generating multiple networks, child processes do the actual jobs, and sometimes an error that occurs in one of the children does not propagate up to the parent, probably thanks to my sloppy implementation of multiprocessing. Anyway, I would recommend running a single-phase job for each phase that did not return; in this case there will be four phases that had problems.

Probably I need to come up with a more graceful treatment of such failures...
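One possible shape for a more graceful treatment, as a sketch only and not loom's actual code: submit each job with `apply_async` and catch the exception that `get()` re-raises in the parent, so one failed phase is recorded instead of discarding every finished network. The name `collect_results` and its arguments are assumptions made for illustration.

```python
def collect_results(pool, func, args_list):
    """Run func(arg) for each arg on the pool, keeping failures separate.

    `pool` is any object with multiprocessing.Pool's apply_async() interface.
    Returns (results, failures): lists of (arg, value) and (arg, exception).
    """
    # Submit everything first so the jobs can run concurrently.
    pending = [(arg, pool.apply_async(func, (arg,))) for arg in args_list]

    results, failures = [], []
    for arg, async_result in pending:
        try:
            # get() re-raises in the parent whatever the worker raised.
            results.append((arg, async_result.get()))
        except Exception as exc:
            failures.append((arg, exc))
    return results, failures
```

With a `multiprocessing.Pool`, `func` has to be a picklable top-level function. This only helps when the worker raises a Python exception; a worker process that is killed outright never delivers a result for `get()` to return.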

neitzke commented 9 years ago

Running my test script with all output redirected, I see a (perhaps) more informative message:

40698: CGAL not available; switch from get_new_joints_using_cgal() to get_new_joints_using_interpolation().
40694: Using CGAL to find intersections.
40694: CGAL not available; switch from get_new_joints_using_cgal() to get_new_joints_using_interpolation().
40698: Growing S-wall #20...
40689: No additional joint found: Stop growing this spectral network at iteration #3.
40689: Finished generating spectral network #15/100.
Traceback (most recent call last):
  File "testscript.py", line 3, in <module>
    sn = loom.api.generate_spectral_network(config)
  File "/home/andy/loom/loom/api.py", line 87, in generate_spectral_network
    config,
  File "/home/andy/loom/loom/parallel.py", line 99, in parallel_get_spectral_network
    spectral_network_list.append(result.get())
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
ValueError: A value in x_new is below the interpolation range.
40689: Start generating spectral network #24/100: theta = 0.729872020202.
40689: Start growing a new spectral network...

chan-y-park commented 9 years ago

I think the intersection-finding part using the interpolation method failed. Is it difficult to install CGAL on the machine? That may resolve the issue.

neitzke commented 9 years ago

With CGAL it "almost" works: 99 out of 100 networks seem to go OK, but of course one failure is enough to spoil the whole thing. Buried deep in the logs I find the following:

13028: Growing S-wall #8...
13023: Growing S-wall #9...
13048: Using CGAL to find intersections.

13040: Using CGAL to find intersections.

terminate called after throwing an instance of 'CGAL::Precondition_exception'
  what():  CGAL ERROR: precondition violation!
Expr: comp_f(object, nodeP->object) != LARGER
File: /usr/local/include/CGAL/Multiset.h
Line: 2141
13180: Start generating spectral network #24/100: theta = 0.722573987093.
13180: Start growing a new spectral network...
13180: Seed S-walls at branch points...
13032: Using CGAL to find intersections.

I guess this is some internal error generated by CGAL itself.

plonghi commented 9 years ago

It could be, but isn't it strange that CGAL gives an error at exactly the same point as the other method? Perhaps there is something sick in the evolution of the network with these configuration parameters. I wonder if decreasing the mass limit or varying the moduli a bit will help.

plonghi commented 9 years ago

Running with a mass cutoff of 5, I don't see errors in the output, but it still gets to 99 and crashes. I noticed a message "number of intersections larger than the buffer size". It could make sense if two streets of compatible root types (i.e. they can form joints) are parallel or antiparallel. That should not happen, but it could be due to a wrong assignment of roots by the trivialization module. It would be really helpful to figure out the phase at which each of these errors happens, possibly by tracing the process ID backwards through the output.
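A rough way to find those phases from a saved log is to list every network that logged "Start generating" but never "Finished generating". The sketch below assumes the two line formats quoted earlier in this thread; the function name `unfinished_phases` is made up for illustration.

```python
import re

# Line formats as they appear in the logs quoted in this thread.
START = re.compile(
    r"^\d+: Start generating spectral network #(\d+)/\d+: "
    r"theta = ([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)
FINISH = re.compile(r"^\d+: Finished generating spectral network #(\d+)/\d+")


def unfinished_phases(lines):
    """Map network number -> theta for networks that started but never finished."""
    started = {}
    for line in lines:
        match = START.match(line)
        if match:
            started[int(match.group(1))] = float(match.group(2))
            continue
        match = FINISH.match(line)
        if match:
            # Drop the network once its "Finished" line shows up.
            started.pop(int(match.group(1)), None)
    return started
```

Whatever is left in the returned dictionary is a candidate for a single-phase rerun.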

chan-y-park commented 9 years ago

The message "number of intersections larger than the buffer size" is totally fine; the buffer size is readjusted automatically. But it means that more than 10 intersections (that's the default buffer size) were detected between two S-walls, which sounds fishy. I agree with Pietro: it would be much more helpful if we knew the problematic phases.

plonghi commented 9 years ago

Hi Chan, I focused on one such phase, 1.49146717172. It runs fine until the end, giving the messages about the large number of intersections. Maybe this is not fishy after all, given the picture below. In any case, this is not the one phase that is causing the breakdown.

[Image: screenshot, 2015-10-10 01:18:32]
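For matching a phase value to a network number: the phase quoted above, and the theta = 0.729872020202 logged for network #24/100 in the interpolation run, are both consistent with phase_range = [0.00001, 3.14159, 100] being sampled at 100 evenly spaced points including both endpoints. That is an inference from the logged numbers, not something confirmed from loom's source, and `phase_at` is a made-up helper name.

```python
def phase_at(index, theta_min, theta_max, count):
    """Phase of the network with 0-based `index`, assuming `count` evenly
    spaced samples from theta_min to theta_max, endpoints included."""
    step = (theta_max - theta_min) / (count - 1)
    return theta_min + index * step


# Network #24 is index 23; this agrees with the logged 0.729872020202.
theta_24 = phase_at(23, 0.00001, 3.14159, 100)

# Index 47 agrees with the phase 1.49146717172 mentioned above.
theta_48 = phase_at(47, 0.00001, 3.14159, 100)
```

Under this assumption, 1.49146717172 corresponds to network #48/100.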

chan-y-park commented 9 years ago

Thanks, Pietro, I will try to run it. But it's not clear to me why this kind of configuration will lead to so many intersections between TWO S-walls. The message is about intersections between one S-wall and another, not about the total number of intersections. But maybe it's just my lack of imagination. We'll know better when we have some data at the phase.

neitzke commented 9 years ago

The phase

theta = 0.722567310326

seems to be one where things fail, though with a different error than the one I had above:

61507: Growing S-wall #6...
61507: Using CGAL to find intersections.

terminate called after throwing an instance of 'CGAL::Assertion_exception'
  what():  CGAL ERROR: assertion violation!
Expr: (m_statusLine.size() == 0)
File: /usr/local/include/CGAL/Sweep_line_2/Basic_sweep_line_2_impl.h
Line: 229

plonghi commented 9 years ago

Hi Andy

I just finished improving the trivialization and several other inner workings of loom. Running the 100-phase scan of your file (I can't recall if I modified it, but I'm adding it to the config folder) seems to work fine now; it produces 100 beautiful pictures for me.

Oops, I previously attached all of them; here is just the one that was giving trouble:

[Image: triangle_4_23]

chan-y-park commented 9 years ago

What a great job, thanks Pietro! By the way, do you know what was causing the problem? It seems that there should have been no dramatic failure in finding intersections of the above S-walls.

plonghi commented 9 years ago

I think you are right, the intersection algorithm was probably a red herring. I'm not exactly sure where the problem was in this case, because I was working on general features, but there were a few serious bugs here and there.

A quick summary of major changes is:

Please double-check at your earliest convenience. If it works for you as well, we can finally kill this issue ;)

neitzke commented 9 years ago

I tried running the 100 networks and it indeed ran to completion this time -- fantastic! But then when I tried to save the generated networks it failed with the "not JSON serializable" problem -- could it be that the fix for this got accidentally wiped out during the merge?

neitzke commented 9 years ago

I guess that's a separate issue, anyway, so I can close this one. Thanks a lot!

chan-y-park commented 9 years ago

Maybe something that cannot be JSONified is added. Now I am trying to merge branches so I hope I can pin down where it went wrong.
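To pin down which value is the culprit, something like the following could walk the data before saving and report every leaf that `json.dumps` rejects. This is a sketch under assumptions: the helper name `find_unserializable` is made up, and it assumes the data is built from nested dicts, lists, and tuples with JSON-compatible keys.

```python
import json


def find_unserializable(obj, path="root"):
    """Return [(path, type_name)] for every leaf value json.dumps rejects."""
    if isinstance(obj, dict):
        found = []
        for key, value in obj.items():
            found.extend(find_unserializable(value, "%s[%r]" % (path, key)))
        return found
    if isinstance(obj, (list, tuple)):
        found = []
        for index, value in enumerate(obj):
            found.extend(find_unserializable(value, "%s[%d]" % (path, index)))
        return found
    try:
        json.dumps(obj)
    except TypeError:
        # json raises TypeError for types it cannot encode.
        return [(path, type(obj).__name__)]
    return []
```

For example, a complex number buried in a nested list is reported with its full path, so it is clear which field needs converting before the save.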

plonghi commented 9 years ago

That's quite possible, I completely forgot about that (I am using pickle, as a matter of fact; I promise to switch soon). It's also possible that in the drastic manual merging I had to do I erased something that Chan had fixed. Apologies in advance if that is the case.