Closed (neitzke closed this 9 years ago)
Hi Andy, have you tried searching the output for keywords such as 'Exception' (which would usually cause a break of the run) or simply 'Error'? I am planning to check this soon, but can't right now.
There is nothing like that in the logs, but maybe the logger doesn't dump that kind of output. I will try to figure out how to tell ipython to redirect the console output to a file so that I can search it.
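For reference, one way to do the search once the output is in a file is sketched below. It assumes the run was redirected with something like `python testscript.py > loom_output.log 2>&1`; the file name `loom_output.log`, the helper name `find_error_lines`, and the keyword list are my own choices for illustration, not anything loom provides.

```python
import re

# Keywords worth flagging: Python tracebacks/exceptions and hard C++ aborts.
# IGNORECASE also catches upper-case messages such as "CGAL ERROR".
PATTERN = re.compile(r"Traceback|Exception|Error|terminate called", re.IGNORECASE)


def find_error_lines(lines):
    """Return (line_number, text) pairs for every line matching PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Usage (hypothetical file name):
# with open("loom_output.log") as log:
#     for number, text in find_error_lines(log):
#         print(number, text)
```

Redirecting stderr as well as stdout (`2>&1`) matters here, since tracebacks go to stderr.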
Hi Andy,
When generating multiple networks, child processes do the actual jobs, and sometimes an error that occurs in one of the children does not propagate well up to the parent, probably thanks to my sloppy implementation of multiprocessing. Anyway, I would recommend running a single-phase job for each phase that did not return; in this case there will be four phases that had problems.
Probably I need to come up with a more graceful treatment of such failures...
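A rough sketch of what a more graceful treatment could look like, not loom's actual code: wrap each job so that a Python exception in a child comes back as data (with its traceback) instead of being raised in the parent. The names `safe_call` and `run_phases` and the `worker` argument are hypothetical, and the sketch is written for Python 3.

```python
import traceback
from functools import partial
from multiprocessing import Pool


def safe_call(worker, theta):
    """Run worker(theta) and never raise.

    Returns (theta, result, None) on success and
    (theta, None, traceback_text) on failure.
    """
    try:
        return (theta, worker(theta), None)
    except Exception:
        return (theta, None, traceback.format_exc())


def run_phases(worker, thetas, processes=4):
    """Run worker on every theta in a pool; split successes from failures.

    worker must be a module-level function so that it can be pickled.
    """
    with Pool(processes) as pool:
        outcomes = pool.map(partial(safe_call, worker), thetas)
    results = [(theta, res) for theta, res, err in outcomes if err is None]
    failures = [(theta, err) for theta, res, err in outcomes if err is not None]
    return results, failures
```

With this shape the parent gets the networks that did finish plus a list of failed phases and their tracebacks. It only helps with Python exceptions; a hard abort inside a C++ library (a `terminate called ...` message) kills the child process outright and would not be caught this way.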
Running my test script with all output redirected, I see a (perhaps) more informative message:
40698: CGAL not available; switch from get_new_joints_using_cgal() to get_new_joints_using_interpolation().
40694: Using CGAL to find intersections.
40694: CGAL not available; switch from get_new_joints_using_cgal() to get_new_joints_using_interpolation().
40698: Growing S-wall #20...
40689: No additional joint found: Stop growing this spectral network at iteration #3.
40689: Finished generating spectral network #15/100.
Traceback (most recent call last):
File "testscript.py", line 3, in <module>
sn = loom.api.generate_spectral_network(config)
File "/home/andy/loom/loom/api.py", line 87, in generate_spectral_network
config,
File "/home/andy/loom/loom/parallel.py", line 99, in parallel_get_spectral_network
spectral_network_list.append(result.get())
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
ValueError: A value in x_new is below the interpolation range.
40689: Start generating spectral network #24/100: theta = 0.729872020202.
40689: Start growing a new spectral network...
I think the intersection-finding part using the interpolation method failed. Is it difficult to install CGAL on the machine? That may resolve the issue.
With CGAL it "almost" works: 99 out of 100 networks seem to go OK, but of course one failure is enough to spoil the whole thing. Buried deep in the logs I find the following:
13028: Growing S-wall #8...
13023: Growing S-wall #9...
13048: Using CGAL to find intersections.
13040: Using CGAL to find intersections.
terminate called after throwing an instance of 'CGAL::Precondition_exception'
what(): CGAL ERROR: precondition violation!
Expr: comp_f(object, nodeP->object) != LARGER
File: /usr/local/include/CGAL/Multiset.h
Line: 2141
13180: Start generating spectral network #24/100: theta = 0.722573987093.
13180: Start growing a new spectral network...
13180: Seed S-walls at branch points...
13032: Using CGAL to find intersections.
I guess this is some internal error generated by CGAL itself.
It could be, but isn't it strange that CGAL gives an error at exactly the same point as the other method? Perhaps there is something sick in the evolution of the network with these configuration parameters. I wonder if decreasing the mass limit or varying the moduli a bit will help.
Running with a mass cutoff of 5, I don't see errors in the output, but it still gets to 99 and crashes. I noticed a message "number of intersections larger than the buffer size". It could make sense if two streets of compatible root types (i.e. they can form joints) are parallel or antiparallel. That should not happen, but it could be due to a wrong assignment of roots by the trivialization module. It would be really helpful to figure out the phase at which any of these errors happen, possibly by following the process ID backwards in the output.
The message "number of intersections larger than the buffer size" is totally fine; the buffer size is readjusted automatically. But it means that more than 10 intersections (the default buffer size) were detected between two S-walls, which sounds fishy. I agree with Pietro: it would be much more helpful if we knew the problematic phases.
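One way to narrow down the problematic phases from a captured log, sketched under the assumption that the log lines look like the "Start generating spectral network #N/M: theta = ..." and "Finished generating spectral network #N/M." messages quoted in this thread: record every network that was started and drop it again when its "Finished" line shows up. The function name `unfinished_phases` is made up for this example.

```python
import re

# Patterns modelled on the log lines quoted in this thread.
START = re.compile(
    r"^(\d+): Start generating spectral network #(\d+)/\d+: theta = (-?\d+(?:\.\d+)?)"
)
FINISH = re.compile(r"^(\d+): Finished generating spectral network #(\d+)/\d+")


def unfinished_phases(lines):
    """Map network index -> (pid, theta) for networks started but never finished."""
    pending = {}
    for line in lines:
        line = line.strip()
        started = START.match(line)
        if started:
            pid, index, theta = started.groups()
            pending[int(index)] = (int(pid), float(theta))
            continue
        finished = FINISH.match(line)
        if finished:
            pending.pop(int(finished.group(2)), None)
    return pending
```

On the log of a run that has already stopped, whatever is left over is the list of suspect phases, each of which can then be rerun as a single-phase job. On a log that is still being written, networks that are merely in progress will show up too.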
Hi Chan, I focused on one such phase, 1.49146717172. It runs fine until the end, giving the messages about the large number of intersections. Maybe this is not fishy after all, given the picture below. In any case, this is not the phase that is causing the breakdown.
Thanks, Pietro, I will try to run it. But it's not clear to me why this kind of configuration will lead to so many intersections between TWO S-walls. The message is about intersections between one S-wall and another, not about the total number of intersections. But maybe it's just my lack of imagination. We'll know better when we have some data at the phase.
The phase
theta = 0.722567310326
seems to be one where things fail, though with a different error than the one I had above:
61507: Growing S-wall #6...
61507: Using CGAL to find intersections.
terminate called after throwing an instance of 'CGAL::Assertion_exception'
what(): CGAL ERROR: assertion violation!
Expr: (m_statusLine.size() == 0)
File: /usr/local/include/CGAL/Sweep_line_2/Basic_sweep_line_2_impl.h
Line: 229
Hi Andy
I just finished working on improving the trivialization and several other inner workings of loom. Running the 100-phase scan of your file (I can't recall if I modified it, but I'm adding it to the config folder) seems to work fine now; it produces 100 beautiful pictures for me.
Oops, I previously attached all of them: but here is just the one that was giving trouble
What a great job, thanks Pietro! By the way, do you know what was causing the problem? It seems that there should have been no dramatic failure in finding intersections of the above S-walls.
I think you are right, the intersection algorithm was probably a red herring. I'm not exactly sure where the problem was in this case, because I was working on general features, but there were a few serious bugs here and there.
A quick summary of major changes is:
Please double-check at your earliest convenience; if it works for you as well, we can finally kill this issue ;)
I tried running the 100 networks and it indeed ran to completion this time -- fantastic! But then when I tried to save the generated networks it failed with the "not JSON serializable" problem -- could it be that the fix for this got accidentally wiped out during the merge?
I guess that's a separate issue, anyway, so I can close this one. Thanks a lot!
Maybe something that cannot be JSONified is added. Now I am trying to merge branches so I hope I can pin down where it went wrong.
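To pin down which value is the offender, a small helper along these lines might be useful. It is only a sketch: the name `find_unserializable` is hypothetical, it assumes the data being saved is built from nested dicts and lists, and it checks values only (not dict keys).

```python
import json


def find_unserializable(obj, path="root"):
    """Return (path, type_name) pairs for leaf values that json.dumps rejects."""
    if isinstance(obj, dict):
        found = []
        for key, value in obj.items():
            found += find_unserializable(value, "%s[%r]" % (path, key))
        return found
    if isinstance(obj, (list, tuple)):
        found = []
        for index, value in enumerate(obj):
            found += find_unserializable(value, "%s[%d]" % (path, index))
        return found
    try:
        json.dumps(obj)
        return []
    except (TypeError, ValueError):
        return [(path, type(obj).__name__)]


# Usage with made-up data: a complex number hidden inside a list.
# find_unserializable({"walls": [1, 2, complex(1, 2)]})
```

Running it on the dictionary that is about to be dumped reports the path to each offending value, which should make it easier to see what was added or lost in the merge.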
That's quite possible, I completely forgot about that (I am using pickle as a matter of fact; I promise to switch soon). It's also possible that in the drastic manual merging I had to do I erased something that Chan had fixed. Apologies in advance if that is the case.
I have been trying to make movies with loom, of 100 frames, and ran into a difficulty: loom.api.generate_spectral_network mysteriously quits, returning no output, after almost all of the frames have been made.
More precisely: if I open an ipython session and run
where triangle-4.ini contains the contents
the code runs for a long time, generating a lot of console output, and finally ends up with
At this point I am actually back to the console -- if I hit enter, I get the ipython prompt. The variable "sn" which was supposed to contain the generated spectral networks is empty.
I think the trouble might be that one of the computations somewhere in the middle (say number 45) is failing with an exception, but somehow the rest of the processes keep running anyway, so the error output (stack trace) gets buried under all the other console output...?