HolyLab / BlockRegistrationScheduler.jl

Multi-core image registration scheduler

Test error #5

Closed Cody-G closed 8 years ago

Cody-G commented 8 years ago

I get the following error when testing BlockRegistrationScheduler. More specifically, this happens when compiling RegisterDriver:

INFO: Recompiling stale cache file /home/cody/git/juliapackages/lib/v0.4/RegisterDriver.ji for module RegisterDriver.
ERROR: LoadError: ArgumentError: RegisterCore not found in path
 in compilecache at loading.jl:376
 in require at ./loading.jl:224
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 [inlined code] from none:2
 in anonymous at no file:0
 in process_options at ./client.jl:284
 in _start at ./client.jl:411
while loading /home/cody/git/juliapackages/v0.4/BlockRegistrationScheduler/src/RegisterWorkerApertures.jl, in expression starting on line 6
ERROR: LoadError: LoadError: LoadError: Failed to precompile RegisterWorkerApertures to /home/cody/git/juliapackages/lib/v0.4/RegisterWorkerApertures.ji
 in error at ./error.jl:21
 in compilecache at loading.jl:383
 in require at ./loading.jl:250
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 in require at ./loading.jl:237
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 in process_options at ./client.jl:308
 in _start at ./client.jl:411
while loading /home/cody/git/juliapackages/v0.4/BlockRegistrationScheduler/src/BlockRegistrationScheduler.jl, in expression starting on line 14
while loading /home/cody/git/juliapackages/v0.4/BlockRegistrationScheduler/test/driver.jl, in expression starting on line 5
while loading /home/cody/git/juliapackages/v0.4/BlockRegistrationScheduler/test/runtests.jl, in expression starting on line 1

I thought that just adding a "using BlockRegistration" in RegisterWorkerApertures.jl would fix this by making RegisterCore available on the load path, but when I did that I got this error:

INFO: Testing BlockRegistrationScheduler
WARNING: Base.String is deprecated, use AbstractString instead.
  likely near /home/cody/.juliarc.jl:2
WARNING: Base.String is deprecated, use AbstractString instead.
  likely near /home/cody/.juliarc.jl:2
WARNING: Base.String is deprecated, use AbstractString instead.
  likely near /home/cody/.juliarc.jl:2
ERROR: LoadError: __precompile__(false) is not allowed in files that are being precompiled
 in __precompile__ at loading.jl:161
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 [inlined code] from none:2
 in anonymous at no file:0
 in process_options at ./client.jl:284
 in _start at ./client.jl:411
while loading /home/cody/git/juliapackages/v0.4/BlockRegistration/src/BlockRegistration.jl, in expression starting on line 1

I don't think I understand precompilation well enough to diagnose this.

timholy commented 8 years ago

Already addressed in the shiny new README for BlockRegistration.

timholy commented 8 years ago

(I'll let you close this, once you're satisfied the issue is solved.)

timholy commented 8 years ago

Oh, reading more carefully: make sure your user code says using BlockRegistration before saying using BlockRegistrationScheduler. Also, make sure you start any worker processes before issuing any using statements; see julia issue#3674.
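The ordering described above can be sketched as follows. This is only an outline for the Julia 0.4 setup discussed in this thread, not a tested script: the worker count is an arbitrary assumption, and both packages must already be installed for the using lines to succeed.

```julia
# Sketch for the Julia 0.4 setup in this thread, where addprocs lives in Base.
# (On Julia 0.7 and later it lives in the Distributed standard library, so
# `using Distributed` would be needed first.)
addprocs(2)                        # 1. start workers first; the count 2 is arbitrary

using BlockRegistration            # 2. then load BlockRegistration ...
using BlockRegistrationScheduler   # 3. ... before BlockRegistrationScheduler
```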

(I'm not putting in explicit hyperlinks anymore because it leaves a breadcrumb on the target site with a big "lock" on the link, seems like too much advertising about our private repos.)

timholy commented 8 years ago

Also, please use CUDArt's new wait_free in your scripts now:

wait_free(devlist)
@time driver(string(basename, ".reg"), algorithm, img, mon)

Cody-G commented 8 years ago

I will check the README about how to fix this. But note that the error comes when running the test scripts for BlockRegistrationScheduler, not in my own script. So we may need to change the test script. I'll look into it now.

timholy commented 8 years ago

Guessing here, but you can't run runtests.jl until julia #13455 gets backported to 0.4. You have to run the tests individually. This is part of why all Travis tests of BlockRegistrationScheduler are failing (MAT.jl bugs, which I also fixed, are another).

Cody-G commented 8 years ago

Ah, I see. So for now the tests should be run by first executing

using BlockRegistration, BlockRegistrationScheduler

and then including the test scripts individually. When I do that, all of the test scripts finish fine except for "rigid.jl". The optimization fails to converge after 2000 iterations (at that point I killed the program). For almost all of those iterations the objective function value seems to be stuck at 2.0812524e-01.

timholy commented 8 years ago

Is that failure reproducible? For me it works, so if it is reproducible for you we probably have a package version issue that I haven't considered. (If it's not reproducible, then it suggests some test depends on random numbers and has too-stringent convergence criteria.)
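One way to tell the two cases apart is to fix the random seed and check whether the run becomes repeatable. Below is a minimal sketch, assuming modern Julia (1.x), where seeding goes through the Random standard library (on the 0.4 series used in this thread the equivalent call was srand). Whether rigid.jl draws random numbers at all is not stated here; the seed value and the draw_start helper are hypothetical stand-ins for whatever random setup a test might perform.

```julia
using Random

# Hypothetical stand-in for a test's random setup (e.g. a random starting guess).
draw_start(n) = rand(n)

Random.seed!(1234)          # fix the seed before the first run
first_run = draw_start(3)

Random.seed!(1234)          # same seed again
second_run = draw_start(3)

# With the same seed, both runs see identical "random" inputs.
println(first_run == second_run)
```

If a failure persists with a fixed seed, randomness is unlikely to be the explanation, which points back toward a package version difference.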

timholy commented 8 years ago

Oh, wait; I'm on a slightly different branch; will push. That will fix the need for your using statement.

Cody-G commented 8 years ago

Yes, it is reproducible. The objective does switch to other values once in a while, but then it goes back to the same value. The switching is strange, right?

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  80  2.0812524e-01 0.00e+00 6.81e-03 -11.0 6.81e-03    -  1.00e+00 3.55e-15f 49
  81  2.0812524e-01 0.00e+00 6.81e-03 -11.0 6.81e-03    -  1.00e+00 3.55e-15f 49
  82  2.0812524e-01 0.00e+00 6.81e-03 -11.0 6.81e-03    -  1.00e+00 3.55e-15f 49
  83  2.0812524e-01 0.00e+00 6.81e-03 -11.0 6.81e-03    -  1.00e+00 3.55e-15f 49
  84  2.0812524e-01 0.00e+00 6.81e-03 -11.0 6.81e-03    -  1.00e+00 3.55e-15f 49
  85  2.0846851e-01 0.00e+00 7.62e-03 -11.0 6.81e-03    -  1.00e+00 1.00e+00w  1
  86  2.0887478e-01 0.00e+00 8.47e-04 -11.0 7.62e-03    -  1.00e+00 1.00e+00w  1
  87  2.0883239e-01 0.00e+00 7.18e-04 -11.0 7.60e-04    -  1.00e+00 1.00e+00w  1
  88  2.0812524e-01 0.00e+00 6.81e-03 -11.0 3.39e-04    -  1.00e+00 3.55e-15f 48
  89  2.0812524e-01 0.00e+00 6.81e-03 -11.0 1.53e-02    -  1.00e+00 1.78e-15f 50

Cody-G commented 8 years ago

I'll try cloning and building everything from scratch in a new package directory. Maybe some of my package versions aren't correct since I didn't start from scratch before?

Cody-G commented 8 years ago

I got an unrelated error when starting from scratch; see BlockRegistration #12.

timholy commented 8 years ago

I just built a version of julia on cannon that has julia#13561 and julia#13455. Hopefully soon those will make it on to release-0.4, but for now this should ease many things.

(Apologies in advance for forcing a recompile of all those packages!)

timholy commented 8 years ago

OK, this was really subtle: depending on which tests were being run in which order, sometimes this test was taking a transpose of the moving image before trying to perform registration. Since the image was square, this did not manifest as a size mismatch. So rather than trying to align with a pi/12 angular difference, it was trying to align with a 7pi/12 angular difference.
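To illustrate why a square image hides this kind of bug, here is a minimal sketch in modern Julia (1.x), using made-up arrays rather than the actual test image. Transposing a square array leaves its size unchanged, so a size check cannot catch it. A transpose is also a 90-degree rotation combined with a flip, which is consistent with the extra pi/2 between the two angles quoted above (pi/12 + pi/2 = 7pi/12). The names fixed, moving, and rect are illustrative only.

```julia
# Illustrative arrays only; not the images used by the rigid.jl test.
fixed  = collect(reshape(1:25, 5, 5))   # square "image"
moving = permutedims(fixed)             # accidental transpose

# Square case: sizes still agree, so no size-mismatch error is raised.
same_size = size(moving) == size(fixed)

# Non-square case: the same mistake would be caught by a size check.
rect = collect(reshape(1:24, 4, 6))
rect_mismatch = size(permutedims(rect)) != size(rect)

# Flipping the transpose along dimension 1 gives a 90-degree counterclockwise
# rotation, i.e. a transpose is that rotation plus a flip.
is_rot_plus_flip = reverse(permutedims(fixed), dims=1) == rotl90(fixed)

# The two angles quoted above differ by exactly a quarter turn.
quarter_turn = isapprox(7pi/12 - pi/12, pi/2)

println((same_size, rect_mismatch, is_rot_plus_flip, quarter_turn))
```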

Root cause fixed in Interpolations#82. It seems as if there's a secondary failure in the logic behind https://github.com/timholy/AffineTransforms.jl/blob/eeee407b3df35bfc94caab123a60af71ec414993/src/tformedarrays.jl#L97-L100, perhaps interacting with julia#265. I haven't looked into this yet (I have bigger fish to fry).

Upgrading to master on Interpolations, once that PR merges, should fix this. Now you can test with both runtests.jl and rigid.jl. You might want #7, however, which incidentally fixes one more path bug.