SyneRBI / SIRF-SuperBuild

SIRF CMake SuperBuild
http://www.ccpsynerbi.ac.uk
Apache License 2.0
15 stars 18 forks source link

CI: test docker CIL #838

Closed casperdcl closed 11 months ago

casperdcl commented 1 year ago

diff: https://github.com/SyneRBI/SIRF-SuperBuild/pull/838/files?w=1

casperdcl commented 1 year ago

@paskino I'm not sure this is actually required, since we now have RUN_CTEST https://github.com/SyneRBI/SIRF-SuperBuild/blob/c7b69730486e2c9e26d3654a5964631df3c5fb3d/docker/user_sirf-ubuntu.sh#L37

KrisThielemans commented 1 year ago

I think it's very good. It picks the version from version_config.cmake and uses that.

but we're getting CIL via conda...

KrisThielemans commented 1 year ago

Error is

++ docker run --rm synerbi/sirf:service python -c 'import cil; print(cil.version.commit_hash[1:])'
error: exec: "python": executable file not found in $PATH
+ DEFAULT_CIL_TAG='Creating sirfuser:1000:1000
Adding user `sirfuser'\'' to group `users'\'' ...
Adding user sirfuser to group users
Done.
Updating file ownership for /home/sirfuser
Switching to sirfuser and executing python -c import cil; print(cil.version.commit_hash[1:])'

So, aside from not finding python, the docker run adds lots of other things to the output. I guess it could be avoided by doing a dummy docker run first, such that sirfuser is already created. Sigh

casperdcl commented 1 year ago

btw how urgent is fixing this?

I was thinking of reworking/simplifing the docker stuff... but I expect it would take some time to get a more robust solution working.

paskino commented 1 year ago

We would like to merge this in and then shortly release SIRF 3.5.0 with relevant tested docker images.

KrisThielemans commented 1 year ago

Or we don't merge it in, and test the docker images manually.

casperdcl commented 1 year ago

first problem: cil.version is borked. Hack for now:

docker run --rm -it synerbi/sirf:service /opt/conda/bin/python
>>> import cil
>>> cil.version.version
'23.0.1'
>>> cil.version.commit_hash  # borked
'-1'
>>> f'v{cil.version.version}' if cil.version.commit_hash == '-1' else cil.version.commit_hash
'v23.0.1'

second problem: sirf isn't found

docker run --rm -v ./cil_sirf_test:/cil_sirf_test synerbi/sirf:service /opt/conda/bin/python -m unittest discover -v /cil_sirf_test
...
----------------------------------------------------------------------
TEST SYSTEM CONFIGURATION
CIL version:  23.0.1
{'has_astra': True,
 'has_ccpi_regularisation': False,
 'has_cvxpy': False,
 'has_ipp': True,
 'has_numba': True,
 'has_nvidia': False,
 'has_tigre': True,
 'has_tomophantom': False}
----------------------------------------------------------------------

test_Gradient (test_SIRF.TestGradientMR_2D) ... skipped 'Skipping as SIRF is not available'
test_TVdenoisingMR (test_SIRF.TestGradientMR_2D) ... skipped 'Has SIRF'
test_Gradient (test_SIRF.TestGradientPET_2D) ... skipped 'Skipping as SIRF is not available'
test_Gradient (test_SIRF.TestGradientPET_3D) ... skipped 'Skipping as SIRF is not available'
test_BlockDataContainer_with_SIRF_DataContainer_add (test_SIRF.TestSIRFCILIntegration) ... skipped 'Has SIRF'
test_BlockDataContainer_with_SIRF_DataContainer_divide (test_SIRF.TestSIRFCILIntegration) ... skipped 'Has SIRF'
test_BlockDataContainer_with_SIRF_DataContainer_multiply (test_SIRF.TestSIRFCILIntegration) ... skipped 'Has SIRF'
test_BlockDataContainer_with_SIRF_DataContainer_subtract (test_SIRF.TestSIRFCILIntegration) ... skipped 'Has SIRF'

----------------------------------------------------------------------
Ran 8 tests in 0.000s

OK (skipped=8)

third problem: PYTHON_INSTALL_DIR should be replaced with /opt/conda but for some reason isn't

docker run --rm synerbi/sirf:service cat .bashrc | grep activate
[ -f PYTHON_INSTALL_DIR/bin/activate ] && . PYTHON_INSTALL_DIR/bin/activate

hack for now:

docker run --rm -v ./cil_sirf_test:/cil_sirf_test synerbi/sirf:service bash --login -c '/opt/conda/bin/python -m unittest discover -v /cil_sirf_test'

fourth problem: gadgetron isn't running

Started reading acquisitions from /opt/SIRF-SuperBuild/INSTALL/share/SIRF-3.5/data/examples/MR/simulated_MR_2D_cartesian.h5
0%..10%..20%..30%..40%..50%..60%..70%..80%..90%..100%..
Finished reading acquisitions from /opt/SIRF-SuperBuild/INSTALL/share/SIRF-3.5/data/examples/MR/simulated_MR_2D_cartesian.h5
connection to gadgetron server failed, trying again...
...
connection to gadgetron server failed
ERROR
...
----------------------------------------------------------------------
Ran 8 tests in 7.271s

FAILED (errors=6)

hack for now:

#file: ./cil_sirf_test/run.sh
this=$(dirname "${BASH_SOURCE[0]}")
$SIRF_PATH/../../INSTALL/bin/gadgetron >& ~/gadgetron.log&
/opt/conda/bin/python -m unittest discover -v $this
for i in $(jobs -p); do kill -n 15 $i; done 2>/dev/null
docker run --rm -v ./cil_sirf_test:/cil_sirf_test synerbi/sirf:service bash --login /cil_sirf_test/run.sh

fifth problem: AttributeError: 'ImageData' object has no attribute 'get_item'

Started reading acquisitions from /opt/SIRF-SuperBuild/INSTALL/share/SIRF-3.5/data/examples/MR/simulated_MR_2D_cartesian.h5
0%..10%..20%..30%..40%..50%..60%..70%..80%..90%..100%..
Finished reading acquisitions from /opt/SIRF-SuperBuild/INSTALL/share/SIRF-3.5/data/examples/MR/simulated_MR_2D_cartesian.h5
Message received with ID: 5
Input stream has terminated
Message received with ID: 5
Input stream has terminated
WARNING:root:C backend is only for arrays of datatype float32 - defaulting to `numpy` backend
ok
test_TVdenoisingMR (test_SIRF.TestGradientMR_2D) ... Started reading acquisitions from /opt/SIRF-SuperBuild/INSTALL/share/SIRF-3.5/data/examples/MR/simulated_MR_2D_cartesian.h5
0%..10%..20%..30%..40%..50%..60%..70%..80%..90%..100%..
Finished reading acquisitions from /opt/SIRF-SuperBuild/INSTALL/share/SIRF-3.5/data/examples/MR/simulated_MR_2D_cartesian.h5
Message received with ID: 5
Input stream has terminated
Message received with ID: 5
Input stream has terminated
WARNING:root:C backend is only for arrays of datatype float32 - defaulting to `numpy` backend
...
WARNING:root:C backend is only for arrays of datatype float32 - defaulting to `numpy` backend
ok
test_Gradient (test_SIRF.TestGradientPET_2D) ... ok
test_Gradient (test_SIRF.TestGradientPET_3D) ... ok
test_BlockDataContainer_with_SIRF_DataContainer_add (test_SIRF.TestSIRFCILIntegration) ... ERROR
test_BlockDataContainer_with_SIRF_DataContainer_divide (test_SIRF.TestSIRFCILIntegration) ... ERROR
test_BlockDataContainer_with_SIRF_DataContainer_multiply (test_SIRF.TestSIRFCILIntegration) ... ERROR
test_BlockDataContainer_with_SIRF_DataContainer_subtract (test_SIRF.TestSIRFCILIntegration) ... ERROR

======================================================================
ERROR: test_BlockDataContainer_with_SIRF_DataContainer_add (test_SIRF.TestSIRFCILIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/cil_sirf_test/test_SIRF.py", line 343, in test_BlockDataContainer_with_SIRF_DataContainer_add
    self.assertBlockDataContainerEqual(bdc , bdc1)
  File "/cil_sirf_test/testclass.py", line 40, in assertBlockDataContainerEqual
    self.assertBlockDataContainerEqual(container1.get_item(col),container2.get_item(col))
  File "/cil_sirf_test/testclass.py", line 34, in assertBlockDataContainerEqual
    if issubclass(container1.get_item(col).__class__, DataContainer):
AttributeError: 'ImageData' object has no attribute 'get_item'

======================================================================
ERROR: test_BlockDataContainer_with_SIRF_DataContainer_divide (test_SIRF.TestSIRFCILIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/cil_sirf_test/test_SIRF.py", line 298, in test_BlockDataContainer_with_SIRF_DataContainer_divide
    self.assertBlockDataContainerEqual(bdc , bdc1)
  File "/cil_sirf_test/testclass.py", line 40, in assertBlockDataContainerEqual
    self.assertBlockDataContainerEqual(container1.get_item(col),container2.get_item(col))
  File "/cil_sirf_test/testclass.py", line 34, in assertBlockDataContainerEqual
    if issubclass(container1.get_item(col).__class__, DataContainer):
AttributeError: 'ImageData' object has no attribute 'get_item'

======================================================================
ERROR: test_BlockDataContainer_with_SIRF_DataContainer_multiply (test_SIRF.TestSIRFCILIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/cil_sirf_test/test_SIRF.py", line 319, in test_BlockDataContainer_with_SIRF_DataContainer_multiply
    self.assertBlockDataContainerEqual(bdc , bdc1)
  File "/cil_sirf_test/testclass.py", line 40, in assertBlockDataContainerEqual
    self.assertBlockDataContainerEqual(container1.get_item(col),container2.get_item(col))
  File "/cil_sirf_test/testclass.py", line 34, in assertBlockDataContainerEqual
    if issubclass(container1.get_item(col).__class__, DataContainer):
AttributeError: 'ImageData' object has no attribute 'get_item'

======================================================================
ERROR: test_BlockDataContainer_with_SIRF_DataContainer_subtract (test_SIRF.TestSIRFCILIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/cil_sirf_test/test_SIRF.py", line 363, in test_BlockDataContainer_with_SIRF_DataContainer_subtract
    self.assertBlockDataContainerEqual(bdc , bdc1)
  File "/cil_sirf_test/testclass.py", line 40, in assertBlockDataContainerEqual
    self.assertBlockDataContainerEqual(container1.get_item(col),container2.get_item(col))
  File "/cil_sirf_test/testclass.py", line 34, in assertBlockDataContainerEqual
    if issubclass(container1.get_item(col).__class__, DataContainer):
AttributeError: 'ImageData' object has no attribute 'get_item'

----------------------------------------------------------------------
Ran 8 tests in 35.843s

FAILED (errors=4)

KMN

KrisThielemans commented 1 year ago

I had to look that abbrev up (although I guessed correctly)!

some of these seem to tie up with the experience of https://github.com/SyneRBI/SIRF-SuperBuild/issues/827#issuecomment-1636794782 and other difficulties experienced there.

No idea about the get_item problem

paskino commented 1 year ago

The get_item problem has been fixed as of https://github.com/TomographicImaging/CIL/pull/1481 or better https://github.com/TomographicImaging/CIL/pull/1502 which means that the minimum commit of CIL that you can use is 0ba2f8e2935db66ec3dfe8c88e407e8d0eb5609f which is exactly the hash that is in https://github.com/SyneRBI/SIRF-SuperBuild/blob/092f912fd76d36cd3995671809528db4c7cc1f33/version_config.cmake#L151

Notice that the bug is in test_SIRF.py rather than in CIL itself. This mess will disappear once we tag CIL.

casperdcl commented 1 year ago

You mean the CIL version installed (v23.0.1) has nothing to do with the version specified in version_config.cmake (0ba2f8e2)? Then what's the point of the latter?

KrisThielemans commented 1 year ago

Right, that'll be then because we get the conda version of CIL on docker. So, a few choices:

which one do you prefer?

KrisThielemans commented 1 year ago

You mean the CIL version installed (v23.0.1) has nothing to do with the version specified in version_config.cmake (0ba2f8e2)? Then what's the point of the latter?

see 2nd bullet above. version_config.cmake is used when using the SB with BUILD_CIL=ON. Not when doing conda install cil

paskino commented 1 year ago
  • give up on this test for 3.5
  • set BUILD_CIL=ON and somehow prevent to install CIL via conda on top (i.e. don't use SIRF-Exercises/environment.yml at all, or edit it first.). This is an interesting problem in itself: if you build a package yourself, how do you prevent conda messing it up.
  • make a CIL-dev conda version

This exists, the install command is about conda install cil -c ccpi/label/dev

  • release a new version of CIL

A new version of CIL should be out by the 27/9/23.

IMHO all this mess will go away once we tag and release CIL, so I suggest to wait until then.

KrisThielemans commented 1 year ago

ok. Of course, we will have the same problem later. I think what this is saying (no surprise): either use conda or use the SuperBuild, but don't try to mix.

casperdcl commented 1 year ago

long (cough, mid)-term solution would be pyproject.toml + https://github.com/scikit-build/scikit-build... no need for users to know about SuperBuild, CMake, or conda at all^1^3 😈 and no rebuilding already-existing packages :)

KrisThielemans commented 1 year ago

I have no idea really, but looks interesting.

We started with the SB because lots of things were not available on various systems (Gadgetron, ISMRMD, STIR, ...). That is less and less the case (although most of these are on conda only). Another reason was to have tight version control. I'm not sure how scikit-build would resolve external dependencies.

I'm happy to go with whatever makes our life easier (but serves the needs).

This needs a serious discussion of course.

casperdcl commented 1 year ago

fyi this PR is blocked on https://github.com/TomographicImaging/CIL/issues/1509 :)

KrisThielemans commented 1 year ago

that' s a long checklist!

casperdcl commented 1 year ago

post CIL 23.1.0 release, we still have one error

KrisThielemans commented 1 year ago

for easy reference

ERROR: test_TVdenoisingMR (test_SIRF.TestGradientMR_2D)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sirfuser/cil_sirf_test/test_SIRF.py", line 255, in test_TVdenoisingMR
    res1 = TV.proximal(self.image1, tau=1.0)
  File "/opt/conda/lib/python3.10/site-packages/cil/optimisation/functions/TotalVariation.py", line 271, in proximal
    solution = self._fista_on_dual_rof(x, tau)
  File "/opt/conda/lib/python3.10/site-packages/cil/optimisation/functions/TotalVariation.py", line 303, in _fista_on_dual_rof
    p1 = self.gradient.range_geometry().allocate(None)
  File "/opt/conda/lib/python3.10/site-packages/cil/framework/BlockGeometry.py", line 56, in allocate
    containers = [geom.allocate(value, **kwargs) for geom in self.geometries]
  File "/opt/conda/lib/python3.10/site-packages/cil/framework/BlockGeometry.py", line 56, in <listcomp>
    containers = [geom.allocate(value, **kwargs) for geom in self.geometries]
  File "/opt/SIRF-SuperBuild/INSTALL/python/sirf/Gadgetron.py", line 526, in allocate
    tmp = value * numpy.ones(out.as_array().shape)
TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

Relevant lines are https://github.com/SyneRBI/SIRF/blob/d3956345ec6e56421017a6aca34c73747dbae664/src/xGadgetron/pGadgetron/Gadgetron.py#L508-L527 Indeed, the argument None is not supported by Gadgetron.ImageData.allocate. I've created https://github.com/SyneRBI/SIRF/issues/1212.

Do we need this for AcquisitionData as well?

This seems to need a SIRF update then... (which means branching off from 3.5 to release 3.5.1).

paskino commented 1 year ago

This seems to need a SIRF update then... (which means branching off from 3.5 to release 3.5.1).

Let's then proceed with tagging the SB with 3.5 without this PR and tackle this later.

BTW, we are writing a list of residual incompatibilities between SIRF and CIL. We added to CIL a number of hacks, which we would like to remove, to overcome these incompatibilities, https://github.com/TomographicImaging/CIL/discussions/1516

KrisThielemans commented 1 year ago

Let's then proceed with tagging the SB with 3.5 without this PR and tackle this later.

I don't know... I guess you'd want the freshly released CIL in our version_config.cmake. That'd presumably make GHA fail (or at least, I'd hope it would. TBH, I don't understand why it didn't, unless the current CIL tag didn't include that test).

We could leave the CIL version as-is and then release 3.5.1 soon after which would presumably

In any case, these steps seems minor (aside from probably the last one...) and need to be completed ASAP. If they are indeed minor, then releasing 3.5.0 as-is might be double work. Up to you.

KrisThielemans commented 1 year ago

Of course, there are 2 issues on our milestone list that do need to be resolved. https://github.com/SyneRBI/SIRF-SuperBuild/milestone/14

casperdcl commented 11 months ago

ok so

should be gewd for SIRF 3.5.0?