bthyreau / hippodeep_pytorch

quick segmentation of the hippocampus from T1 MRI images. (pytorch version)

Difference in volume estimation between non-pytorch hippodeep and pytorch hippodeep #2

Open · NinjMenon opened this issue 3 years ago

NinjMenon commented 3 years ago

Hello!

Thank you for developing this awesome tool :) It's been a game changer for us to get hippocampal volume estimates so quickly!

I had a question about how the older version of hippodeep and this new pytorch version differ in calculating the volumes. Previously, we got whole-number volumes with the Theano-based hippodeep:

Left Hippocampus = 2744.00 Right Hippocampus = 3301.00

This makes sense, as our T1s are all 1 mm isotropic.

However, pytorch hippodeep reports fractional volumes for the same data: Left Hippocampus = 2676.3678921568626 Right Hippocampus = 3214.0019607843137

Is this because the older version used ANTs to load and manipulate the image, in which case the pixdim fields would have been used to determine voxel sizes? We see the converse happening with eTIV, where the older version would report a fractional eTIV and the newer pytorch version reports whole-number volumes.

Theano hippodeep eTIV = 1538766.31 Pytorch hippodeep eTIV = 1547827

Just wanted to know if you had any thoughts on this? Thanks!
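
For reference, here is roughly how the voxel volume can be derived from the NIfTI pixdim fields with nibabel (a minimal sketch with a hypothetical filename, not the actual hippodeep code):

```python
import numpy as np
import nibabel as nib

img = nib.load("t1.nii.gz")               # hypothetical input file
zooms = img.header.get_zooms()[:3]        # pixdim entries: voxel dimensions in mm
voxel_volume_mm3 = float(np.prod(zooms))  # 1.0 for a 1 mm isotropic scan
print(zooms, voxel_volume_mm3)
```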

bthyreau commented 3 years ago

Hi, thanks!

Actually, neither method (Theano or Pytorch) can reach that kind of precision, so I would say that any digit after the decimal point (and a few before it, too) is not significant. Even the test-retest variation is much larger than that.

Technically, there are several sources of difference between the two versions that can affect the results, the biggest ones being interpolation + thresholding. Internally, the segmentation is conducted in MNI space (in probabilistic form: the mask can contain values between 0 and 1 around the borders) and the result is back-projected to native space with linear interpolation. How these numerous "soft" border voxels are accounted for can have a large impact on the final results (also for eTIV). I think the pytorch script does a somewhat better job than the old Theano version at dealing with this, integrating over all voxels whose probability is above 50% and clamping all noisy high-probability values to exactly 1.
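
If it helps, here is a minimal numpy sketch of that kind of post-processing (illustrative only, not the exact hippodeep code; the threshold and clamping details are assumptions based on the description above):

```python
import numpy as np

def soft_mask_volume(prob_mask, voxel_volume_mm3, threshold=0.5):
    """Integrate a probabilistic (soft) mask into a volume in mm^3."""
    p = np.asarray(prob_mask, dtype=np.float64)
    p = np.where(p < threshold, 0.0, p)  # discard low-probability border voxels as noise
    p = np.clip(p, 0.0, 1.0)             # clamp noisy values above 1 to exactly 1
    return float(p.sum()) * voxel_volume_mm3

# Two near-certain voxels and two soft border voxels, 1 mm isotropic:
mask = np.array([1.02, 0.97, 0.80, 0.30])
print(round(soft_mask_volume(mask, voxel_volume_mm3=1.0), 2))  # 2.77
```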

As for the eTIV, I just noticed that it is indeed displayed as an integer in the output text, for no particular reason. The value stored in the output CSV is still a float.

NinjMenon commented 3 years ago

Thank you! That helps a lot. We were getting large test-retest variations with the Theano method. Here is a snippet:

| Run | Left Hippocampus | Right Hippocampus | eTIV       |
|-----|------------------|-------------------|------------|
| 1   | 2811.00          | 3247.00           | 1538766.31 |
| 2   | 2754.00          | 3291.00           | 1538766.31 |
| 3   | 2738.00          | 3279.00           | 1538766.31 |
| 4   | 2744.00          | 3301.00           | 1538766.31 |
| 5   | 2747.00          | 3306.00           | 1538766.31 |
| 6   | 2747.00          | 3292.00           | 1538766.31 |
| 7   | 2750.00          | 3290.00           | 1538766.31 |
| 8   | 2741.00          | 3294.00           | 1538766.31 |
| 9   | 2763.00          | 3283.00           | 1538766.31 |
| 10  | 2737.00          | 3289.00           | 1538766.31 |
| 11  | 2737.00          | 3288.00           | 1538766.31 |
| 12  | 2720.00          | 3292.00           | 1538766.31 |

But with the pytorch method, we are getting consistent results in the test-retest case. Thanks for the details about the algorithm!

bthyreau commented 3 years ago

I see - I hadn't noticed the impact on the test-retest variance; thanks for this example data!

It's possible that the better post-processing also yields more consistent test-retest volumes.

But I'm surprised that the eTIV column is strictly constant, which is not expected if the input comes from different acquisitions. Did you actually perform your own manual registration + resampling before running the test-retest experiment? That would of course be very reasonable for a longitudinal analysis.

The very original hippodeep tool used to call external tools (flirt or antsRegistration) to perform the affine alignment, but this was later replaced by a dedicated function. I note that antsRegistration is not always deterministic (especially with multiple threads), so even very similar images could produce slightly different realigned outputs. In that case, the apparent improvement in the test-retest metric would not really be due to an increase in segmentation accuracy, but to more consistent MNI alignment.

(Anyway, I'd rather not bother any more with the old Theano version, but should you require it, e.g. for backward compatibility with a previous experiment, I could have a look when I have time.)

NinjMenon commented 3 years ago

We're happy to use the newer version - fewer dependencies are always a great thing! For the example data, which is from the Theano version, we ran the same scan (1 mm isotropic voxels, 256x256x170 extent in subject space) 12 times. We didn't do any manual registration or resampling. I have two follow-up questions:

  1. Is your recommendation that we do manual registration + resampling if we want to use the measures in a longitudinal manner, even with the new pytorch method?
  2. I went over your previous response and I am still a little unclear on one thing - our scans are 1 mm isotropic, so all estimated volumes should be whole numbers, since the volume becomes just a count of voxels. If you're thresholding the soft border at 50%, then the only difference would be in the number of voxels included, and the volume of the included voxels should still be a whole number, because each voxel is 1 cubic mm. I'm a little confused about this part - would you be able to clarify a bit more? Here too, the method I am referring to is the pytorch method. Thank you so much!

bthyreau commented 3 years ago

Sure.

  1. I suppose it cannot hurt to do manual realignment of all longitudinal timepoints with a separate tool - but I have not carefully evaluated the variance introduced by hippodeep's alignment model. I'm a bit surprised that the eTIV values from your data above are so consistent. This may be because the eTIV is computed at a lower resolution, so small variations may be smoothed out. Or maybe the subject didn't move between the separate acquisitions, in which case the realignment will be very consistent too and external registration won't be necessary.

  2. The soft border is thresholded (all values below 50% are set to 0, mostly to remove noise, actually), but not binarized. That means a value of, say, 80% will account for 0.8 of a voxel in the volume calculation. The justification is that such voxels can be interpreted as partially filled volumes. (Some other software, e.g. SPM, does this too.)
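
A tiny worked example of that accounting (made-up border probabilities), which also shows why a 1 mm isotropic scan can still produce a fractional total:

```python
import numpy as np

border = np.array([0.4, 0.6, 0.8])          # made-up soft-border probabilities
kept = np.where(border < 0.5, 0.0, border)  # the 0.4 voxel is discarded as noise
volume_mm3 = kept.sum() * 1.0               # voxel volume is 1 mm^3 at 1 mm isotropic
print(round(float(volume_mm3), 2))          # 1.4: fractional even though each voxel is 1 mm^3
```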

NinjMenon commented 3 years ago

Hi @bthyreau, excellent, thank you! Point 2 makes complete sense. To clarify from my end for point 1, we just ran hippodeep on the same scan (same input file) 12 times. These were not 12 different acquisitions of the same person; it was literally just us running the code on the same input file. Sorry if that was confusing!