DigitalSlideArchive / HistomicsTK

A Python toolkit for pathology image analysis algorithms.
https://digitalslidearchive.github.io/HistomicsTK/
Apache License 2.0
395 stars 117 forks source link

Logical error in function compute_fsd_features #1139

Open name-used opened 1 month ago

name-used commented 1 month ago

Version: bf0f18ecea93087dfbf541d8c3c63fed1de9a283 Commits on May 13, 2024 Branch master

There are two errors at file histomicstk.features.compute_fsd_features.py: line 77 and line 188.

  1. np.argwhere returns boundary points in coordinate order instead of natural order. Think about this: 12345 60007 89ABC The correct contour order is "123457CBA986" or any rotation of this contour such as "57CBA9861234". But the code in line 77 returns "123456789ABC". If rotate this matrix into: 57C 40B 30A 209 168 The returning becomes to "57C4B3A29168". There is no Fourier correlation between these two sequences. Therefore, the features based on this code are sensitive to any rotation.
  2. Curvature here is actually the angle of inclination. Even if the inclination angle of the first coordinate has been subtracted, this number is still a value in radians. The problem is that in radian calculation, 0==2pi. Due to the natural error in floating-point calculations, an almost zero radian value can easily be calculated as 2pi - eps. In contrast, the Fourier transform of real functions does not consider 0==2pi, which means that in similar transformation processes, values close to 0 will undergo jumps with each transformation. This undoubtedly does not meet the calculation purpose of contour features.

For Correction Suggestions:

  1. Use cv2.findContours
    # find boundaries
    # Bounds = np.argwhere(
    #     find_boundaries(lmask, mode='inner').astype(np.uint8) == 1,
    # )
    [Bounds], _ = cv2.findContours(
    (find_boundaries(lmask, mode='inner').astype(np.uint8) == 1).astype(np.uint8),
    # cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE,
    cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE,
    )
    Bounds = Bounds[:, 0, :]
  2. Calculate as complex coordinates
    Curvature = Curvature - Curvature[0]
    # calculate FFT
    # fX = np.fft.fft(Curvature ).T
    z = 1 * np.cos(Curvature) + 1j * np.sin(Curvature)
    fX = np.fft.fft(z).T
cooperlab commented 1 month ago

@FattanePourakpour can you take a look at this? This is clearly wrong - a function like trace_object_boundaries or opencv should be used to trace boundaries as recommended above (this would be bwboundaries if you are coming from Matlab).

FattanePourakpour commented 1 month ago

The current code uses np.argwhere, which scans the boundaries in raster format and doesn’t accurately trace them. I explored both cv2.findContours and trace_object_boundaries. While both functions trace boundaries correctly, I found that trace_object_boundaries is about 1.6 times faster. Another key difference is that the trace_object_boundaries function treats the starting point as the endpoint twice. This can easily be fixed by simply dropping the last coordinate.

np.argwhere

labels = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 1, 1, 1, 1, 1, 0, 0],
                   [0, 0, 1, 1, 1, 1, 1, 0, 0],
                   [0, 0, 0, 0, 1, 1, 1, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=np.uint8)

X = find_boundaries(labels, mode='inner').astype(np.uint8)
Bounds= np.argwhere(X==1)
print(Bounds.T)
print('---------')
print(X)

--->>>
[[2 2 2 2 2 3 3 3 4 4 4]
 [2 3 4 5 6 2 3 6 4 5 6]]
---------
[[0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 1 1 1 1 1 0 0]
 [0 0 1 1 0 0 1 0 0]
 [0 0 0 0 1 1 1 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]]

cv2.findContours

%%time
for _ in range(100000): # Run cv2.findContours to estimate the step time
  [Bounds], _ = cv2.findContours(
      (find_boundaries(labels, mode='inner').astype(np.uint8) == 1).astype(np.uint8),
      # cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE,
      cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE,
  )
  Bounds = Bounds[:, 0, :]

print(Bounds.T)
print('---------')
print(X)

--->>>
[[2 2 3 4 5 6 6 6 5 4 3]
 [2 3 3 4 4 4 3 2 2 2 2]]
---------
[[0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 1 1 1 1 1 0 0]
 [0 0 1 1 0 0 1 0 0]
 [0 0 0 0 1 1 1 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]]
CPU times: user 26.5 s, sys: 86.4 ms, total: 26.6 s
Wall time: 26.9 s

trace_object_boundaries

%%time
for i in range(100000): #run trace_object_boundaries to estimate the step time
  Bounds = trace_object_boundaries(labels,
                              conn=8, trace_all=False,
                              x_start=None, y_start=None,
                              max_length=None,
                              simplify_colinear_spurs=False,
                              eps_colinear_area=0.01,
                              region_props=None)
  Bounds = np.stack([Bounds[0][0][:-1], Bounds[1][0][:-1]], axis=-1)

print(Bounds.T)
print('---------')
print(X)

--->>>
[[2 2 2 2 2 3 4 4 4 3 3]
 [2 3 4 5 6 6 6 5 4 3 2]]
---------
[[0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 1 1 1 1 1 0 0]
 [0 0 1 1 0 0 1 0 0]
 [0 0 0 0 1 1 1 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]]
CPU times: user 16.9 s, sys: 39.7 ms, total: 16.9 s
Wall time: 17.2 s
FattanePourakpour commented 1 month ago

I think the second suggestion to use complex curvature is logical because it retains the directional information of the curvature, ensuring that changes in angle are accurately represented.

cooperlab commented 1 month ago

@FattanePourakpour What are your thoughts on the actual contour output? It seems like our trace is producing a more sensible result.

FattanePourakpour commented 1 month ago

I agree, the trace from trace_object_boundaries seems to capture the contours more accurately and matches the object’s boundaries better. Plus, in my tests, it performed faster than cv2.findContours.

cooperlab commented 1 month ago

I agree, the trace from trace_object_boundaries seems to capture the contours more accurately and matches the object’s boundaries better. Plus, in my tests, it performed faster than cv2.findContours.

At least in this case it seems to follow the most obvious path around the object.

Go ahead and prepare a pull request and we'll see if the tests pass. Some functions have stored expected values that they test against, so if that's the case, we will have to update these values.

FattanePourakpour commented 1 month ago

Sure, I’ll prepare the pull request and check for any tests with stored expected values. If needed, I’ll update those values.