InsightSoftwareConsortium / ITK

Insight Toolkit (ITK) -- Official Repository. ITK builds on a proven, spatially-oriented architecture for processing, segmentation, and registration of scientific images in two, three, or more dimensions.
https://itk.org
Apache License 2.0
1.37k stars 660 forks source link

itkParallelSparseFieldLevelSetImageFilterTest read/write race with GetPixel() / SetPixel() #4708

Open seanm opened 4 weeks ago

seanm commented 4 weeks ago

Running the itkParallelSparseFieldLevelSetImageFilterTest test with TSan:

WARNING: ThreadSanitizer: data race (pid=68287)
  Read of size 1 at 0x00010b524516 by thread T12:
    #0 itk::ParallelSparseFieldLevelSetImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>>::ThreadedUpdateActiveLayerValues(double const&, itk::SparseFieldLayer<itk::ParallelSparseFieldLevelSetNode<itk::Index<3u>>>*, itk::SparseFieldLayer<itk::ParallelSparseFieldLevelSetNode<itk::Index<3u>>>*, unsigned int) itkParallelSparseFieldLevelSetImageFilter.hxx:1556 (ITKLevelSetsTestDriver:arm64+0x1002f7048)
    #1 itk::ParallelSparseFieldLevelSetImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>>::ThreadedApplyUpdate(double const&, unsigned int) itkParallelSparseFieldLevelSetImageFilter.hxx:1374 (ITKLevelSetsTestDriver:arm64+0x1002c5bac)
    #2 itk::ParallelSparseFieldLevelSetImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>>::Iterate()::'lambda1'(unsigned long)::operator()(unsigned long) const itkParallelSparseFieldLevelSetImageFilter.hxx:1210 (ITKLevelSetsTestDriver:arm64+0x1002e91cc)
    #3 decltype(std::declval<itk::ParallelSparseFieldLevelSetImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>>::Iterate()::'lambda1'(unsigned long)&>()(std::declval<unsigned long>())) std::__1::__invoke[abi:ue170006]<itk::ParallelSparseFieldLevelSetImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>>::Iterate()::'lambda1'(unsigned long)&, unsigned long>(itk::ParallelSparseFieldLevelSetImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>>::Iterate()::'lambda1'(unsigned long)&, unsigned long&&) invoke.h:340 (ITKLevelSetsTestDriver:arm64+0x1002e9134)
    <snip>

  Previous write of size 1 at 0x00010b524516 by thread T3:
    #0 itk::Image<signed char, 3u>::SetPixel(itk::Index<3u> const&, signed char const&) itkImage.h:211 (ITKLevelSetsTestDriver:arm64+0x1000978ac)
    #1 itk::ParallelSparseFieldLevelSetImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>>::ThreadedUpdateActiveLayerValues(double const&, itk::SparseFieldLayer<itk::ParallelSparseFieldLevelSetNode<itk::Index<3u>>>*, itk::SparseFieldLayer<itk::ParallelSparseFieldLevelSetNode<itk::Index<3u>>>*, unsigned int) itkParallelSparseFieldLevelSetImageFilter.hxx:1583 (ITKLevelSetsTestDriver:arm64+0x1002f72a4)
    #2 itk::ParallelSparseFieldLevelSetImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>>::ThreadedApplyUpdate(double const&, unsigned int) itkParallelSparseFieldLevelSetImageFilter.hxx:1374 (ITKLevelSetsTestDriver:arm64+0x1002c5bac)
    #3 itk::ParallelSparseFieldLevelSetImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>>::Iterate()::'lambda1'(unsigned long)::operator()(unsigned long) const itkParallelSparseFieldLevelSetImageFilter.hxx:1210 (ITKLevelSetsTestDriver:arm64+0x1002e91cc)
    <snip>

We see from the above that one thread is writing to 0x00010b524516 and another is simultaneously reading from the same address.

Here is the relevant code, NB the fire emojis:

    else if (new_value < LOWER_ACTIVE_THRESHOLD)
    {
      // This index will move DOWN into a negative (inside) layer.
      // First check for active layer neighbors moving in the opposite direction
      flag = false;
      for (unsigned int i = 0; i < Neighbor_Size; ++i)
      {
        // READ🔥🔥🔥🔥
        if (m_StatusImage->GetPixel(centerIndex + m_NeighborList.GetNeighborhoodOffset(i)) == m_StatusActiveChangingUp)
        {
          flag = true;
          break;
        }
      }
      if (flag)
      {
        ++layerIt;
        continue;
      }

      rms_change_accumulator += itk::Math::sqr(static_cast<float>(new_value - centerValue));
      // update the value of the pixel
      m_OutputImage->SetPixel(centerIndex, new_value);

      // Now remove this index from the active list.
      release_node = layerIt.GetPointer();
      ++layerIt;

      m_Data[ThreadId].m_Layers[0]->Unlink(release_node);
      m_Data[ThreadId].m_ZHistogram[release_node->m_Index[m_SplitAxis]] =
        m_Data[ThreadId].m_ZHistogram[release_node->m_Index[m_SplitAxis]] - 1;

      // now add release_node to status down list
      DownList->PushFront(release_node);

      // WRITE🔥🔥🔥🔥
      m_StatusImage->SetPixel(centerIndex, m_StatusActiveChangingDown);
    }

To me, this indeed looks like a buggy race. But I don't know this code at all.

seanm commented 4 weeks ago

@blowekamp maybe this is similar to https://github.com/InsightSoftwareConsortium/ITK/issues/3031 ?

blowekamp commented 4 weeks ago

The other was a logic bug in the algorithm, which required a good amount of effort in figuring out details of the method. This is a different algorithm.

I could look into the SLIC TSan issues, as I contributed that class.

seanm commented 4 weeks ago

I could look into the SLIC TSan issues, as I contributed that class.

I triaged them last night, and will write up a ticket for them shortly.