When crawling directories in search of DICOM files, it is important to check for duplicates, which arise either from symbolic links or from files that are listed more than once in the input. However, the calls to "realpath" that are used to identify duplicates take up 30% of the total execution time.
Since duplicates are rare, the "realpath" check should only be done when there is a reason to suspect that a file has already been seen. For example, if a duplicate UID is found, "realpath" could be used to check whether it is the same file. This would avoid the "realpath" call entirely for the common case of a file whose UID has not been seen before.
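A minimal sketch of the deferred check described above, assuming C++17. The class `SeenFiles` and its methods are hypothetical names used only for illustration and are not the project's code. `std::filesystem::weakly_canonical` stands in for "realpath" here so that the sketch also tolerates paths that do not exist.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <unordered_map>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper for illustration only.
class SeenFiles {
public:
  // Returns true if 'path' was already recorded under the same 'uid'.
  // Paths are resolved only when the UID has been seen before.
  bool IsDuplicate(const std::string& uid, const std::string& path) {
    assert(!uid.empty());
    auto it = seen_.find(uid);
    if (it == seen_.end()) {
      // Common case: new UID, so store the raw path and skip resolution.
      seen_[uid].push_back(path);
      return false;
    }
    // Rare case: the UID collides, so pay for path resolution now.
    const std::string resolved = Resolve(path);
    for (const std::string& other : it->second) {
      if (other == path || Resolve(other) == resolved) {
        return true;
      }
    }
    it->second.push_back(path);
    return false;
  }

private:
  // Resolves symbolic links; falls back to the raw path on error.
  static std::string Resolve(const std::string& path) {
    std::error_code ec;
    fs::path p = fs::weakly_canonical(path, ec);
    return ec ? path : p.string();
  }

  std::unordered_map<std::string, std::vector<std::string>> seen_;
};
```

With this arrangement the cost of resolving paths is paid only on a UID collision. The trade-off is that a file reached through two paths is detected only because both copies report the same UID.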
Note that there is also a "realpath" check at the directory level. This occurs seldom enough that there is no reason to change it.
Fixed in commit 6058219a. The semantics have changed slightly, since the new code uses "vtkDICOMFile::SameFile" (i.e. same file, including hard links) for files instead of "realpath".
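To illustrate the difference in semantics, here is a sketch of a "same file" test on a POSIX system, which compares device and inode numbers instead of resolved path names. The function `SameFileSketch` is a hypothetical stand-in, and it is an assumption that "vtkDICOMFile::SameFile" behaves similarly; this is not its actual implementation.

```cpp
#include <cassert>
#include <sys/stat.h>

// Hypothetical stand-in for a same-file test (POSIX only).
// Two names refer to the same file if they share a device and an inode,
// which is true for hard links as well as for symbolic links, since
// stat() follows symbolic links.
bool SameFileSketch(const char* a, const char* b) {
  assert(a != nullptr && b != nullptr);
  struct stat sa;
  struct stat sb;
  if (stat(a, &sa) != 0 || stat(b, &sb) != 0) {
    return false;  // a file that cannot be examined is never a match
  }
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

A comparison based on "realpath" would treat two hard links to one file as distinct, because their resolved path names differ, while an inode comparison treats them as the same file.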