PMEAL / porespy

A set of tools for characterizing and analyzing 3D images of porous materials
https://porespy.org
MIT License

Very different network extraction results between versions 1.3.1 and 2.0.2 #643

Closed · avlmachado closed this issue 2 years ago

avlmachado commented 2 years ago

I've tested the SNOW algorithm in the latest version (snow2) and obtained results that differ substantially from the previous version.

I used the popular Berea sample from ICL available in: https://www.imperial.ac.uk/media/imperial-college/faculty-of-engineering/earth-science-and-engineering/recovered-files/33505696.zip

Using the SNOW algorithm with PoresPy v1.3.1, the network has 5217 pores and 9445 throats. Using snow2 with PoresPy version 2.0.2, the resulting network has 12362 pores and 22087 throats.

I'm not sure if my script has a setup error. The script for testing both versions is:

import openpnm as op
import porespy as ps
import numpy as np
import matplotlib.pyplot as plt
import time

timeZero = time.time()

# Load the 400^3 binary Berea image (1 byte per voxel) and invert it
# so that True marks the void (pore) phase
im = np.fromfile('Image/Berea.raw', dtype='bool').reshape(400, 400, 400)
im = ~np.array(im, dtype=bool)
plt.imshow(im[100, :, :], cmap='gray')
plt.show()

porosity = ps.metrics.porosity(im)
print(porosity)

timeZero = time.time()  # reset the timer before extraction

# v1.3.1
snow_output = ps.networks.snow(im, voxel_size=5.435e-6)
prj = op.io.PoreSpy.import_data(snow_output)

# v2.0.2 (uncomment these two lines and comment out the two above)
# snow_output = ps.networks.snow2(im, voxel_size=5.435e-6)
# prj = op.io.PoreSpy.import_data(snow_output.network)

pn = prj.network

print('Elapsed time during network extraction (min)', (time.time() - timeZero) / 60)
print('Number of pores before trimming: ', pn.Np)
h = pn.check_network_health()
op.topotools.trim(network=pn, pores=h['trim_pores'])
print('Number of pores after trimming: ', pn.Np)
print(pn)

I'd like to use the newer version to take advantage of parallelization.
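For context, this is the kind of parallel workflow I was hoping to move to (just a sketch; I'm assuming snow_partitioning_parallel returns a result with a regions attribute and that regions_to_network accepts it as shown, since the exact signatures seem to vary between releases):

# Sketch only: argument names and return types of the parallel filter
# may differ between releases, so treat these calls as assumptions
snow = ps.filters.snow_partitioning_parallel(im)
net = ps.networks.regions_to_network(snow.regions, voxel_size=5.435e-6)
prj = op.io.PoreSpy.import_data(net)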

jgostick commented 2 years ago

We are investigating. It seems that the trim_saddle_points function was tweaked between versions to reduce the number of "maximum iterations reached" incidents. I was aware that the number of peaks differed slightly between versions, but attributed it to the improved logic. It seems that the image you're using is extremely sensitive to this change. Stay posted.

avlmachado commented 2 years ago

Thank you @jgostick for the answer.
Let me know if there is anything else I can do to help. I'm not experienced in the details of the method, but maybe I could test other samples. I extracted a couple of slices to compare the results in ParaView, and, indeed, the connections are highly segmented. I'm not sure whether this has a large impact on other properties, such as permeability.

[Image: poreSpy_v131_v202 — side-by-side slices of the networks extracted with v1.3.1 and v2.0.2]

jgostick commented 2 years ago

Interesting pictures. They are pretty similar actually, just more small pores dangling off the sides. I'm still working on this, but one thing I plan to investigate is the effect of resolution, since the image you're using is not exactly high resolution. The watershed method does not handle small, poorly resolved pores well. I will try zooming the image 2-3x and seeing how things go, along the lines of the sketch below.
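Roughly this, for the record (nearest-neighbor interpolation so the image stays binary; the factor-of-3 change in voxel size has to be accounted for separately):

import numpy as np
from scipy import ndimage as spim

# im is the binary image (True = void) from the script above.
# Upsample 3x with order=0 (nearest neighbor) so no grey levels are introduced
im3 = spim.zoom(im.astype(np.uint8), zoom=3, order=0).astype(bool)  # 400^3 -> 1200^3
sub = im3[:400, :400, :400]  # 400-cubed piece from one corner
# Note: the effective voxel size is now 5.435e-6 / 3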

jgostick commented 2 years ago

I zoomed the image 3x (making it 1200 cubed), then grabbed a 400-cubed piece from the corner. With this higher resolution, I get the following:

V2.0.1

Number of peaks before gaussian blur: 3023
Number of peaks after gaussian blur: 1225
Number of peaks after trimming non-blurred: 2234
Number of peaks after trimming blurred: 1214

V1.3.1

Number of peaks before gaussian blur: 3023
Number of peaks after gaussian blur: 1225
Number of peaks after trimming non-blurred: 796
Number of peaks after trimming blurred: 908

V2 yields more pores (peaks become pores), but the difference is much smaller than you saw on the original image: roughly 50% more pores here, compared to the more-than-doubling you reported. I will keep investigating.
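For anyone who wants to reproduce these peak counts, the stages correspond roughly to the following filter calls (a sketch; sigma=0.4 for the blur is my assumption, and the keyword arguments of the trim functions differ a bit between releases):

import scipy.ndimage as spim
import porespy as ps

# im is the binary image (True = void), as in the script above
dt = spim.distance_transform_edt(im)
dt_blur = spim.gaussian_filter(dt, sigma=0.4)  # the "gaussian blur" step

peaks = ps.filters.find_peaks(dt_blur)  # initial peaks
print('Peaks after gaussian blur:', spim.label(peaks)[1])

peaks = ps.filters.trim_saddle_points(peaks, dt_blur)  # the step that changed between V1.3 and V2
peaks = ps.filters.trim_nearby_peaks(peaks, dt_blur)
print('Peaks after trimming:', spim.label(peaks)[1])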

jgostick commented 2 years ago

Having dug into the trim_saddle_points function, I can report the following:

The main question now is how to proceed. I will keep working on this since it's a pretty important issue.

jgostick commented 2 years ago

I finally found the time to look into this properly. The bottom line is that the peak finding in V1.3 was too aggressive, so we rewrote the function in V2 to be more conservative. I have combed over all the steps in the snow process, including finding initial peaks, trimming saddle points, and trimming nearby peaks. I am feeling good about them all now, and have created a PR with lots of maintenance fixes, nothing too major compared to the previous V2 workflow.

However, I don't want to go back to the V1.3 version, so I have made a compromise: it is now possible to pass your own peaks into the snow2 algorithm, as well as into snow_partitioning and snow_partitioning_n. Along with this I have also re-added the function from V1.3 to the filters module and called it trim_saddle_points_legacy. So, if you really want to reproduce the old behavior, you can generate the peaks in your script and pass them into snow2 (see the sketch below).

One downside: I was not able to enable the passing of peaks to snow_partitioning_parallel. There is too much dask magic going on in that function, which was written by my former PhD student, so I'm not quite sure how to adapt it to accept peaks.

Basically, moving forward we strongly recommend using snow2 as-is, even though it produces more peaks. On a side note, the ability to pass peaks into snow2 lets people write their own custom peak-finding functions, which will hopefully be of use.
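To make that concrete, reproducing the old behavior would look something like this (a sketch based on the PR as described above; the peaks keyword on snow2 and the trim_saddle_points_legacy filter are the new additions, so treat the exact names as provisional):

import scipy.ndimage as spim
import porespy as ps

# im is the binary image (True = void), as in the original script.
# Generate peaks with the V1.3-style trimming, then hand them to snow2
dt = spim.distance_transform_edt(im)
peaks = ps.filters.find_peaks(spim.gaussian_filter(dt, sigma=0.4))
peaks = ps.filters.trim_saddle_points_legacy(peaks, dt)  # re-added V1.3 logic
peaks = ps.filters.trim_nearby_peaks(peaks, dt)

snow_output = ps.networks.snow2(im, voxel_size=5.435e-6, peaks=peaks)  # new peaks argument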