AlexeyPechnikov / pygmtsar

PyGMTSAR (Python InSAR): Powerful and Accessible Satellite Interferometry
http://insar.dev/
BSD 3-Clause "New" or "Revised" License

[Feature]: Pros and cons for modified Goldstein adaptive filtering #41

Closed AlexeyPechnikov closed 1 year ago

AlexeyPechnikov commented 1 year ago

@SteffanDavies See below: the original interferogram phase (left) and the result of the modified Goldstein adaptive filtering (right), with the GMTSAR default psize=32:

[images: original phase (left) vs. Goldstein-filtered phase (right)]

GMTSAR applies the filter to already decimated grids and for wavelength=400 (S1A_Stack_CPGF_T173 test case) we have:

gmt grdinfo amp1.grd 
...
amp1.grd: Grid file format: nf = GMT netCDF format (32-bit float), CF-1.7
amp1.grd: x_min: 0 x_max: 21568 x_inc: 16 name: x n_columns: 1348
amp1.grd: y_min: 0 y_max: 5484 y_inc: 4 name: y n_rows: 1371
...

This means the 32x32 filter is applied at a grid resolution of ~60 m. At the highest processing resolution of ~15 m, that gives ~120 m spatial accuracy (1/4 of the filter size). That's often useless for infrastructure monitoring (such as a bridge collapse). The question is: did you try to calculate without the Goldstein filter? Typically, PSI excludes the filter completely to preserve details. Actually, the filter is designed to highlight fringes, and it can make cloud borders stronger, increasing atmospheric effects.
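The accuracy estimate above is simple arithmetic; the 1/4-of-window figure is the rule of thumb used in this thread, not an exact metric:

```python
# Back-of-envelope estimate of the spatial accuracy left after Goldstein
# filtering, following the 1/4-of-window rule of thumb from this thread.
psize = 32        # Goldstein filter window, pixels (GMTSAR default)
full_res = 15.0   # highest processing resolution, m/pixel

accuracy_m = psize / 4 * full_res
print(f"effective spatial accuracy ~ {accuracy_m:.0f} m")  # ~ 120 m
```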

SteffanDavies commented 1 year ago

@mobigroup I just opened a post asking exactly this question; I hadn't noticed this one. I will close the other and continue the discussion here.

Yes, I would like to try and skip filtering during interferogram generation. Is there any way to do this in the intf_parallel function?

Using a short filter (60m) still highlights deformation in large features (retaining walls etc.) but makes it difficult to identify specific areas of subsidence within them, since neighboring pixels are influenced by real movement in the area. This means that deformation behaviour in retaining walls will carry over into main roads, where it is probably not expected.

AlexeyPechnikov commented 1 year ago

I'm working on replacing the GMTSAR phasediff and phasefilt tools to make all filtering operations optional, since persistent scatterer interferometry applies neither Goldstein nor Gaussian filters. The pure Python phasefilt replacement is ready: https://github.com/mobigroup/gmtsar/blob/pygmtsar/todo/phasefilt.ipynb; phasediff still requires more work. Both will accept wavelength=None and psize=None. By the way, I hope we will get 3D phase unwrapping, because for now the nearest pixels have values that are too close to each other to analyze effectively. Additionally, GMTSAR does not perform correct multi-looking but applies decimation instead, so the signal-to-noise ratio does not increase with downscaling. These problems make GMTSAR results problematic in cases where we do not have real fringes: the filtering actually produces strong atmospheric phase fringes, which invalidate the unwrapped results.
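The linked notebook is the actual replacement; as a rough illustration of what a Goldstein-style adaptive filter does, here is a minimal single-patch sketch (assumed simplifications: no spectral magnitude smoothing and no overlapping-patch mosaic, both of which a real implementation needs):

```python
import numpy as np

def goldstein_patch(cpx_patch, alpha=0.5):
    """Goldstein/Werner-style adaptive filtering of one complex patch:
    weight the patch spectrum by its normalized magnitude raised to alpha,
    which amplifies dominant fringes and suppresses broadband noise."""
    spec = np.fft.fft2(cpx_patch)
    mag = np.abs(spec)
    weight = (mag / mag.max()) ** alpha  # alpha=0 leaves the patch unchanged
    return np.fft.ifft2(spec * weight)

# usage: a synthetic 32x32 patch of unit-amplitude complex phase
rng = np.random.default_rng(0)
patch = np.exp(1j * rng.uniform(-np.pi, np.pi, (32, 32)))
filtered = goldstein_patch(patch, alpha=0.5)
```

The `alpha` exponent controls filter strength; the Goldstein filter is adaptive precisely because the weighting comes from the data's own spectrum.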

SteffanDavies commented 1 year ago

@mobigroup Please let me know when you have completed filter skipping so I can compare results. I'm looking forward to this!

AlexeyPechnikov commented 1 year ago

The GMTSAR phase is over-smoothed but still useful. The GMTSAR correlation, however, looks invalid: it is calculated on Gaussian-filtered amplitudes instead of filtering the calculated correlation.

1st plot - calculated phase “as is”, 2nd plot - GMTSAR filtered phase (Gaussian + Goldstein), 3rd plot - Goldstein filtered phase “as is”, 4th plot - the difference between 2nd and 3rd plots.

1st plot - GMTSAR filtered correlation, 2nd plot - calculated correlation “as is”, 3rd plot - the difference between 1st and 2nd plots.

AlexeyPechnikov commented 1 year ago

Let's compare Sentinel-1 Original vs GMTSAR Coherence. What does GMTSAR do internally? It applies a Gaussian filter of 200 meters (by default) to remove speckle noise and a Goldstein adaptive filter (with a 32x32 pixel window by default) to highlight fringes. Instead of multi-looking, it uses decimation, simply selecting the first pixel of every 4x4 block to downscale by a factor of 16. This works well for instances where there are many fringes over a large area, such as in the case of an earthquake.

For slow deformation and small objects, there are side effects: deformation phase and coherence become distorted and displaced, and atmospheric delays are converted by the filters into strong fringes. Additionally, GMTSAR uses another Gaussian filter to further smooth the coherence map, which completely shifts the positions of high-coherence pixels. It seems the post-processing was adapted for older, low-resolution, low-quality satellites, and as a result it significantly compromises the quality of Sentinel-1 data.

Multi-looking almost always works well; the combination of decimation and filters, however, is very limited in its application. The original coherence can be used in conjunction with Persistent Scatterers (PS), but the triple-filtered GMTSAR output cannot. Rethinking the processing may allow for the use of corner reflectors and open a path to integrating PS interferometry features.

Actually, there is one more problem to resolve. GMTSAR stores the co-registered image stack in a format that makes fast and effective PS processing impossible. However, the solution is apparent: we can adopt the same stacking technology that PyGMTSAR already employs for time-series analysis, such as SBAS and trend analysis.
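The decimation-versus-multi-looking point is easy to demonstrate numerically: averaging a 4x4 block (16 pixels) cuts noise by about 4x, while keeping one pixel out of 16 leaves it unchanged. A synthetic sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
data = 1.0 + rng.normal(0.0, 1.0, (512, 512))  # constant signal + unit noise

# decimation (GMTSAR-style downscale by 16): keep one pixel per 4x4 block
decimated = data[::4, ::4]

# multi-looking: average each 4x4 block; noise std drops by sqrt(16) = 4
looked = data.reshape(128, 4, 128, 4).mean(axis=(1, 3))

print(decimated.std())  # ~1.0  (no SNR gain)
print(looked.std())     # ~0.25 (4x SNR gain)
```

Both grids have the same 128x128 shape, but only the multi-looked one traded resolution for a better signal-to-noise ratio.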

SteffanDavies commented 1 year ago

> Let's compare Sentinel-1 Original vs GMTSAR Coherence. What does GMTSAR do internally? […]

From this example it is easy to see the "spreading" effect that is sometimes evident in outputs due to filtering. By this I mean that points in local contexts (such as road retaining walls, targets close to water or vegetation, or even targets in close proximity) will share similar time-series behaviour when in fact they should behave independently of each other. It could also mean that the coherence mask will spread higher coherence to lower coherence pixels, causing points to appear in areas that are not masked by the landmask or in areas with vegetation. On the other hand, otherwise highly coherent targets will get dampened out if they are isolated in a low coherence area; this is visible in some of the roads in your example.

I was wondering if it would make sense to have a two-pass approach, where an initial low and high coherence map (e.g. <0.2 & >0.6) is generated from the unfiltered phase and coherence (left), the low and high coherence pixels are removed from the filtered product (right), the holes in the filtered product are interpolated, and then the original low and high coherence pixel values are imposed on the interpolated and filtered product. This would provide a clear boundary between PS and non-PS while filtering quasi-PS (bare soil etc.).

Just a thought.
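For concreteness, the two-pass idea could be sketched roughly like this. All the names are hypothetical: `phase_filt`/`phase_raw`/`coh_raw` stand for the filtered and unfiltered products, the thresholds are the example values above, and the nearest-neighbor interpolation is just one possible hole-filling choice:

```python
import numpy as np
from scipy.interpolate import griddata

def two_pass(phase_filt, phase_raw, coh_raw, lo=0.2, hi=0.6):
    """Pass 1: drop clearly-low and clearly-high coherence pixels from the
    filtered phase and interpolate the holes. Pass 2: impose the original
    unfiltered values back onto those extreme-coherence pixels."""
    extreme = (coh_raw < lo) | (coh_raw > hi)
    out = np.where(extreme, np.nan, phase_filt.astype(float))
    holes = np.isnan(out)
    if holes.any() and not holes.all():
        yy, xx = np.nonzero(~holes)
        hy, hx = np.nonzero(holes)
        out[hy, hx] = griddata((yy, xx), out[yy, xx], (hy, hx), method="nearest")
    # the PS / non-PS boundary stays sharp: extreme-coherence pixels
    # keep their original unfiltered values
    return np.where(extreme, phase_raw, out)
```

This is only a literal transcription of the proposal; whether the interpolated background or the imposed raw values should win at the boundary is exactly the design question being discussed.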

AlexeyPechnikov commented 1 year ago

I think it's better to detect persistent scatterers than to emulate them using correlation, which depends on both amplitudes and the amplitude of the interferogram. GMTSAR smooths the output correlation: the original correlation (see the left plot above) is used for adaptive filtering, but the output correlation (see the right plot above) is calculated on the double-filtered phase and then filtered again. As a result, the GMTSAR correlation is not suitable for detecting small (infrastructure) objects and is strongly affected by topography, while the original correlation is not a good measure for gently sloping mountains and similar surfaces.

I plan to add persistent scatterer calculation on the original grids and provide coarse output grids matching the interferogram. We could have the number of scatterers per multi-looking grid cell above a user-specified threshold (this requires re-calculation for different thresholds), the best measured amplitude dispersion index per cell, and so on. 3D unwrapping and other features are trickier; persistent scatterer detection can be added to PyGMTSAR as the first milestone.
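The amplitude dispersion index mentioned above is the standard PS-candidate measure (Ferretti et al.). A minimal per-pixel version over a co-registered amplitude stack could look like this (synthetic data; the 0.25 threshold is a common illustrative value, not a PyGMTSAR default):

```python
import numpy as np

def amplitude_dispersion(amp_stack):
    """Amplitude dispersion index D = std/mean through the stack (time axis
    first); small D (commonly D < ~0.25) marks persistent scatterer candidates."""
    amp_stack = np.asarray(amp_stack, dtype=float)
    mean = amp_stack.mean(axis=0)
    return np.where(mean > 0, amp_stack.std(axis=0) / mean, np.inf)

# synthetic 30-scene stack: speckle-like amplitudes everywhere,
# plus one bright, stable pixel at (0, 0)
rng = np.random.default_rng(1)
stack = rng.rayleigh(1.0, (30, 4, 4))
stack[:, 0, 0] = 10.0 + rng.normal(0.0, 0.5, 30)

adi = amplitude_dispersion(stack)
ps_mask = adi < 0.25
```

Re-running the thresholding for a different cutoff only touches the final comparison, which is why the per-cell "best ADI" output mentioned above avoids full re-calculation.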

AlexeyPechnikov commented 1 year ago

More accurate processing results for:

[image grid: columns Wavelength=200 and Wavelength=400; rows No Goldstein Filter and Goldstein psize=32]

My assumption looks correct: we don't need to apply the Goldstein filter when we do not expect a lot of fringes. The results were built using a recent PyGMTSAR version on GitHub, where the phasefilt/make_gaussian_filter/conv GMTSAR binary tools are replaced by my lazy Python implementation.

SteffanDavies commented 1 year ago

> More accurate processing results for Wavelength=200 and Wavelength=400, with and without the Goldstein filter […]

From recent experience trying different wavelengths on local infrastructure, a smaller filter wavelength causes a general increase in the amplitude of subsidence while having little effect on pixel time-series behaviour, i.e. it amplifies subsidence values.

In your example, not applying Goldstein filtering helps preserve small-area behaviour. Notice how in the centre there is a small red patch that disappears in the last cumulative image of both 400m no-Goldstein and 200/400m Goldstein, but is preserved in 200m no-Goldstein.

I have been using a 60m filter for 15m resolution.

AlexeyPechnikov commented 1 year ago

A small wavelength of 100m in this case disrupts SNAPHU unwrapping:

[image: Cumulative Model LOS Displacement in Geographic Coordinates, AOI, mm]

I think the subsidence areas are disconnected here, and we see 2π ambiguity between the areas.

SteffanDavies commented 1 year ago

> Small wavelength 100m for this case disrupts SNAPHU unwrapping […]

This is very interesting. Is SNAPHU disrupted with 100m + Goldstein?

AlexeyPechnikov commented 1 year ago

Yes, Goldstein does not help for small wavelengths. The phase output for 100m wavelength with and without Goldstein looks very similar, and SNAPHU fails for both.

100m + Goldstein: [image]

100m no Goldstein: [image]

The problem is that there are too many noisy pixels here, and SNAPHU cannot detect the right ones for unwrapping. Maybe a persistent scatterers mask instead of the stack coherence mask currently used can resolve the issue. Actually, I've already added a lazy reader for the source .SLC files, so we can analyse the full image stack to detect the PS pixels. You can try it if you have the time.

AlexeyPechnikov commented 1 year ago

@SteffanDavies The updated interferogram processing using the SBAS.intf_parallel() function works ~3 times faster on macOS Apple Silicon. Please confirm on Linux if you can, using the recent GitHub version. Note: I don't recommend using it for production as it is not well tested yet, and it will not be released on PyPI soon.

AlexeyPechnikov commented 1 year ago

Without the Goldstein filter, using a 100m Gaussian filter and no additional coherence filter, PS-weighted interferograms produce more detailed PS-SBAS results:

[image: 30m PS-SBAS result]

So yes, the Goldstein filter is truly optional and is not applicable in most cases.

SteffanDavies commented 1 year ago

I will be testing this week. How do I skip filtering? psize=None?

Do you have a notebook for reference in the above case?

AlexeyPechnikov commented 1 year ago

Just use psize=None to disable the Goldstein filter. By the way, PyGMTSAR uses a GMTSAR-compatible Gaussian filter definition, and a wavelength=50 (m) filter means an output resolution of about 15 m. It's also possible to disable the Gaussian filter, but speckle noise is high in that case.
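I'm not certain this matches GMTSAR's exact filter definition, but one common convention ties the "wavelength" to the half-power point of the Gaussian frequency response; under that assumption the relation can be sketched as:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_lowpass(grid, wavelength_m, pixel_m):
    """Low-pass a grid so the filter gain falls to 0.5 at the given
    wavelength (an assumed convention, not necessarily GMTSAR's exact one).
    The Gaussian transfer function is exp(-2*pi^2*sigma^2*f^2), so gain 0.5
    at f = 1/wavelength gives sigma = wavelength*sqrt(ln2/2)/pi."""
    sigma_px = (wavelength_m / pixel_m) * np.sqrt(np.log(2.0) / 2.0) / np.pi
    return gaussian_filter(grid, sigma_px)

# sanity check: a cosine at exactly that wavelength comes out at ~half amplitude
x = np.arange(512.0)
grid = np.cos(2 * np.pi * x / 50.0)[None, :].repeat(8, axis=0)
out = gaussian_lowpass(grid, wavelength_m=50.0, pixel_m=1.0)
gain = out[4, 200:300].max()  # interior region, away from edges; ~0.5
```

Whatever the exact cutoff convention, the point stands: shorter wavelengths remove less signal and leave more speckle noise.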

SteffanDavies commented 1 year ago

Testing psize=None on Ubuntu:

Error in Unwrapping.

TypeError                                 Traceback (most recent call last)
File :11

File ~/.local/lib/python3.10/site-packages/pygmtsar/SBAS_unwrap_snaphu.py:138, in SBAS_unwrap_snaphu.unwrap(self, pair, threshold, conf, func, mask, conncomp, phase, corr, interactive, chunksize, debug)
    133 os.remove(tmp_file)
    135 # prepare SNAPHU input files
    136 # NaN values are not allowed for SNAPHU phase input file
    137 # interpolate when exist valid values around and fill zero pixels far away from valid ones
--> 138 self.nearest_grid(phase.where(binmask)).fillna(0).astype(np.float32).values.tofile(phase_in)
    139 # NaN values are not allowed for SNAPHU correlation input file
    140 # just fill NaNs by zeroes because the main trick is phase filling
    141 corr.where(binmask).fillna(0).astype(np.float32).values.tofile(corr_in)

[... intermediate xarray -> dask -> distributed frames elided ...]

File ~/.local/lib/python3.10/site-packages/pygmtsar/datagrid.py:407, in func()
    404 return grid
    406 # crop full grid subset to search for missed values neighbors
--> 407 ymin = y.min()-scaley*distance-1
    408 ymax = y.max()+scaley*distance+1
    409 xmin = x.min()-scalex*distance-1

TypeError: can't multiply sequence by non-int of type 'float'

AlexeyPechnikov commented 1 year ago

Is it for a single grid with debug=True option?

SteffanDavies commented 1 year ago

> Is it for a single grid with debug=True option?

I think it may be related to datagrid.chunksize="auto"

AlexeyPechnikov commented 1 year ago

We shouldn't use 'auto' chunks because Dask cannot select the right size for us. Also, this value is used for some internal calculations, so it should be a positive number.
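The traceback above ends in arithmetic on the chunk size, which is consistent with this diagnosis: the string "auto" (which Dask itself accepts) reproduces the exact error class once it reaches plain float arithmetic. The 512 below is just an illustrative positive value, not a recommended PyGMTSAR setting:

```python
# Reproducing the failure mode: a string chunksize survives until it hits
# float arithmetic like `scaley * distance` in the neighbor search.
chunksize = "auto"   # accepted by Dask, but not by plain Python arithmetic
distance = 1.5

try:
    chunksize * distance
except TypeError as e:
    print(e)  # can't multiply sequence by non-int of type 'float'
    # (a str multiplied by a float raises exactly this TypeError,
    # because str * int means repetition and str * float is undefined)

# a plain positive integer works for chunking and the internal math alike
chunksize = 512
result = chunksize * distance  # 768.0
```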