alomax / NonLinLoc

Probabilistic, Non-Linear, Global-Search Earthquake Location in 3D Media
http://www.alomax.net/nlloc/docs
GNU General Public License v3.0

usage question re: LOCSEARCH #17

Closed · filefolder closed this issue 1 year ago

filefolder commented 2 years ago

Hi-- I wonder if I could ask for some clarification about the LOCSEARCH parameter in OctTree mode. This may not be the place for it, so feel free to direct me somewhere else.

I am observing "gridding" of my epicenters at very small ~1 km scales and am wondering what parameter (if any) I can adjust to mitigate this, and/or which would be most efficient resource-wise.
[figure: epicenter map showing the ~1 km gridding]

Currently (producing the figure above) I am setting the parameter as below with a 3D velocity model gridded at 346x276x61 nodes and 5x5x5 km "real" increments. It is quite slow at these settings already so I am trying to be wise about what I should change.

LOCSEARCH OCT 86 69 15 0.0005 200000 5000

I have chosen the initial grid to be about 25% of the model, but the rest (minnodesize=.0005, maxnumnodes=200000, numscatter=5000) I have essentially just guessed. Any help or advice towards a logical approach to this would be appreciated!

alomax commented 2 years ago

Hi,

I suppose you are asking about the smaller-scale gridding, which looks like ~1 km if the colored cells are 5x5 km. A residual gridding after oct-tree search often means the events are poorly constrained and the corresponding location PDF is very extensive, so the search does not converge well. This could be due to poor quality arrival picks, or to unrealistic pick or travel-time uncertainties.

But I do have suggestions for the basic settings. Usually I use maxnumnodes 25-50k; 200k will definitely be slow. Since the init_num_cells_x, init_num_cells_y, init_num_cells_z values are usually not critical (unless the model or location pdf is highly complex on small scales), I generally set them with a target of ~10k total init cells (these cells define an initial oct-tree gridding over the full LOCGRID, not the fraction of the LOCGRID that will be used initially). So for your case these values might be as low as 35 28 6 if you are locating (LOCGRID) over your full 3D model grid (346x276x61). 35x28x6 = 5880 is << 25-50k maxnumnodes.
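For illustration, here is a minimal Python sketch of that ~10k-cell rule of thumb -- the helper is hypothetical (not part of NLLoc), and the rounding choices are arbitrary:

```python
# Hypothetical helper (not part of NLLoc): choose LOCSEARCH OCT initial cell
# counts near a ~10k total while preserving the LOCGRID aspect ratio.
def init_cells(nx, ny, nz, target=10_000):
    scale = (target / (nx * ny * nz)) ** (1.0 / 3.0)
    return tuple(max(1, round(n * scale)) for n in (nx, ny, nz))

cx, cy, cz = init_cells(346, 276, 61)
print(cx, cy, cz, cx * cy * cz)  # 41 33 7 -> 9471 cells, well under a 25-50k maxNumNodes
```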

Maybe you are also concerned about the larger-scale lineations - these are a bit confusing. The near-vertical one is not perfectly vertical, so instead of location gridding it perhaps reflects a sharp boundary in the velocity model, or a surface where travel times from very different ray paths are equal, giving a discontinuity in the gradient of travel-time.

Have you also checked that your model is correctly transformed to the NLL grid format and that the travel-time fields are reasonable? You could simplify your model to a horizontal layered model but still implement it in 3D to test and verify your model+travel-time+location settings.

I hope this helps some, but please ask further questions if needed. Also, it is helpful if you include some or all of your control file.

Anthony

filefolder commented 2 years ago

Thanks, this is very helpful.

Let me expand a bit-- all of the below figures have the same dimensions (.1 x .1 deg at a latitude of -33, so ~10 km by 10 km), twice the size of the velocity model grid resolution, which looks more or less smooth/fine to me. All figures have the same number of earthquakes (~1550); the average uncertainty of these events is ~3 km in X/Y/Z, with average RMS ~0.35 and about 9 P and 9 S picks each. They are surrounded by a temporary deployment targeting this swarm, anywhere from 3 to 30 km away. I figure the model is probably OK (this swarm aligns very well with other catalogs), but I suspect there may be a bit of warping going on from assuming a SIMPLE translation across a 10x10 degree area, which may be the reason the lineaments aren't quite straight.

"coarse" model with much fewer nodes: LOCSEARCH OCT 43 34 8 0.0001 12000 5000
[figure: verycoarse]

"medium" model, about what I've been using as default: LOCSEARCH OCT 86 69 15 0.0001 90000 5000
[figure: normal]

"fine": LOCSEARCH OCT 86 69 15 0.0001 180000 5000
[figure: medium]

And "extra fine", which is definitely overkill and essentially the same as medium: LOCSEARCH OCT 86 69 15 0.0001 270000 5000
[figure: massive]
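As a quick sanity check of the "~10 km by 10 km" figure size mentioned above, a spherical-earth approximation (my own arithmetic, not from the thread):

```python
import math

# 0.1 x 0.1 deg at latitude -33, spherical approximation
lat = -33.0
km_per_deg_lat = 111.1
km_per_deg_lon = 111.1 * math.cos(math.radians(lat))  # shrinks toward the poles
print(0.1 * km_per_deg_lat, 0.1 * km_per_deg_lon)     # ~11.1 km x ~9.3 km
```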

The coarse & medium models are about what I would expect-- the oct-tree runs out of iterations and has to just pick a grid/node point eventually. The finer models take a lot longer to run, but they start clustering "off grid", which is definitely what I want to see. But aside from the strange lineaments, which I guess are some model glitch, they both still have the same ~5 km artificial "diamond" shape as the coarse models, which I don't understand-- I would expect a gaussian scatter for both of these, or actually a SW-NE trend that tends to appear after double differencing.

So, here I am, wondering if there is perhaps a parameter I should adjust to try to fix this. Most of these artifacts wash out in the DD process, but hopefully I can do better at this step.

CONTROL 1 72362
TRANS  SIMPLE  -39.5 109.5 0.0

LOCSIG blah

LOCFILES /data/NLL/swaus NLLOC_OBS /data/NLL/swaus/swaus /data/NLL/OUTPUT 0

LOCHYPOUT SAVE_NLLOC_ALL NLL_FORMAT_VER_2

LOCPHASEID P   P Pg Pb Pn
LOCPHASEID S   S Sg Sb Sn

#       xNum yNum zNum xOrig yOrig zOrig dx dy dz    gridType(MISFIT/PROB_DENSITY) saveFlag (SAVE/NO_SAVE)
LOCGRID 346  276  61  -2.5 -2.5 -2.5  5. 5. 5. MISFIT SAVE

# ??
LOCANGLES ANGLES_NO 5
#LOCANGLES ANGLES_YES 5

#typical error in seconds for traveltime, correlation length that controls covariance between stations
LOCGAU 0.1 0.0

#fraction of traveltime to use as error, minimum traveltime error in seconds, maximum traveltime error in seconds
LOCGAU2 0.05 0.1 3.0

#what geonet uses. i think only 5 are allowed
LOCQUAL2ERR 0.05 0.11 0.225 0.5 99999.9

# LOCSEARCH OCT initNumCells_x initNumCells_y initNumCells_z minNodeSize maxNumNodes numScatter useStationsDensity stopOnMinNodeSize
LOCSEARCH OCT 86 69 15 0.0001 90000 5000

#LOCMETH method maxDistStaGrid minNumberPhases maxNumberPhases minNumberSphases VpVsRatio maxNum3DGridMemory minDistStaGrid iRejectDuplicateArrivals
LOCMETH EDT_OT_WT_ML      600      5              -1              -1               -1.71       -1               0               1

#positive elevation, pvel svel
LOCELEVCORR 1 5.8 3.36

# Calculates a weight for each station that is a function of the average distance between all stations used for location. This helps to correct for irregular station distribution,
# onoffflag cutoffDist
LOCSTAWT 1 150

# LOCPHSTAT RMS_Max NRdgs_Min Gap_Max P_ResidualMax S_ResidualMax Ell_Len3_Max Hypo_Depth_Min Hypo_Depth_Max Hypo_Dist_Max
LOCPHSTAT 3 -1 -1 3 3 70 70 -1 

In particular I thought LOCELEVCORR may be causing a discontinuity, but just checked with it disabled and there is no difference.

I am also not sure about the possible effect of using LOCANGLES, especially for nearby stations. Also now wondering if my minimum errors in LOCGAU/2 are too low. Testing these now but any general advice would again be very appreciated.

alomax commented 2 years ago

Hi, Thanks for all the further details and plots - now I have a much better idea of the scales involved.

Your study area is ~10x10 km, but your LOCGRID is LOCGRID 346 276 61 -2.5 -2.5 -2.5 5. 5. 5. MISFIT SAVE, which is about 1700x1400x300 km -- quite huge relative to the study area. This is almost certainly the cause of the gridding: as you note, the oct-tree runs out of samples because it is starting with a really vast search. You can and should set the LOCGRID as small as possible with respect to the study area (the area containing the target seismicity). Stations can be outside this area when using 1D models (2D dist-depth travel-time grids) or 3D models/travel-time grids, as long as the respective travel-time grids cover the distance (2D) or region (3D) between the LOCGRID and the station.

So I would suggest a LOCGRID like: LOCGRID 41 41 63 -20.0 -20.0 -2.0 1.0 1.0 1.0 PROB_DENSITY SAVE, which defines a 40x40x62 km box centered on the study area center with the top at 2 km above sea level. You must use PROB_DENSITY for EDT; I am not sure what using MISFIT leads to...

To center this box on the study area, use something like: TRANS AZIMUTHAL_EQUIDIST WGS-84 <lat center> <lon center> 0 (you have to re-calculate travel-times when you change the TRANS). I highly recommend AZIMUTHAL_EQUIDIST when the stations are generally around and far from a concentrated study area. Otherwise, use another geographic projection (LAMBERT, TRANS_MERC) but not SIMPLE, which is only for very small areas and is not a formal projection -- it has relatively large error over larger distances, and you are probably right that it is responsible for the non-straight lineaments. The projections are documented in the new NLL docs: http://alomax.free.fr/nlloc/docs/programs/control.ControlFile.html
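To see how a SIMPLE-style conversion drifts with distance, here is a minimal sketch (assuming pyproj is available; the flat-earth formula below is only similar in spirit to NLLoc's SIMPLE transform, and the center coordinates are the model center quoted later in this thread):

```python
from math import cos, radians, hypot
from pyproj import Proj

lat0, lon0 = -32.5, 117.5  # model center quoted later in the thread
aeqd = Proj(proj="aeqd", lat_0=lat0, lon_0=lon0, datum="WGS84")

def simple_xy(lat, lon):
    # crude flat-earth mapping, similar in spirit to a SIMPLE-style transform
    return (lon - lon0) * 111.1 * cos(radians(lat)), (lat - lat0) * 111.1

for d in (0.05, 0.5, 3.0):  # ~5 km, ~55 km, ~330 km offsets
    lat, lon = lat0 + d, lon0 + d
    xs, ys = simple_xy(lat, lon)
    xa, ya = (v / 1000.0 for v in aeqd(lon, lat))  # pyproj returns meters
    print(f"{d:4.2f} deg offset: SIMPLE-vs-AEQD mismatch {hypot(xs - xa, ys - ya):.2f} km")
```

The mismatch stays small near the origin but grows with offset, which is consistent with the warping described above.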

Given the above settings, I would first try a search like: LOCSEARCH OCT 20 20 31 0.0001 50000 5000

Minor comments:

#what geonet uses. i think only 5 are allowed
LOCQUAL2ERR 0.05 0.11 0.225 0.5 99999.9

NLL is not limited to 5, but most agencies only use 0-4 or so.

#LOCMETH method maxDistStaGrid minNumberPhases maxNumberPhases minNumberSphases VpVsRatio maxNum3DGridMemory minDistStaGrid iRejectDuplicateArrivals
LOCMETH EDT_OT_WT_ML      600      5              -1              -1               -1.71       -1               0               1

I always use EDT_OT_WT; the ML part is more time consuming (search over time) than useful in general. maxDistStaGrid 600 -- this will exclude (zero-weight) readings from stations further than 600 km from the LOCGRID center. VpVsRatio -1.71 -- a negative value means you have to provide S travel-time grids (or there are no S readings).
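Applied to the control file above, the suggested method change would read something like this (a sketch only; the poster's other values are kept as posted):

LOCMETH EDT_OT_WT 600 5 -1 -1 -1.71 -1 0 1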

LOCELEVCORR is mainly needed when you have stations above zero depth but the model / travel-times are only calculated to zero depth, e.g. with earlier versions of TauPToolkit. If your station elevations are contained in the 1D or 3D model, do not use LOCELEVCORR.

I generally do not use LOCSTAWT, except for large regional or teleseismic studies with very asymmetric station distribution and clusters of stations.

LOCANGLES gives output of ray take-off azimuth and dip at the hypocenter. Useful for later running a first-motion fault-plane calculation.

Your LOCGAU settings look OK, but these depend on the scale of the station-event distances and the (unknown) true earth heterogeneity. They should capture the typical mean of non-outlier residuals remaining after location, as these residuals crudely reflect model error.

I think this will help some, but there are likely other issues, maybe minor...

Anthony

filefolder commented 2 years ago

Thanks for all this and forgive the delayed reply.

I failed to mention that our actual study area is much larger and this was just a swarm that occurred during our deployment. So I thought I might be able to get away with using the same model but just increasing the number of nodes.

In the end I created a much smaller grid to target the swarm, and it has improved, but there is still some very slight concentration along the phantom grid lines. It also looks like many events are more tightly clustered/overlapping, as there are fewer dots overall. All in all not a big deal, but it is curious. This is still using SIMPLE, so I figure that is the reason.

[figure: relocations with the smaller targeted grid (out.ps)]

I was not aware of AZIMUTHAL_EQUIDIST or the new doc page -- cheers for that. I generally tend to prefer something like this or LAMBERT, but I am currently restricted to SIMPLE until I can get around to coding it into the DD tool we are currently using (e.g. https://github.com/swiss-seismological-service/scrtdd/issues/50), but I am not a C person so it's a slow go.

I still need to build either a LAMBERT or AZI_EQUI model to confirm the cause and will update when I do.

alomax commented 2 years ago

In the end I created a much smaller grid to target the swarm and it has improved

I think this is, in general, the best approach, especially when the target seismicity is concentrated and starting models are 1D. Working with very large models / location search areas is very disk, CPU and memory intensive, while targeting a swarm can be very efficient and fast. When the starting model is 3D and the required stations are far away, there is not much choice but to use large model and travel-time grids. But even in this case, if the study area can be regionalized into smaller units based on station density and the distribution of target seismicity, much efficiency can be gained.

but there is still some very slight concentration along the phantom grid lines

OK, yes. If you can extract the NLL control file information from scrtdd, then it may be quite easy to modify the TRANS to LAMBERT or AZIMUTHAL_EQUIDIST and run at the command line. This may also allow faster testing of adjusting other NLL control parameters.

I also notice in your plot that the phantom grid lines plus some of the denser seismicity form a large square -- if this square corresponds to a cell of the LOCSEARCH oct-tree initial grid, then this may be an indication of non-convergence (e.g. due to unreasonably large errors for picks or travel-times), or a NLLoc config problem. Does this apparent square move if you slightly change the LOCSEARCH initial grid size?

Anthony

filefolder commented 2 years ago

I think the larger "square" or 90-degree diamond shape has been mostly consistent throughout a large spectrum of LOCGRID and LOCSEARCH values, so it seems to be an artifact of the projection and/or (probably) the 5 km^3 velocity model itself.

My final quick-effort comparison between SIMPLE, LAMBERT, AZIMUTHAL_EQUIDIST and TRANS_MERC is below, if it is of any use to anyone in a similar situation.

All figures have the same ~1730 events (+/- 5; I suppose some events don't make the final cut depending on their pick residuals). The center of the figure is (117, -33.35), the size of the figure is .1 x .1 degree, and the center of all models is (117.5, -32.5). The size of the model is roughly 6x6 degrees and the resolution again is 5 km cubed. The models were constructed/projected via the excellent https://github.com/claudiodsf/nllgrid helper code and interpolated via scipy (e.g. griddata), with all other NLL conf parameters the same.
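For anyone wanting to reproduce this kind of workflow, a rough sketch using nllgrid plus scipy's griddata (the input file name, model extent, and column layout are hypothetical; the NLLGrid attribute names follow nllgrid's README):

```python
import numpy as np
from scipy.interpolate import griddata
from nllgrid import NLLGrid

# Hypothetical scattered model points: columns x, y, z (km, NLL grid coords), Vp (km/s)
pts = np.loadtxt("model_points.txt")

# Regular 5 km grid covering roughly 6x6 deg (~667 km) and down to ~300 km depth
xi = np.arange(0.0, 667.0, 5.0)
zi = np.arange(-2.0, 303.0, 5.0)
X, Y, Z = np.meshgrid(xi, xi, zi, indexing="ij")
vp = griddata(pts[:, :3], pts[:, 3], (X, Y, Z), method="linear")
# (points outside the data hull come back NaN and would need filling)

grd = NLLGrid()
grd.array = (5.0 / vp).astype(np.float32)  # SLOW_LEN = cell length / velocity
grd.dx = grd.dy = grd.dz = 5.0
grd.x_orig = grd.y_orig = 0.0
grd.z_orig = -2.0
grd.type = "SLOW_LEN"
grd.basename = "swaus.P.mod"
grd.write_hdr_file()
grd.write_buf_file()
```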

The only thing I can say with certainty is that the SIMPLE transform has a noticeable amount of glitching, the degree of which is probably not important for most studies, but at this scale it is something I would try (am trying!) to avoid. LAMBERT still seems to have a bit of this larger diamond shape but none of the gridding. LAMBERT also seemed to take the longest amount of CPU time, but only marginally. AZIMUTHAL_EQUIDIST and TRANS_MERC both seem to eliminate the "diamond" clustering and also seem more or less equivalent at this scale. The big problem, however, is that the larger Seiscomp ecosystem I am confined to still only supports NLL6, which does not have either of these.

[figure: SIMPLE (swarm_t_simple)]

[figure: LAMBERT (swarm_t_lambert)]

[figure: AZIMUTHAL_EQUIDIST (swarm_t_aziequidist)]

[figure: TRANS_MERC (swarm_t_tmerc)]

And some catalog stats, although nothing too interesting:

SIMPLE
# avg lon error = 3.10 (std 4.86)
# avg lat error = 3.34 (std 5.95)
# avg depth error = 4.89 (std 5.75)
# avg #p = 7.7  avg #s = 8.4
# avg rms = 0.279 (std 0.19)

LAMBERT
# avg lon error = 3.12 (std 5.14) 
# avg lat error = 3.34 (std 5.99)
# avg depth error = 4.90 (std 5.81)
# avg #p = 7.7  avg #s = 8.4
# avg rms = 0.279 (std 0.20)

AZIMUTHAL_EQUIDIST
# avg lon error = 3.32 (std 5.17)
# avg lat error = 3.38 (std 5.93)
# avg depth error = 4.93 (std 5.77)
# avg #p = 7.7  avg #s = 8.4
# avg rms = 0.286 (std 0.19)

TRANS_MERC
# avg lon error = 3.17 (std 4.90) 
# avg lat error = 3.29 (std 5.83)
# avg depth error = 4.83 (std 5.67)
# avg #p = 7.7  avg #s = 8.4
# avg rms = 0.279 (std 0.19)

Anyway, this thread has been very illuminating for me and I thank you again. Feel free to close at your discretion.

P.S. I think there may be a small typo in the manual section 1.3.1.3 algorithm 8 for azi_equi http://alomax.free.fr/nlloc/docs/programs/control.ControlFile.html

alomax commented 2 years ago

Hi, thanks so much for these tests and explanations - quite informative. It is nice to see the relative stability between the different transforms.

I previously did not appreciate the model resolution:

The size of the model is roughly 6x6 degrees and the resolution again is 5km cubed.

So the size of the "diamond" clustering and the horizontal/vertical streaks are on the scale of a model cell. If the change in velocity between cells is large, a constant-velocity cell of this size will give a travel-time field with strong refractions at the edges of the cell and linear time change across the cell. This is likely to concentrate hypocenters on the cell edges and perhaps give large location error and low location resolution within the cell, with respect to a more realistic, smoother, finer velocity model and travel-time field. You could test this by relocating with a mean 1D model with cell size much finer than 5 km, e.g. 0.5 - 0.1 km. Something like:

VGGRID 2 10001 621 0.0 0.0 -2.0 0.1 0.1 0.1 SLOW_LEN FLOAT

where 10001 is chosen so that 10001*0.1 is larger than the diagonal of your 6x6 deg study area. When the model cells (1D or 3D) are much smaller, the "diamond" clustering and the horizontal/vertical streak artifacts may not be present, and the relative positions of hypocenters may show more structure...
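The arithmetic behind that choice, as a quick check (spherical approximation, my numbers):

```python
import math

extent_km = 6 * 111.1                       # one side of the ~6x6 deg area, ~667 km
diag_km = math.hypot(extent_km, extent_km)  # ~943 km diagonal
print(diag_km, 10001 * 0.1)                 # 942.8 < 1000.1, so the 2D grid spans it
```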

Anthony

filefolder commented 2 years ago

I think you are right, the linear concentration is probably the velocity model-- below is a constant P/S velocity (but still 3D) version of the SIMPLE transform, same cell size.
[figure: swarm_t_simp_constvel]

So the larger diamond cell is a function of the model spacing, and the linear concentration of events is evidently due to the quality of the velocity model-- but this can be mitigated by using any non-SIMPLE projection. I think that fact may come as a surprise to people.

(For a homogeneous velocity model this actually looks pretty good! BUT the catalog does have about 150 fewer events now; likely a lot of pick residuals went out of tolerance and were rejected.)

alomax commented 2 years ago

Thanks for this test - this is nice as most issues seem to be explained.

For the loss of events and high pick residuals, a quick fix is to apply station static corrections by relocating a second time using the corrections in the output stat_totcorr file from the initial run of NLLoc. Modify the control file with INCLUDE <initial run output file path/root>.sum.grid0.loc.stat_totcorr, and modify LOCFILES <output file path/root> for the corrections run so as not to over-write the initial run.
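For example, with the LOCFILES from earlier in this thread, the corrections run might look like this (the paths are illustrative only, patterned on the poster's control file):

INCLUDE /data/NLL/OUTPUT.sum.grid0.loc.stat_totcorr
LOCFILES /data/NLL/swaus NLLOC_OBS /data/NLL/swaus/swaus /data/NLL/OUTPUT_corr 0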

But note that the selection of arrivals used to calculate the sta/phase statistics (average residuals) is controlled by LOCPHSTAT, which should be set so as to filter out clear outlier readings/residuals.