DaloroAT / first_breaks_picking

First break picking in seismic gather
Apache License 2.0
109 stars 39 forks source link

The result of first-arrival-pick containing information of shot and geophone positions? #33

Closed lj-cug closed 11 months ago

lj-cug commented 1 year ago

It's a good tool to pick the first arrival time of seismic data. Thank you for your contribution. The input file for PyGIMLI need the first-arrival-time file, which contains three columns data: shot position, geophone position, travel-time. How can I obtain this formatted file from first_breaks_picking tools? If I can get this file, the workflow for tomography and full-waveform inversion can continue. Yours Sincerely, Li Jian China University of Geosciences

DaloroAT commented 1 year ago

Hey @lj-cug

Looks like you are using Python scripts to run PyGIMLI, so in this case I'll recommend you check the section on how to use the lib with Python here. Please, click on Detailed examples section to show snippets.

Regarding your question - there is no simple method or way to do it. But I can suggest the following snippet to generate such a file:

from pathlib import Path
from typing import Optional, Union, Sequence, Tuple

import numpy as np
import pandas as pd

from first_breaks.sgy.reader import SGY

def prepare_pygimli_file(sgy: SGY,
                         picks: Union[np.ndarray, Sequence[float]],
                         filename: Union[Path, str],
                         add_file_header: bool = True,
                         precision: int = 8,
                         s_x: Optional[Union[str, np.ndarray, Sequence[float]]] = "SOU_X",
                         s_y: Optional[Union[str, np.ndarray, Sequence[float]]] = "SOU_Y",
                         s_z: Optional[Union[str, np.ndarray, Sequence[float]]] = "SOU_ELEV",
                         g_x: Optional[Union[str, np.ndarray, Sequence[float]]] = "REC_X",
                         g_y: Optional[Union[str, np.ndarray, Sequence[float]]] = "REC_Y",
                         g_z: Optional[Union[str, np.ndarray, Sequence[float]]] = "REC_ELEV",
                         columns_order: Tuple[str, ...] = ("s_x", "s_y", "g_x", "g_y", "t")):
    # we assume that amount of traces in file and amount of the picks is the same
    assert sgy.num_traces == len(picks)

    data_to_dump = {}
    for file_column, value in [
        ["s_x", s_x],
        ["s_y", s_y],
        ["s_z", s_z],
        ["g_x", g_x],
        ["g_y", g_y],
        ["g_z", g_z],
        ["t", picks]
    ]:
        if s_x is None:
            continue
        elif isinstance(value, str):  # read value from file headers
            data_to_dump[file_column] = sgy.traces_headers[value].tolist()
        elif isinstance(value, (list, tuple, np.ndarray)):  # you can provide custom iterable instead of header values
            assert len(value) == sgy.num_traces
            if isinstance(value, np.ndarray):
                value = value.tolist()
            data_to_dump[file_column] = value
        else:
            raise TypeError(f"Unsupported type {type(value)} for {file_column} column")

    assert set(columns_order).intersection(set(data_to_dump.keys())) == set(columns_order)

    Path(filename).parent.mkdir(parents=True, exist_ok=True)
    df = pd.DataFrame({k: data_to_dump[k] for k in columns_order})
    df.to_csv(filename, sep='\t', index=False, float_format=f'%.{precision}f', header=add_file_header)

Then you can do the following alternative scenarios:

1) Get picks with python + save pygimli file

from first_breaks.picking.picker_onnx import PickerONNX
from first_breaks.picking.task import Task
from first_breaks.sgy.reader import SGY

sgy_filename = 'data.sgy'
sgy_to_pick = SGY(sgy_filename)

task = Task(sgy_to_pick,
            traces_per_gather=24,
            maximum_time=200)
picker = PickerONNX(show_progressbar=True)
task = picker.process_task(task) 
picks_in_mcs = task.picks_in_mcs
prepare_pygimli_file(sgy_to_pick, picks=picks_in_mcs, filename="pygimli_dump.txt", precision=10)

2) Get picks with python + save new SGY updated with first breaks + save pygimli file

from first_breaks.picking.picker_onnx import PickerONNX
from first_breaks.picking.task import Task
from first_breaks.sgy.reader import SGY

sgy_filename = 'data.sgy'
sgy_to_pick = SGY(sgy_filename)

task = Task(sgy_to_pick,
            traces_per_gather=24,
            maximum_time=200)
picker = PickerONNX(show_progressbar=True)
task = picker.process_task(task)

# save sgy updated with picks in header
sgy_filename_picks = "data_picked.sgy"
encoding = "i"
byte_position = 236
picks_unit = "mcs"
task.export_result_as_sgy(filename=sgy_filename_picks, encoding=encoding, picks_unit=picks_unit, byte_position=byte_position)

# read updated sgy and get values from header
sgy_with_picks = SGY(sgy_filename_picks)
picks_in_mcs = sgy_with_picks.read_custom_trace_header(byte_position=byte_position, encoding=encoding)
prepare_pygimli_file(sgy_with_picks, picks=picks_in_mcs, filename="pygimli_dump.txt", precision=10)

3) Get picks with Desktop app and save it to file + save pygimli file

# we saved this file with Desktop app. Put encoding and byte_position value, which you selected during saving
sgy_filename_picks = FILE_DESKTOP_SAVED
encoding = ENCODING_DESKTOP_SAVED
byte_position = BYTE_POSITION_DESKTOP_SAVED

# read updated sgy and get values from header
sgy_with_picks = SGY(sgy_filename_picks)
picks_in_mcs = sgy_with_picks.read_custom_trace_header(byte_position=byte_position, encoding=encoding)
prepare_pygimli_file(sgy_with_picks, picks=picks_in_mcs, filename="pygimli_dump.txt", precision=10)

Please let me know if it helps or not!

DaloroAT commented 1 year ago

Also, adjust the units for position and travel time according to your needs.

lj-cug commented 1 year ago

Thank you very much for your codes. I have tried to run and obtained the pygimli_dump.txt file. Because my segy file doesn't contain shot and geophone position information, I have to supplement these coordinates into the PyGIMLI input file. Today I study the **Unified Data Format" file of PyGIMLI following https://github.com/gimli-org/example-data/issues/2 The contents like this: It first describes the sensor (both shot and geophone for traveltime) positions and then the data. Assume a single shot with 4 receivers, each spaced 5m with an offset of 10m: 5 # 1 shot and 4 receivers // x z -10 0 # shot position 0 0 # the following 4 lines are geophone position 5 0 10 0 15 0 // s g t 1 2 0.2 # shot index, geophone index, travel-time (s) 1 3 0.3 1 4 0.5 1 5 0.6 I am thinking how to output these data based on your supplied Python scripts. If you could help me, it's better.

lj-cug commented 1 year ago

shot_position.txt geophone_position.txt

lj-cug commented 1 year ago

data.zip This is the segy seismic data file.

lj-cug commented 1 year ago

image This is the shot and geophone position and final inversion result figure.

DaloroAT commented 1 year ago

5 # 1 shot and 4 receivers // x z -10 0 # shot position 0 0 # the following 4 lines are geophone position 5 0 10 0 15 0 // s g t 1 2 0.2 # shot index, geophone index, travel-time (s) 1 3 0.3 1 4 0.5 1 5 0.6 I am thinking how to output these data based on your supplied Python scripts. If you could help me, it's better.

What should the file look like for multiple shots?

DaloroAT commented 1 year ago

Oh, I see. x y describes both sources and receivers. It just identifies some spatial points. s g reuses indices of spatial points. So we can describe exploration with several shots like:

x y
0 11
0 12
0 13
0 14
0 15
s g t
1 5 5
2 5 4
3 5 3
4 5 2
5 5 0
DaloroAT commented 1 year ago

Thank you for data sharing. But what is the relation between files with positions and traces in the file? Because they have different amounts.

DaloroAT commented 1 year ago

@lj-cug is it relevant?

lj-cug commented 1 year ago

These days I checked the data in the shot_position and geophone_position data files. I found the data format, which shows the shot and the geophone positions are relevant. Actually 78 shots were conducted and the data in geophone_position file are the first geophone in vertical dimension, the interval of geophone points is 1 meter (total 12 geophones for every shot). So I can prepare the pyGIMLI input files now. But the shot and geophone points layout in different geophysical survey are different, so understanding the layout is important. Thank you for your help. The automatic first arrival picking tool can save me much time! But in actual picking job, it's better to adjust the picking point. Li Jian

DaloroAT commented 1 year ago

I'm glad you were able to figure out how to prepare the file!

But in actual picking job, it's better to adjust the picking point.

Yeah, that's true. But sometimes it's enough to adjust picking parameters.

Could you share your picking parameters? I found such parameters relatively good

image image

lj-cug commented 12 months ago

Actually, I use the default parameters used in the example code. My picking shows as following: image

DaloroAT commented 12 months ago

Yep, I tried the default parameters, but it looks random 😁 common observation - you should choose parameters at which it would be convenient for you to pick manually. In this case, the neural network should give a good result, because it was traines in such setup