DaloroAT / first_breaks_picking

First break picking in seismic gather
Apache License 2.0
106 stars 38 forks

Python > 3.10 #35

Open kerim371 opened 8 months ago

kerim371 commented 8 months ago

Hi,

Are there any reasons why it doesn't work with Python > 3.10? For example, when trying to run pip install first-breaks-picking-gpu I get this error:

ERROR: Ignored the following versions that require a different python version: 0.3.100 Requires-Python <=3.10,>=3.8; 0.4.0a0 Requires-Python <=3.10,>=3.8; 0.4.0a2 Requires-Python <=3.10,>=3.8; 0.5.1 Requires-Python <=3.10,>=3.8; 0.6.0 Requires-Python <=3.10,>=3.8
ERROR: Could not find a version that satisfies the requirement first-breaks-picking-gpu (from versions: none)
ERROR: No matching distribution found for first-breaks-picking-gpu

[notice] A new release of pip available: 22.2.2 -> 23.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip
DaloroAT commented 8 months ago

Hey @kerim371

You got the error because the package's Python requirement has both a lower and an upper bound.
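
The published wheels declare Requires-Python ">=3.8,<=3.10" (as the pip output above shows). A rough stdlib sketch of the check pip performs; note this compares only major.minor, while pip does full PEP 440 version matching via the packaging library:

```python
import sys

# Requires-Python ">=3.8,<=3.10", as reported by pip above.
# Rough sketch: compares only (major, minor); real PEP 440
# matching is stricter (e.g. "<=3.10" excludes 3.10.1+).
def satisfies_requires_python(version=sys.version_info):
    lower, upper = (3, 8), (3, 10)
    return lower <= tuple(version[:2]) <= upper

print(satisfies_requires_python((3, 10, 0)))  # True
print(satisfies_requires_python((3, 11, 0)))  # False
```

So on a 3.11 interpreter pip finds no installable version and reports "No matching distribution found".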

Btw, you can try to download .zip from Assets with an executable if you use Windows.

kerim371 commented 8 months ago

@DaloroAT thank you! Will versions > 3.10 be supported in the future?

DaloroAT commented 8 months ago

Yep. I've added the upper bound to avoid potential problems with new Python releases, but most likely there are no problems now.

kerim371 commented 8 months ago

Good to know. Thank you!

DaloroAT commented 8 months ago

If you are using the desktop application, could you give feedback on the new functionality? Pros, cons, suggestions, wishes.

kerim371 commented 8 months ago

> If you are using the desktop application, could you give feedback on the new functionality? Pros, cons, suggestions, wishes.

The library is a helping hand for tasks like refracted-wave tomography. It is very annoying to pick first breaks manually, and it is even worse when the SEGY data consists of high-quality modeled wavefields and you have to pick them by hand even though you know it can be done automatically.

Usually we use first breaks to build an upper-layer velocity model. To do that, we need the coordinates (x, y, z) of the sources, the coordinates of the receivers, and the picked traveltimes. So if we have a text file with this information, we can do the tomography.

Exporting travel times to JSON is a good idea, but it lacks information about source/receiver positions. It would be good if the user/developer could choose which SEGY headers are included in the exported text file along with the picks. For example, suppose we have 100 traces in a SEGY file, but for some reason the algorithm could pick only 95 traces and 5 traces were left unpicked. I would expect the exported file to contain 95 XYZ coordinates of sources and receivers (or any other trace headers if I need more) along with the 95 traveltime picks.
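
To illustrate, a minimal sketch of what consuming such an export could look like. The JSON layout and header names below are hypothetical, just to show the kind of XYZT table I mean, not the package's actual schema:

```python
import json

# Hypothetical export: header names and structure are illustrative.
picks_json = json.dumps({
    "traces": [
        {"SourceX": 100.0, "SourceY": 0.0, "SourceZ": 0.0,
         "GroupX": 150.0, "GroupY": 0.0, "GroupZ": 0.0,
         "pick_ms": 12.5},
        {"SourceX": 100.0, "SourceY": 0.0, "SourceZ": 0.0,
         "GroupX": 200.0, "GroupY": 0.0, "GroupZ": 0.0,
         "pick_ms": 18.0},
    ]
})

# Build an XYZT-style text table, keeping only picked traces.
rows = []
for tr in json.loads(picks_json)["traces"]:
    if tr["pick_ms"] is not None:  # unpicked traces would be skipped
        rows.append("{SourceX} {SourceY} {SourceZ} "
                    "{GroupX} {GroupY} {GroupZ} {pick_ms}".format(**tr))
print("\n".join(rows))
```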

Then it is very simple to read the JSON file and work with structured data like XYZT. It would also be best if the SEGY header names matched those used by the well-known segyio library.

For example, my use case is to pick first breaks, prepare a geometry text file for pyGIMLI tomography, and run it. By the way, there is no need to add an option/function to convert the picked JSON file to pyGIMLI geometry; you should rather focus on the algorithm side.

There is a problem with controlling the accuracy of picking when doing it from Python scripts. I think plotting with matplotlib, so we can easily display results in a Jupyter notebook, is much better than saving images to disk. QC is very important, and I hope it will be powerful enough that the developer can access the displayed figures/axes/images/lines etc. to customize them as needed. That would accelerate finding the best picking parameters: no need to open a JPG file every time the picking settings are modified.
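
A sketch of the kind of notebook-friendly QC I mean, using synthetic data and a hypothetical helper (not the package's API) that returns the matplotlib objects so the caller can restyle them:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so it also runs in scripts/CI
import matplotlib.pyplot as plt

def plot_gather_with_picks(traces, picks_ms, dt_ms):
    """Image the gather, overlay the picks, return fig/ax for tweaking."""
    fig, ax = plt.subplots()
    ax.imshow(traces.T, aspect="auto", cmap="gray",
              extent=[0, traces.shape[0], traces.shape[1] * dt_ms, 0])
    ax.plot(np.arange(len(picks_ms)) + 0.5, picks_ms, "r.", label="picks")
    ax.set_xlabel("trace")
    ax.set_ylabel("time, ms")
    ax.legend()
    return fig, ax  # caller can change colors, limits, markers, etc.

traces = np.random.randn(95, 500)  # synthetic gather: 95 traces, 500 samples
picks = np.full(95, 40.0)          # synthetic picks, ms
fig, ax = plot_gather_with_picks(traces, picks, dt_ms=1.0)
```

Because the figure and axes are returned rather than saved, a notebook user can adjust the display interactively after each picking run.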

I think this library is mostly useful for picking first breaks from small scripts rather than through the application. The application makes getting started easier, but professionals would prefer to write 10-20 lines to automate input/output and settings and speed up the workflow. So I think there should be an option to install the package without Qt and other GUI-related stuff.
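
Packaging-wise, a GUI-free install is usually offered via optional extras. A hypothetical pyproject.toml fragment (the dependency names are illustrative, not the package's actual metadata):

```toml
[project]
name = "first-breaks-picking"
dependencies = ["numpy", "onnxruntime"]  # core picking only, no GUI

[project.optional-dependencies]
gui = ["PyQt5"]  # pip install "first-breaks-picking[gui]" pulls in Qt
```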

And there is one important note: I think this package should be tested on big 3D seismic data. I understand that working on a CPU is much slower than on a GPU, but many HPC clusters use CPUs because they are less expensive. It would be interesting to run such a test in the cloud, for example using one of the parallelization techniques. I don't know how well Python parallelizes across cluster nodes, but the parallelization could be done with Julia's Distributed: each shot could be processed on a separate node. Julia has great interoperability with Python (PyCall). As I understand it, first_breaks_picking uses a library written in C or C++ to run the neural network under the hood, and Python is just an interface, so it doesn't slow things down much. Alternatively, the parallelization could be done with MPI.
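
For what it's worth, the per-shot split is embarrassingly parallel even from Python. A minimal multiprocessing sketch, where pick_shot is a placeholder for the real picking call:

```python
from multiprocessing import Pool

def pick_shot(shot_id):
    """Placeholder for loading one shot gather and picking it.

    In a real run this would read the shot's traces from the SEGY
    file and call the picker; here it just returns dummy picks.
    """
    return shot_id, [float(shot_id)] * 3  # (shot id, fake picks in ms)

if __name__ == "__main__":
    # Shots are independent, so they map cleanly onto worker
    # processes; on a cluster the same split works across nodes.
    with Pool(processes=4) as pool:
        results = dict(pool.map(pick_shot, range(8)))
    print(len(results))  # 8
```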

If such a test succeeds and the quality/performance is good enough, then the package could be used on industrial-scale problems.

One minor note: please consider adding a \n after each key-value pair in the JSON, because right now it looks like single-line data in the editor.

DaloroAT commented 7 months ago

> Exporting travel times to JSON is a good idea, but it lacks information about source/receiver positions. It would be good if the user/developer could choose which SEGY headers are included in the exported text file along with the picks. For example, suppose we have 100 traces in a SEGY file, but for some reason the algorithm could pick only 95 traces and 5 traces were left unpicked. I would expect the exported file to contain 95 XYZ coordinates of sources and receivers (or any other trace headers if I need more) along with the 95 traveltime picks.

Got it. Right now I include only basic information because the package doesn't perform any sorting, so matching between picks and traces has to be done outside. But yes, it would be convenient to have all the necessary information available when saving. At least duplicate some headers.

> Then it is very simple to read the JSON file and work with structured data like XYZT. It would also be best if the SEGY header names matched those used by the well-known segyio library.

I think I can add the possibility of choosing between header-name conventions based on popular applications and third-party packages. Right now I follow the names from Radex, but I think it is a lower priority.

> For example, my use case is to pick first breaks, prepare a geometry text file for pyGIMLI tomography, and run it. By the way, there is no need to add an option/function to convert the picked JSON file to pyGIMLI geometry; you should rather focus on the algorithm side.

You are not the first to ask for export in pyGIMLI format. I also used this package several years ago. I can add this export option.

> There is a problem with controlling the accuracy of picking when doing it from Python scripts. I think plotting with matplotlib, so we can easily display results in a Jupyter notebook, is much better than saving images to disk. QC is very important, and I hope it will be powerful enough that the developer can access the displayed figures/axes/images/lines etc. to customize them as needed. That would accelerate finding the best picking parameters: no need to open a JPG file every time the picking settings are modified.

> I think this library is mostly useful for picking first breaks from small scripts rather than through the application. The application makes getting started easier, but professionals would prefer to write 10-20 lines to automate input/output and settings and speed up the workflow. So I think there should be an option to install the package without Qt and other GUI-related stuff.

> And there is one important note: I think this package should be tested on big 3D seismic data. I understand that working on a CPU is much slower than on a GPU, but many HPC clusters use CPUs because they are less expensive. It would be interesting to run such a test in the cloud, for example using one of the parallelization techniques. I don't know how well Python parallelizes across cluster nodes, but the parallelization could be done with Julia's Distributed: each shot could be processed on a separate node. Julia has great interoperability with Python (PyCall).

> As I understand it, first_breaks_picking uses a library written in C or C++ to run the neural network under the hood, and Python is just an interface, so it doesn't slow things down much. Alternatively, the parallelization could be done with MPI.

I use the ONNX framework. It allows inference from different programming languages. You can try the model in other languages if you want, but it's C++ under the hood, so I don't expect any performance gains.

> One minor note: please consider adding a \n after each key-value pair in the JSON, because right now it looks like single-line data in the editor.

A JSON file is just a flattened string; editors shouldn't need \n symbols to display it nicely, at least if the IDE has a viewer for JSON content.
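
That said, emitting those newlines at write time is a one-argument change with the standard json module's indent parameter:

```python
import json

picks = {"traces_count": 2, "picks_ms": [12.5, 18.0]}

flat = json.dumps(picks)              # single line, compact serialization
pretty = json.dumps(picks, indent=2)  # newline after each key/value pair

print(flat)
print(pretty)
```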

kerim371 commented 7 months ago
> Maybe you know a good-performing solution with matplotlib?

No, unfortunately not. If we're discussing fast rendering solutions in Python, then Qt/Qwt is probably the fastest, since they are originally written in C++. I used to work with Qwt to plot wiggle traces filled with color, and I agree that plotting 10k traces takes only a few seconds (I tested it a few years ago). Also, there was an option in Qwt (version 6.2, if I'm not mistaken) to "Turn On OpenGL" that greatly accelerates performance.

> I want to prepare a script later to compare performance. Could you run it on your machine?

Sure, I can try.