Open kerim371 opened 8 months ago
Hey @kerim371
You got an error because the Python version has lower and upper bounds.
Btw, you can try to download a .zip with an executable from Assets if you use Windows.
@DaloroAT thank you! In the future, will versions > 3.10 be supported?
Yep. I've added an upper bound to avoid potential problems with new Python releases. But most likely there are no problems now.
Good to know. Thank you!
Could you give feedback on the new functionality if you are using the desktop application? Pros, cons, suggestions, wishes.
The library is a helping hand for tasks like refracted-wave tomography. It is very annoying to pick first breaks manually, and it is even worse when the SEGY data is a high-quality modeled wavefield and you have to pick it manually even though you know it could be done automatically.
Usually we use first breaks to build an upper-layer velocity model. To do that we need the coordinates (x, y, z) of the sources, the coordinates of the receivers, and the picked traveltimes. Thus, if we have a text file with such information, we can do the tomography.
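The kind of tomography input described above can be sketched in a few lines. This is a minimal illustration with made-up arrays, not output of the package; the column layout (source XYZ, receiver XYZ, traveltime) and the file name are assumptions.

```python
# Hypothetical sketch: combine picked travel times with source/receiver
# coordinates into one text file for tomography. All data below is fake.
import numpy as np

src_xyz = np.array([[0.0, 0.0, 0.0]] * 3)            # source coords per trace
rec_xyz = np.array([[10.0, 0.0, 0.0],
                    [20.0, 0.0, 0.0],
                    [30.0, 0.0, 0.0]])               # receiver coords per trace
t_pick = np.array([0.012, 0.019, np.nan])            # picks; NaN = unpicked trace

mask = ~np.isnan(t_pick)                             # keep only picked traces
table = np.hstack([src_xyz[mask], rec_xyz[mask], t_pick[mask, None]])
np.savetxt("geometry.txt", table,
           header="sx sy sz gx gy gz t", comments="# ")
```

Dropping unpicked traces with a mask like this is exactly the 95-out-of-100 situation discussed below.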
Exporting travel times to json is a good idea, but it lacks information about source/receiver positions. It would be good if the user/developer could choose which SEGY headers appear in the exported text file along with the picks. For example, suppose we have 100 traces in a SEGY file but for some reason the algorithm could pick only 95 traces, leaving 5 unpicked. I would expect the exported text file to contain 95 XYZ coordinates of source and receiver (or any other trace headers if I need more) along with the 95 travel time picks. Then it is very simple to read the json file and work with structured data like XYZT.
Also, it would be best if the SEGY header names were the same as those used by the famous SEGYIO library.
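The consumer side of such an export could look like the sketch below. The record layout and the segyio-style key names (SourceX, GroupX, etc.) are assumptions for illustration, not the package's actual format.

```python
# Sketch: read a hypothetical per-trace export (headers + pick) into XYZT rows.
import json

exported = '''
[
  {"SourceX": 0, "SourceY": 0, "GroupX": 10, "GroupY": 0, "time_ms": 12.5},
  {"SourceX": 0, "SourceY": 0, "GroupX": 20, "GroupY": 0, "time_ms": 19.0}
]
'''
picks = json.loads(exported)
xyzt = [(p["SourceX"], p["SourceY"], p["GroupX"], p["GroupY"], p["time_ms"])
        for p in picks]  # structured rows, ready for numpy/pandas
```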
For example, my use case is to pick first breaks, prepare a geometry text file for pyGIMLi tomography, and run it. By the way, there is no need to add an option/function to convert the picked json file to pyGIMLi geometry; you should rather focus on algorithm stuff.
There is a problem with controlling the accuracy of picking when doing it from Python scripts. I think plotting with matplotlib, so we could easily display it in a Jupyter notebook, is much better than saving the result as images on disk. QC is very important, and I hope it will be powerful, so that the developer can access the displayed figures/axes/images/lines etc. to customize them as needed. That could accelerate the process of finding the best picking parameters: no need to open a jpg file every time one modifies the picking settings.
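The QC workflow asked for here can be sketched with plain matplotlib. The gather and picks below are synthetic stand-ins, not the package's plotting API; the point is that returning fig/ax leaves everything customizable.

```python
# Minimal in-notebook QC sketch: fake gather as an image, picks overlaid.
import matplotlib
matplotlib.use("Agg")  # headless backend; in Jupyter the inline backend applies
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
traces = rng.standard_normal((50, 200))   # fake gather: 50 traces, 200 samples
picks = rng.uniform(20, 60, size=50)      # fake first-break picks (sample index)

fig, ax = plt.subplots()
ax.imshow(traces.T, aspect="auto", cmap="gray")   # variable-density display
ax.plot(np.arange(50), picks, "r.")               # overlay picks for QC
ax.set_xlabel("trace")
ax.set_ylabel("sample")
# fig/ax stay in the user's hands for styling before fig.savefig(...)
```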
I think this library is mostly useful for picking FB with small scripts rather than through the application. The application facilitates the beginning, but professionals would prefer to write 10-20 lines and automate the process of setting inputs/outputs and settings, accelerating the work process. So I think there should be an option to install the package without Qt and anything related to GUI stuff.
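The 10-20 line script mentioned above might look like this. `pick_file` is a placeholder, not the package's real API; the batch loop and the I/O conventions around it are the point.

```python
# Sketch of a batch-picking script: loop SEGY files, write one picks file each.
from pathlib import Path

def pick_file(path: Path) -> list[float]:
    # placeholder: in a real script this would call the picking routine
    return [0.0]

def run_batch(in_dir: str, out_dir: str) -> int:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    n = 0
    for sgy in sorted(Path(in_dir).glob("*.sgy")):
        picks = pick_file(sgy)
        (out / (sgy.stem + ".txt")).write_text("\n".join(map(str, picks)))
        n += 1
    return n  # number of files processed
```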
And there is one important note: I think this package could be tested on big 3D seismic data. I understand that working on a CPU is much slower than on a GPU, but many HPC clusters use CPUs as they are less expensive. It would be interesting to perform such a test on the cloud, for example using one of the parallelization techniques. I don't know how good Python parallelization across cluster nodes is, but the parallelization could be done using Julia Distributed: each shot may be processed on a separate node. Julia has great compatibility with Python (PyCall). As I understand, first_breaks_picking uses some library written in C or C++ to run the neural network under the hood and Python is just an interface, so it doesn't slow it down much. Or the parallelization could be done using MPI, I think. If such a test is successful and the quality/performance is good enough, then it could be used on industrial-scale problems.
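The per-shot parallelism described above can also be sketched in pure Python with multiprocessing (a single-machine alternative to Julia Distributed or MPI). `pick_shot` is a stand-in for the actual picking call.

```python
# Sketch: process shots in parallel, one worker per shot gather.
import multiprocessing as mp

def pick_shot(shot_id: int) -> tuple[int, float]:
    # placeholder: load one shot gather, run the picker, return its result
    return shot_id, float(shot_id) * 0.01

def pick_all(shot_ids, workers: int = 4) -> dict[int, float]:
    with mp.Pool(workers) as pool:
        return dict(pool.map(pick_shot, shot_ids))
```

Since each shot is independent, the same pattern scales out to cluster schedulers that launch one such worker per node.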
One minor note: please consider adding a \n symbol to the json after each key-value pair, because right now in an editor it looks like single-line data.
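In the standard library this request amounts to dumping with indentation instead of the default single-line form; a small sketch with made-up pick data:

```python
# json.dumps writes one line by default; indent=... breaks after each pair.
import json

picks = {"traces": [1, 2], "times_ms": [12.5, 19.0]}
flat = json.dumps(picks)              # single line
pretty = json.dumps(picks, indent=2)  # newline after each key/value pair
```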
- Exporting travel times to json is a good idea, but it lacks information about source/receiver positions. It would be good if the user/developer could choose which SEGY headers appear in the exported text file along with the picks. For example, suppose we have 100 traces in a SEGY file but for some reason the algorithm could pick only 95 traces, leaving 5 unpicked. I would expect the exported text file to contain 95 XYZ coordinates of source and receiver (or any other trace headers if I need more) along with the 95 travel time picks.
Got it. Right now I placed basic information because the package doesn't perform any sorting, so matching between picks and traces might be performed outside. But yeah, it would be convenient to have all the necessary information during saving. At least duplicate some headers.
- Then it is very simple to read the json file and work with structured data like XYZT. Also, it would be best if the SEGY header names were the same as those used by the famous SEGYIO library.
I think I can add the possibility of choosing between header names based on popular applications and side packages. Right now I follow the names from Radex. But I think it is a lower priority.
- For example, my use case is to pick first breaks, prepare a geometry text file for pyGIMLi tomography, and run it. By the way, there is no need to add an option/function to convert the picked json file to pyGIMLi geometry; you should rather focus on algorithm stuff.
You are not the first to ask for exporting in the pyGIMLi format. I also used this package several years ago. I can add this way of exporting.
- There is a problem with controlling the accuracy of picking when doing it from Python scripts. I think plotting with matplotlib, so we could easily display it in a Jupyter notebook, is much better than saving the result as images on disk. QC is very important, and I hope it will be powerful, so that the developer can access the displayed figures/axes/images/lines etc. to customize them as needed. That could accelerate the process of finding the best picking parameters: no need to open a jpg file every time one modifies the picking settings.
I also like matplotlib, it's kind of a "standard" for visualization in Python. But the problem with matplotlib is that it is extremely slow at plotting seismic data with white/black wiggles. Plotting just 100 traces took 5 seconds, while opening the no-window Qt app, plotting 10k traces, and saving them to a file took 1-2 seconds. Maybe you know a well-performing solution with matplotlib? But regarding your wish to render the graph in a Jupyter cell, I can add this opportunity even with PyQt5.
I provide some level of customization, but I agree it's limited in comparison with the options for axes and figures in matplotlib. I'm thinking about a compromise solution that could combine the advantages of both matplotlib and PyQt: render the content as an image in PyQt and return it, then modify it however you want with matplotlib (or other visualization packages).
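The compromise described here could look like the sketch below: treat the Qt-rendered scene as a plain RGB array, then decorate it with matplotlib. The array is synthetic; in practice it would come from grabbing the Qt widget or scene.

```python
# Sketch: a rendered image (stand-in for Qt output) becomes a matplotlib artist.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rendered = np.full((300, 400, 3), 255, dtype=np.uint8)  # stand-in for Qt render

fig, ax = plt.subplots()
ax.imshow(rendered)                         # Qt output as a matplotlib image
ax.set_title("gather with picks")           # ...now any matplotlib styling works
ax.annotate("first breaks", xy=(200, 150), color="red")
```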
- I think this library is mostly useful for picking FB with small scripts rather than through the application. The application facilitates the beginning, but professionals would prefer to write 10-20 lines and automate the process of setting inputs/outputs and settings, accelerating the work process. So I think there should be an option to install the package without Qt and anything related to GUI stuff.
PyQt5 with its dependencies is just 20% bigger than matplotlib with its dependencies, so it's not a big overhead. In my opinion, a lightweight version shouldn't have any way of visualization, neither matplotlib nor Qt. Good point, maybe I will do this in the future.

- And there is one important note: I think this package could be tested on big 3D seismic data. I understand that working on a CPU is much slower than on a GPU, but many HPC clusters use CPUs as they are less expensive. It would be interesting to perform such a test on the cloud, for example using one of the parallelization techniques. I don't know how good Python parallelization across cluster nodes is, but the parallelization could be done using Julia Distributed: each shot may be processed on a separate node. Julia has great compatibility with Python (PyCall).
- As I understand, first_breaks_picking uses some library written in C or C++ to run the neural network under the hood and Python is just an interface, so it doesn't slow it down much. Or the parallelization could be done using MPI, I think.
I use the onnx framework. It allows inference from different programming languages. You can try the model in other languages if you want, but it's C++ under the hood, so I think no performance gains are expected.
- One minor note: please consider adding a \n symbol to the json after each key-value pair, because right now in an editor it looks like single-line data.
A json file is just a flattened string, and any \n symbols shouldn't be rendered as new lines in editors, at least if the IDE has an interpreter for json content.
- Maybe you know a good performing solution with matplotlib?
No, unfortunately. If we discuss fast rendering solutions in Python, then Qt/Qwt is probably the fastest, as they are originally written in C++. I used to work with Qwt to plot wiggle traces filled with color and agree that plotting 10k traces takes only a few seconds (I tested it a few years ago). Also, there was an option in Qwt (version 6.2, if I'm not mistaken), "Turn On OpenGL", that greatly accelerates the performance.
- I want to prepare a script later to compare performance. Could you run it on your machine?
Sure, I can try.
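A performance-comparison script like the one proposed could be built around a tiny timing harness; this sketch uses a trivial placeholder workload instead of an actual plotting call.

```python
# Minimal benchmark harness: best-of-N wall-clock time for a callable.
import time

def bench(fn, repeats: int = 3) -> float:
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()                                   # the workload under test
        best = min(best, time.perf_counter() - t0)
    return best                                # best run, in seconds

elapsed = bench(lambda: sum(range(100_000)))   # placeholder workload
```

Taking the best of several repeats reduces noise from caches and background load, which matters when the matplotlib-vs-Qt gap is only a few seconds.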
Hi,
Are there any reasons why it doesn't work with Python > 3.10? For example, when trying to run
pip install first-breaks-picking-gpu
I get an error: