JintaoLee-Roger / cigsegy

A tool for exchanging data between SEG-Y format and NumPy arrays inside a Python environment
MIT License

Read a large segy file #9

Open FJGEODEV opened 3 months ago

FJGEODEV commented 3 months ago

Hi, thanks for your work, really appreciated!

One question: when I'm reading a large SEG-Y file, using these lines:

```python
from cigsegy import SegyNP
d = SegyNP('3Dpoststack.sgy')
d.shape  # (ni, nx, nt), use as a numpy array, 3D geometry
```

In this case, is "d" already in RAM or not yet? Since you can calculate d.min() and d.max(), I assume d has already been read into RAM.

However, the main page says "Access the SEG-Y file as a 3D numpy array, without reading the whole file into memory", which confuses me.

I'm looking for a way to open unstructured data without reading it into memory yet, and then pick specific inline and crossline numbers to read into RAM.

Thanks.

JintaoLee-Roger commented 3 months ago

Hi, SegyNP only scans the SEG-Y file to obtain some geometric information; it does not load the trace data. So, in your case, d is not in RAM.
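
To make "scan now, read later" concrete, here is a minimal sketch of lazy access using `numpy.memmap`. It is not how cigsegy does it (cigsegy does its own scanning of the file); it only illustrates the idea under strong assumptions: a regular post-stack cube of ni * nx traces sorted inline by inline, no extended textual headers, a fixed trace length, and 4-byte big-endian IEEE float samples (data format code 5; many real files use IBM floats instead). The helper `open_lazy` and the tiny test file are hypothetical.

```python
import os
import tempfile

import numpy as np

FILE_HEADER = 3600   # 3200-byte textual header + 400-byte binary header
TRACE_HEADER = 240   # bytes per trace header


def open_lazy(path, ni, nx, nt):
    """Hypothetical helper: map the trace samples without reading them.

    Assumes big-endian IEEE float32 samples (format code 5), no extended
    textual headers, and ni * nx traces sorted inline by inline.
    """
    trace_dtype = np.dtype([("header", f"V{TRACE_HEADER}"),
                            ("data", ">f4", (nt,))])
    traces = np.memmap(path, dtype=trace_dtype, mode="r",
                       offset=FILE_HEADER, shape=(ni * nx,))
    # Selecting the "data" field and reshaping both give views that are
    # still backed by the memory map, so no samples are loaded here.
    return traces["data"].reshape(ni, nx, nt)


# Build a tiny synthetic file to try it out.
ni, nx, nt = 3, 4, 5
cube = np.arange(ni * nx * nt).astype(">f4").reshape(ni * nx, nt)
path = os.path.join(tempfile.mkdtemp(), "tiny.sgy")
with open(path, "wb") as f:
    f.write(b"\x00" * FILE_HEADER)
    for trace in cube:
        f.write(b"\x00" * TRACE_HEADER)
        f.write(trace.tobytes())

d = open_lazy(path, ni, nx, nt)
print(d.shape)           # (3, 4, 5)
inline = np.array(d[1])  # copies just this inline into an in-memory array
```

Opening the map reads no samples; indexing one inline and converting it with `np.array` pulls only that part into RAM, which is the workflow asked about above.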

Why can you use d.min() and d.max()?

See this example from segynp:

```python
sx.min(), sx.max()
# get the min and max value, but they are evaluated from a part of data,
# so they may not be the real min and max value
```

The minimum and maximum values are not the real values. When d is instantiated, I read 12000 traces (4000 at the start, 4000 in the middle, and 4000 at the end) to calculate min() and max(). You can see it here: https://github.com/JintaoLee-Roger/cigsegy/blob/1e0a1c5c6a5ecb8df8880800bfdf2df28c1562f0/python/segynp.py#L80
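
The sampling idea can be sketched as follows. This is not the code at the linked line; it is a simplified illustration that assumes the traces are reachable as a 2D array of shape (n_traces, nt), for example through a memory map, and the function name `sampled_range` is hypothetical. It shows why the estimate can be off: every sampled value lies inside the true range, so the estimated max can be smaller than the real max (and the estimated min larger than the real min).

```python
import numpy as np


def sampled_range(traces, n=4000):
    """Estimate (min, max) from n traces at the start, middle and end.

    traces: array-like of shape (n_traces, nt); only selected rows are read.
    """
    n_traces = traces.shape[0]
    if n_traces <= 3 * n:
        idx = np.arange(n_traces)  # small file: just use every trace
    else:
        mid = n_traces // 2 - n // 2
        idx = np.concatenate([
            np.arange(0, n),                    # front
            np.arange(mid, mid + n),            # middle
            np.arange(n_traces - n, n_traces),  # end
        ])
    sample = np.asarray(traces[idx])
    return float(sample.min()), float(sample.max())


# Usage: 20000 synthetic traces with one spike in a trace that is not sampled.
rng = np.random.default_rng(0)
traces = rng.normal(size=(20000, 50)).astype(np.float32)
traces[5000, 10] = 100.0  # trace 5000 lies between the front and middle samples
est_min, est_max = sampled_range(traces)
print(est_max < traces.max())  # True: the spike was missed
```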

JintaoLee-Roger commented 3 months ago

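Since we're not reading the data into memory, reading time slices will be slower. SEG-Y stores data trace by trace, so (for a file sorted by inline) an inline slice reads neighbouring traces, while a time slice has to pick one sample out of every trace in the file.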

```python
>>> d[30]  # fast
>>> d[:, 30, :]
>>> d[:, :, 30]  # slow
```
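
A rough way to see why these three calls differ is to count, for each slice, how many traces it touches and how far apart in the file those reads are. This back-of-the-envelope sketch assumes a regular (ni, nx, nt) cube stored inline by inline with 240-byte trace headers and 4-byte samples; the function `slice_cost` is hypothetical and not part of cigsegy, and the spanned sizes are approximate.

```python
def slice_cost(ni, nx, nt, axis, sample_bytes=4, trace_header=240):
    """Rough I/O cost of one 2D slice of a trace-sorted (ni, nx, nt) cube.

    Returns (traces_touched, useful_bytes, spanned_bytes):
      traces_touched: number of traces the slice needs data from
      useful_bytes:   sample bytes that end up in the slice
      spanned_bytes:  approximate stretch of the file the reads spread over
    """
    trace_bytes = trace_header + nt * sample_bytes
    if axis == 0:  # d[k]: nx neighbouring traces, one contiguous region
        return nx, nx * nt * sample_bytes, nx * trace_bytes
    if axis == 1:  # d[:, k, :]: one whole trace per inline, nx traces apart
        return ni, ni * nt * sample_bytes, ni * nx * trace_bytes
    if axis == 2:  # d[:, :, k]: a single sample from every trace in the file
        return ni * nx, ni * nx * sample_bytes, ni * nx * trace_bytes
    raise ValueError("axis must be 0, 1 or 2")


for axis, call in enumerate(["d[30]", "d[:, 30, :]", "d[:, :, 30]"]):
    touched, useful, spanned = slice_cost(1000, 1000, 1000, axis)
    print(f"{call:12s} traces={touched:>9,d} useful={useful / 1e6:6.1f} MB "
          f"spanned={spanned / 1e6:8.1f} MB")
```

For a 1000 x 1000 x 1000 cube, each slice holds about 4 MB of samples, but the inline slice is a single run of about 4.2 MB, whereas the crossline and time slices are spread across the whole file of about 4.2 GB, and the time slice touches all one million traces.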

FJGEODEV commented 3 months ago

Thanks for your reply. You are right, I missed the eval_range part.