A new environment variable, MDIO__IMPORT__CLOUD_NATIVE, spends extra bandwidth to avoid random-read latency. It has also been added to the documentation.
It only helps in a high-throughput environment, for example when both the data and the ingestion machine(s) are in the cloud.
Any of the values {"True", "1", "true"} enables it. For instance:
$ export MDIO__IMPORT__CLOUD_NATIVE="true"
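To show how the enabling values above could be checked, here is a minimal Python sketch. The helper name `cloud_native_enabled` is hypothetical and is not MDIO's actual implementation. Treating every other value (including an unset variable) as disabled is an assumption.

```python
import os

# Values listed in this PR as enabling the flag.
TRUTHY = {"True", "1", "true"}


def cloud_native_enabled(environ=os.environ):
    """Return True when MDIO__IMPORT__CLOUD_NATIVE holds an enabling value.

    Hypothetical helper for illustration only; any value outside TRUTHY,
    or a missing variable, is assumed to leave the flag disabled.
    """
    return environ.get("MDIO__IMPORT__CLOUD_NATIVE", "") in TRUTHY
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.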
Details
When we scan the headers of a remote SEG-Y file, the ideal case is to read only the header of each trace, which minimizes bandwidth requirements. However, this issues millions of small requests, which is a performance bottleneck even with multiprocessing or threading. If the client has a very slow internet connection, header-only reads are still acceptable, since bandwidth is then the limiting factor. They are also fine when reading local files from an SSD; mechanical drives may still struggle with the random reads and can benefit from the flag.
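The cost of header-only scanning can be estimated with some simple arithmetic. The sketch below assumes a SEG-Y file with fixed-length traces, no extended textual headers, and 4-byte samples. The function names are hypothetical and only illustrate the request count, not MDIO's code.

```python
FILE_HEADER_BYTES = 3600   # 3200-byte textual header + 400-byte binary header
TRACE_HEADER_BYTES = 240   # standard SEG-Y trace header size


def trace_header_offset(index, samples_per_trace, bytes_per_sample=4):
    """Byte offset of the header of trace `index` (0-based), fixed-length traces assumed."""
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * bytes_per_sample
    return FILE_HEADER_BYTES + index * trace_bytes


def header_only_cost(num_traces, samples_per_trace, bytes_per_sample=4):
    """Return (requests, header_bytes, file_size) for one ranged read per trace header."""
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * bytes_per_sample
    file_size = FILE_HEADER_BYTES + num_traces * trace_bytes
    return num_traces, num_traces * TRACE_HEADER_BYTES, file_size


requests, header_bytes, file_size = header_only_cost(1_000_000, 1500)
# One million traces of 1500 samples: 1,000,000 requests that move only
# 240,000,000 bytes out of a 6,240,003,600-byte file.
```

The bytes transferred are a small fraction of the file, but every trace costs one round trip, which is where the latency bottleneck comes from.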
The MDIO__IMPORT__CLOUD_NATIVE flag enables buffered reading of the file regardless of where the ingestion occurs. It works very well when the file is in the cloud and the ingestion machine(s) are also in the cloud, with high throughput between the machine(s) and the object store. The only disadvantage is that the file is read twice (as with any other buffered read). However, this trade-off significantly increases ingestion performance in a cloud-native environment, and at a lower cost (fewer requests to the object store).
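A buffered header scan can be sketched as follows. This is an illustration under stated assumptions, not MDIO's implementation: traces are fixed length, there are no extended textual headers, and the function name and `traces_per_block` parameter are hypothetical. Each read fetches a contiguous block holding a whole number of traces, and the 240-byte headers are sliced out of it in memory.

```python
FILE_HEADER_BYTES = 3600   # 3200-byte textual header + 400-byte binary header
TRACE_HEADER_BYTES = 240   # standard SEG-Y trace header size


def scan_headers_buffered(fileobj, num_traces, samples_per_trace,
                          bytes_per_sample=4, traces_per_block=1024):
    """Return (headers, num_reads) using one large sequential read per block."""
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * bytes_per_sample
    headers = []
    num_reads = 0
    for start in range(0, num_traces, traces_per_block):
        count = min(traces_per_block, num_traces - start)
        # One request covers `count` complete traces (headers and samples).
        fileobj.seek(FILE_HEADER_BYTES + start * trace_bytes)
        block = fileobj.read(count * trace_bytes)
        num_reads += 1
        # Slice each trace header out of the in-memory block.
        for i in range(count):
            begin = i * trace_bytes
            headers.append(block[begin:begin + TRACE_HEADER_BYTES])
    return headers, num_reads
```

With `traces_per_block=1024`, a million-trace file needs 977 reads instead of one million. The price is that the scan transfers nearly the whole file, where header-only reads would transfer just 240 bytes per trace.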