GeoNet / help

An issues repo for technical help questions.
how to use GeoNet seismic waveform data from a local storage #121

Open elidana opened 5 months ago

elidana commented 5 months ago

We have received this user request from @segburg (https://github.com/GeoNet/fdsn/issues/244), who is using an on-premise data store to analyse seismic waveform data from GeoNet stations.

The local store contains some corrupted files, whereas the same files are complete when downloaded directly from the GeoNet FDSN service or the open-data archive.

I am using this ticket to describe some possible options (potentially of interest to other data users) for keeping an on-premise copy or sync of the GeoNet seismic waveform data for a given time period.

option 1 (small volumes of data) - GeoNet FDSN webservice

The GeoNet FDSN service can be used to request small data volumes, or when data from the past 7 days are required. Instructions on how to access this service are provided on the GeoNet website FDSN page, and some tutorials are available in the dataselect Jupyter data tutorial.
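As a minimal sketch of what such a request can look like from a terminal, the function below only assembles a dataselect query URL from the standard FDSN parameters; it does not download anything. The base URL https://service.geonet.org.nz/fdsnws/dataselect/1/query is an assumption that is not stated in this ticket (please check it against the GeoNet FDSN page), the function name is hypothetical, and the station, location and channel codes are borrowed from the file name used in the aws-cli examples further down.

```shell
# Sketch only: assemble an FDSN dataselect query URL (no request is sent).
# Arguments: NETWORK STATION LOCATION CHANNEL STARTTIME ENDTIME
# The base URL is an assumption, not taken from this ticket.
fdsn_dataselect_url() {
    base="https://service.geonet.org.nz/fdsnws/dataselect/1/query"
    printf '%s?network=%s&station=%s&location=%s&channel=%s&starttime=%s&endtime=%s\n' \
        "$base" "$1" "$2" "$3" "$4" "$5" "$6"
}

# Ten minutes of data for one channel
url=$(fdsn_dataselect_url NZ WTAZ 12 HHE 2023-01-31T00:00:00 2023-01-31T00:10:00)
echo "$url"
```

The printed URL can then be passed to a download tool, for example `curl -o WTAZ.mseed "$url"`, which should return a miniSEED file if data exist for that window.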

option 2 (moderate to large volumes of data) - GeoNet open data bucket

For moderate to large volumes of data, the recommended approach is to copy data from the GeoNet AWS Open data bucket (https://www.geonet.org.nz/data/access/aws).

Details on how waveform miniSEED files are organized in the GeoNet Open AWS archive are provided in the GeoNet data tutorials, and an introduction to interacting with the archive is provided in this GeoNet data blog.

aws-cli

To interact with the GeoNet open data bucket, the AWS command line interface (aws-cli) can be used. The aws-cli utility must first be installed on a Unix/Linux machine (https://aws.amazon.com/cli/). Users should refer to the AWS CLI documentation for the full set of instructions and options.

Once aws-cli is installed, the content for a specific year, or for a specific day and station, can be listed by running one of the following commands from a terminal:

aws s3 ls --no-sign-request s3://geonet-open-data/waveforms/miniseed/2023

or

aws s3 ls --no-sign-request s3://geonet-open-data/waveforms/miniseed/2023/2023.031/WTAZ.NZ/

To copy one file to your local machine (once the /home/username/tmp folder has been created), the command is:

aws s3 cp --no-sign-request s3://geonet-open-data/waveforms/miniseed/2023/2023.031/WTAZ.NZ/2023.031.WTAZ.12-HHE.NZ.D /home/username/tmp/.
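The paths above follow the pattern YEAR/YEAR.DAY/STATION.NETWORK/, where DAY is the three-digit day of the year (031 is 31 January). As a sketch, the two hypothetical helpers below turn a calendar date into that prefix in plain POSIX shell, so the result can be passed to `aws s3 ls` or `aws s3 sync`. The pattern is inferred only from the example paths in this ticket, so please check it against the GeoNet data tutorials before relying on it.

```shell
# Print the three-digit day of year for a calendar date given as YYYY MM DD.
day_of_year() {
    y=$1; m=${2#0}; d=${3#0}      # strip one leading zero (avoids octal parsing)
    doy=$d
    i=1
    # add the lengths of all months before month $m (non-leap year)
    for n in 31 28 31 30 31 30 31 31 30 31 30 31; do
        [ "$i" -ge "$m" ] && break
        doy=$((doy + n))
        i=$((i + 1))
    done
    # add the leap day for dates after February in a leap year
    if [ "$m" -gt 2 ] && { [ $((y % 4)) -eq 0 ] && [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; }; then
        doy=$((doy + 1))
    fi
    printf '%03d\n' "$doy"
}

# Print the bucket prefix holding one station's files for one day.
# Arguments: YYYY MM DD STATION NETWORK
geonet_day_prefix() {
    jday=$(day_of_year "$1" "$2" "$3")
    printf 's3://geonet-open-data/waveforms/miniseed/%s/%s.%s/%s.%s/\n' \
        "$1" "$1" "$jday" "$4" "$5"
}

geonet_day_prefix 2023 01 31 WTAZ NZ
# prints s3://geonet-open-data/waveforms/miniseed/2023/2023.031/WTAZ.NZ/
```

For example, `aws s3 ls --no-sign-request "$(geonet_day_prefix 2023 01 31 WTAZ NZ)"` would list the same directory as the second listing command above.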

To sync an entire day's worth of data for one station to your local machine (into the same /home/username/tmp folder):

aws s3 sync --no-sign-request s3://geonet-open-data/waveforms/miniseed/2023/2023.031/WTAZ.NZ/ /home/username/tmp/.

The same can be applied to all stations available for that day with the following command:

aws s3 sync --no-sign-request s3://geonet-open-data/waveforms/miniseed/2023/2023.031/ /home/username/tmp/.
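For more than one day, the same sync can be repeated once per day directory. The sketch below only prints the commands, one per day, so that they can be reviewed before anything is downloaded. `--exclude` and `--include` are standard `aws s3 sync` filters; the function name, the `*HHZ*` pattern and the per-day destination folders are illustrative assumptions.

```shell
# Print one "aws s3 sync" command per day of the year, from FIRST to LAST.
# Arguments: YEAR FIRST LAST PATTERN DEST
# FIRST and LAST are plain integers without leading zeros (31, not 031).
print_sync_commands() {
    year=$1; first=$2; last=$3; pattern=$4; dest=$5
    day=$first
    while [ "$day" -le "$last" ]; do
        jday=$(printf '%03d' "$day")     # zero-pad to three digits
        # exclude everything, then include only files matching PATTERN
        printf 'aws s3 sync --no-sign-request --exclude "*" --include "%s" ' "$pattern"
        printf 's3://geonet-open-data/waveforms/miniseed/%s/%s.%s/ ' "$year" "$year" "$jday"
        printf '%s/%s.%s/\n' "$dest" "$year" "$jday"
        day=$((day + 1))
    done
}

# Days 031 to 033 of 2023, HHZ channels only
print_sync_commands 2023 31 33 '*HHZ*' /home/username/tmp
```

Once the printed commands look right, they can be run by piping the output to `sh`. Adding `--dryrun` to a command lets aws-cli report what it would transfer without copying anything.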

The sync command will generate a local folder structure that mirrors the part of the GeoNet Open data bucket being synced. If the user requires a different structure for the files, symbolic links can be created locally to match the preferred local seismic waveform naming convention.
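One possible target layout is the SeisComP Data Structure (SDS), which stores files as YEAR/NET/STA/CHAN.TYPE/NET.STA.LOC.CHAN.TYPE.YEAR.DAY. As a sketch, the hypothetical functions below parse GeoNet-style file names (YEAR.DAY.STA.LOC-CHAN.NET.TYPE, as in the copy example above) and create one symbolic link per file in an SDS tree. Both function names are illustrative, and the name parsing assumes that every synced file follows that pattern.

```shell
# Translate a GeoNet file name (YEAR.DAY.STA.LOC-CHAN.NET.TYPE) into the
# corresponding SDS relative path.
sds_path() {
    name=$1
    year=${name%%.*};    rest=${name#*.}
    doy=${rest%%.*};     rest=${rest#*.}
    sta=${rest%%.*};     rest=${rest#*.}
    locchan=${rest%%.*}; rest=${rest#*.}
    net=${rest%%.*};     typ=${rest#*.}
    loc=${locchan%%-*};  chan=${locchan#*-}
    printf '%s/%s/%s/%s.%s/%s.%s.%s.%s.%s.%s.%s\n' \
        "$year" "$net" "$sta" "$chan" "$typ" \
        "$net" "$sta" "$loc" "$chan" "$typ" "$year" "$doy"
}

# Create SDS-style symbolic links under DST for every matching file under SRC.
# Arguments: SRC DST
link_to_sds() {
    src=$(cd "$1" && pwd) || return 1    # absolute path, so the links resolve
    dst=$2
    find "$src" -type f -name '*.*.*.*-*.*.*' | while IFS= read -r path; do
        rel=$(sds_path "${path##*/}")
        mkdir -p "$dst/${rel%/*}"        # create YEAR/NET/STA/CHAN.TYPE
        ln -sf "$path" "$dst/$rel"
    done
}

sds_path 2023.031.WTAZ.12-HHE.NZ.D
# prints 2023/NZ/WTAZ/HHE.D/NZ.WTAZ.12.HHE.D.2023.031
```

With the synced files in /home/username/tmp, a call such as `link_to_sds /home/username/tmp /home/username/sds` (the second path is just an example) would build the link tree without duplicating any data.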

s3fs

s3fs is a utility that runs on Linux and macOS and can be used to "mount" an S3 bucket locally, mimicking some of the functionality of a local file system mount.

Detailed instructions are available here: https://github.com/s3fs-fuse/s3fs-fuse

Below are some very quick instructions on how to use it on a Fedora-based Linux system; they will need to be adapted to the local operating system.

install s3fs fuse (might require sudo access)

dnf install s3fs-fuse

create your local destination directory and "mount" the geonet open data bucket

mkdir -p /home/username/tmp/
s3fs geonet-open-data:/waveforms/miniseed/ /home/username/tmp/ -o public_bucket=1

We can provide more detailed instructions on any of these steps, on filtering specific waveform data, or on other ways of interacting with the AWS open data bucket.

calum-chamberlain commented 5 months ago

For any obspy users, I have hacked together a drop-in replacement for obspy.clients.fdsn.Client for use with the GeoNet open-data bucket that anyone is welcome to use (and adapt and fix as needed). This is here - make sure you test it yourself before trusting it!

salichon commented 4 months ago

As a short @elidana @segburg

Open Data bucket to SeisComP Data Structure (SDS) emulation (cf. https://www.seiscomp.de/doc/base/glossary.html#term-SDS)