avzero07 / eew-nn-project

Repo to Keep Track of Project Utilities
GNU Affero General Public License v3.0

Waveform Data Retrieval #13

Open avzero07 opened 3 years ago

avzero07 commented 3 years ago

Email-based retrieval is not possible, and it was always inefficient anyway. The public-facing FTP server ftp.seismo.NRCan.gc.ca hosts waveform data in miniSEED format, updated daily. The archive goes back to 1975.

ftp> pwd
257 "/wfdata/CN" is the current directory
ftp> ls
227 Entering Passive Mode (132,156,41,1,155,181).
150 Opening ASCII mode data connection for file list
drwxr-xr-x 2 ftp ftp 66 Mar 9 06:00 1975
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1976
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1977
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1978
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1979
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1980
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1981
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1982
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1983
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1984
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1985
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1986
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1987
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1988
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1989
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1990
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1991
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1992
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1993
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1994
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1995
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1996
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1997
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1998
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 1999
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2000
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2001
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2002
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2003
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2004
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2005
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2006
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2007
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2008
..
..
drwxr-xr-x 2 ftp ftp 126 Mar 9 06:00 2020
drwxr-xr-x 2 ftp ftp 36 Mar 9 06:00 2021
226 Transfer complete
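
The same listing can be reproduced from Python with the standard-library ftplib. This is a minimal sketch, not the project's code: it assumes the server accepts anonymous logins (as the wget transcripts later in this thread suggest), and the names `list_years` and `remote_path` are hypothetical. The path pattern is inferred from the file names that appear later in this thread, which use /wfdata5 rather than the /wfdata shown above.

```python
from datetime import date
from ftplib import FTP

HOST = "ftp.seismo.NRCan.gc.ca"


def remote_path(day, station, channel, network="CN", root="/wfdata5"):
    """Build the remote path for one station/channel/day.

    Pattern inferred from names seen in this thread, e.g.
    /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNN.mseed
    (the empty field between the dots is the location code).
    """
    folder = f"{root}/{network}/{day:%Y/%m/%d}"
    return f"{folder}/{day:%Y%m%d}.{network}.{station}..{channel}.mseed"


def list_years(host=HOST, directory="/wfdata/CN"):
    """Return the year directories under `directory` (needs network access)."""
    with FTP(host, timeout=30) as ftp:
        ftp.login()          # no arguments means anonymous login
        ftp.cwd(directory)
        return sorted(name for name in ftp.nlst() if name.isdigit())
```

For example, `remote_path(date(2020, 11, 28), "WSLR", "HNN")` builds the path of one of the files downloaded in the next comment.
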
avzero07 commented 3 years ago

File retriever works.

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dummy.test.2.txt dt_root
Current Local Dir : /mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/11/28
Downloaded 20201128.CN.WSLR..HNN.mseed to .
Downloaded 20201128.CN.HOPB..HHZ.mseed to .
Downloaded 20201128.CN.WSLR..HHE.mseed to .
Downloaded 20201128.CN.HOPB..HHE.mseed to .
Downloaded 20201128.CN.HOPB..HNZ.mseed to .
Downloaded 20201128.CN.HOPB..HNE.mseed to .
Downloaded 20201128.CN.HOPB..HNN.mseed to .
Downloaded 20201128.CN.HOPB..HHN.mseed to .
Downloaded 20201128.CN.WSLR..HNZ.mseed to .
Downloaded 20201128.CN.WSLR..HHN.mseed to .
Downloaded 20201128.CN.WSLR..HNE.mseed to .
Downloaded 20201128.CN.WSLR..HHZ.mseed to .
Data from 2020-11-28 already present! Skipping

real    0m31.046s
user    0m2.374s
sys     0m1.963s

Currently single-threaded; this needs to run in parallel. Perhaps one thread per station, or one per event (which won't be efficient unless the date set can be shared)?

avzero07 commented 3 years ago

FTP connections seem to be throttled during the day. Need to test again at night.

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dummy.test.2.txt dt_root
Current Local Dir : /mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/11/28
Downloaded 20201128.CN.NCSB..HHE.mseed to .
Downloaded 20201128.CN.NCSB..HHZ.mseed to .
Downloaded 20201128.CN.NCSB..HHN.mseed to .
Downloaded 20201128.CN.JEDB..HHE.mseed to .
Downloaded 20201128.CN.JEDB..HNZ.mseed to .
Downloaded 20201128.CN.JEDB..HNN.mseed to .
Downloaded 20201128.CN.JEDB..HHZ.mseed to .
Downloaded 20201128.CN.JEDB..HNE.mseed to .
Downloaded 20201128.CN.SNB..HHE.mseed to .
Downloaded 20201128.CN.HSNB..HHN.mseed to .
Downloaded 20201128.CN.HSNB..HHZ.mseed to .
Downloaded 20201128.CN.HSNB..HNN.mseed to .
Downloaded 20201128.CN.JEDB..HHN.mseed to .
....
....
Downloaded 20201128.CN.PACB..HHN.mseed to .
Downloaded 20201128.CN.PACB..HNE.mseed to .
Downloaded 20201128.CN.PACB..HNZ.mseed to .
Downloaded 20201128.CN.PACB..HNN.mseed to .
Downloaded 20201128.CN.PACB..HHZ.mseed to .
Data from 2020-11-28 already present! Skipping

real    21m16.047s
user    0m14.662s
sys     0m19.835s
avzero07 commented 3 years ago

Parallel execution with ProcessPoolExecutor works well!

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py .
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHN.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHZ.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNE.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNN.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNZ.mseed

real    0m11.859s
user    0m0.774s
sys     0m1.908s
avzero07 commented 3 years ago

Larger test with ProcessPoolExecutor: attempt to download all data for the target stations for 2020/11/28.

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/11/28/
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HHE.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HHN.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HHZ.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HNE.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HNN.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HNZ.mseed

....
....
....

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HHN.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HHZ.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HNE.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HNN.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HNZ.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHE.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHN.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHZ.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNE.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNN.mseed

Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNZ.mseed

real    1m31.664s
user    0m15.555s
sys     0m36.213s

115 files in 1m31.664s!!!
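
For reference, a rough sketch of how a day's downloads might be fanned out with ProcessPoolExecutor. This is not the actual file_retriever.py: the function names are hypothetical, and the use of wget subprocesses and a default of 12 workers are assumptions loosely based on the process listing further down this thread.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor, as_completed

BASE_URL = "ftp://ftp.seismo.NRCan.gc.ca"


def wget_command(remote_path, dest_dir):
    """Build the wget invocation for one remote file."""
    return ["wget", "--quiet", f"--directory-prefix={dest_dir}",
            BASE_URL + remote_path]


def fetch(remote_path, dest_dir):
    """Download one file; return (remote_path, succeeded)."""
    result = subprocess.run(wget_command(remote_path, dest_dir))
    return remote_path, result.returncode == 0


def fetch_all(remote_paths, dest_dir, max_workers=12):
    """Download files in parallel and return the paths that failed."""
    failed = []
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, p, dest_dir) for p in remote_paths]
        # Report each file as soon as its worker finishes.
        for future in as_completed(futures):
            path, ok = future.result()
            print(("Downloaded " if ok else "FAILED ") + path)
            if not ok:
                failed.append(path)
    return failed
```

Calling `fetch_all(paths, "dt_root/2020/11/28")` should happen under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Since each worker mostly waits on a wget subprocess, a ThreadPoolExecutor would probably behave similarly.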

avzero07 commented 3 years ago

Takes about 16 minutes to create the folder tree and prep manifest files. This is one-time, so I guess it's fine. Log file looks good too.

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever_prep.py Search-Param-EQCanada.txt dt_root

real    15m41.108s
user    0m4.136s
sys     0m1.920s
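
A minimal sketch of what the prep step produces, for readers who don't have file_retriever_prep.py at hand. It only illustrates the dt_root/YYYY/MM/DD layout; the manifest name `manifest.txt`, its one-path-per-line format, the function name `prep_day`, and the assumption that every station has all six channels are hypothetical. The real script reads Search-Param-EQCanada.txt, which this sketch does not attempt to parse.

```python
from datetime import date
from pathlib import Path

# Channel codes seen in the file names in this thread.
CHANNELS = ("HHE", "HHN", "HHZ", "HNE", "HNN", "HNZ")


def prep_day(root, day, stations, network="CN"):
    """Create root/YYYY/MM/DD and write a manifest of remote paths.

    "manifest.txt" is a hypothetical name and format (one path per line).
    """
    day_dir = Path(root) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
    day_dir.mkdir(parents=True, exist_ok=True)
    remote_dir = f"/wfdata5/{network}/{day:%Y/%m/%d}"
    lines = [
        f"{remote_dir}/{day:%Y%m%d}.{network}.{station}..{channel}.mseed"
        for station in sorted(stations)
        for channel in CHANNELS
    ]
    manifest = day_dir / "manifest.txt"
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

For example, `prep_day("dt_root", date(2020, 1, 7), ["HOPB", "WSLR"])` creates dt_root/2020/01/07 and lists twelve candidate files in its manifest.
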
avzero07 commented 3 years ago

Batch scheduling was a bust. Kept running into deadlocks. Sticking to semi-auto for now.

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ ls dt_root/2020/01/
04  06  07  08  18  19  20  22  24  25  27  28
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/07

real    6m1.882s
user    0m21.770s
sys     0m46.059s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/08

real    9m10.658s
user    0m28.806s
sys     1m3.600s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/18

real    5m40.986s
user    0m18.657s
sys     0m42.861s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/19

real    8m32.061s
user    0m19.044s
sys     0m42.889s
avzero07 commented 3 years ago

Sometimes one transfer stalls and holds up the rest. The best fix is to intervene manually and kill the process associated with the stalled download.

Usually restarting the stalled file separately works.

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:11 ?        00:00:00 /init
...
...
...
akshay    4959    43  0 18:51 pts/1    00:00:00 tail -f dt_root/2020/01/20/file_retriever.log
akshay    4988  4944  0 18:52 pts/0    00:00:01 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.HOPB..HHE.mseed
akshay    5025  4937  1 18:55 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.NTKA..HHN.mseed
akshay    5026  4938  1 18:55 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.NTKA..HHZ.mseed
akshay    5030  4934  1 18:55 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HHE.mseed
akshay    5031  4940  1 18:55 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HHN.mseed
akshay    5032  4942  1 18:55 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HHZ.mseed
akshay    5033  4936  1 18:55 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HNE.mseed
akshay    5034  4935  1 18:55 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HNN.mseed
akshay    5035  4939  1 18:55 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HNZ.mseed
akshay    5036  4933  0 18:55 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PGC..HHE.mseed
akshay    5037  4941  1 18:56 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PGC..HHN.mseed
akshay    5039  4943  1 18:56 pts/0    00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PGC..HHZ.mseed

In this case, 20200120.CN.HOPB..HHE.mseed is stalling (note its start time, 18:52, versus 18:55 or later for the rest). If it is the only one left at the end, it is best to just kill it and try downloading it again. The good thing is that the logs will report the failed download.

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ kill -9 4988
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ kill -9 5056
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ cd dt_root/2020/01/20
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/01/20$ rm 20200120.CN.HOPB..HHE.mseed
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/01/20$ rm 20200120.CN.PTRF..HHE.mseed
rm: cannot remove '20200120.CN.PTRF..HHE.mseed': No such file or directory
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/01/20$ wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.HOPB..HHE.mseed
--2021-03-11 19:02:29--  ftp://ftp.seismo.nrcan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.HOPB..HHE.mseed
           => ‘20200120.CN.HOPB..HHE.mseed’
Resolving ftp.seismo.nrcan.gc.ca (ftp.seismo.nrcan.gc.ca)... 132.156.41.1, 162.219.55.2, 132.246.161.100, ...
Connecting to ftp.seismo.nrcan.gc.ca (ftp.seismo.nrcan.gc.ca)|132.156.41.1|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /wfdata5/CN/2020/01/20 ... done.
==> SIZE 20200120.CN.HOPB..HHE.mseed ... 9830400
==> PASV ... done.    ==> RETR 20200120.CN.HOPB..HHE.mseed ... done.
Length: 9830400 (9.4M) (unauthoritative)

20200120.CN.HOPB..HHE.mseed                          100%[===================================================================================================================>]   9.38M   140KB/s    in 57s

2021-03-11 19:03:27 (168 KB/s) - ‘20200120.CN.HOPB..HHE.mseed’ saved [9830400]

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/01/20$ wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PTRF..HHE.mseed
--2021-03-11 19:03:35--  ftp://ftp.seismo.nrcan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PTRF..HHE.mseed
           => ‘20200120.CN.PTRF..HHE.mseed’
Resolving ftp.seismo.nrcan.gc.ca (ftp.seismo.nrcan.gc.ca)... 132.156.41.1, 162.219.55.2, 132.246.161.100, ...
Connecting to ftp.seismo.nrcan.gc.ca (ftp.seismo.nrcan.gc.ca)|132.156.41.1|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /wfdata5/CN/2020/01/20 ... done.
==> SIZE 20200120.CN.PTRF..HHE.mseed ... 5660672
==> PASV ... done.    ==> RETR 20200120.CN.PTRF..HHE.mseed ... done.
Length: 5660672 (5.4M) (unauthoritative)

20200120.CN.PTRF..HHE.mseed                          100%[===================================================================================================================>]   5.40M   213KB/s    in 29s

2021-03-11 19:04:05 (188 KB/s) - ‘20200120.CN.PTRF..HHE.mseed’ saved [5660672]

Annoying how it starts to just work when attempted again. I wonder if this has to do with WSL2 somehow. :(
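
The manual kill-and-retry above could probably be automated with a per-download timeout. A minimal sketch, assuming each download is a subprocess (as the wget entries in the process listing suggest); `run_with_retry`, the 300-second limit, and the three attempts are hypothetical choices, not what file_retriever.py does.

```python
import subprocess


def run_with_retry(command, timeout_s=300, attempts=3):
    """Run `command`, killing it if it exceeds `timeout_s`; retry on failure.

    Returns True as soon as one attempt exits with status 0.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(command, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child at this point.
            print(f"attempt {attempt}: timed out after {timeout_s}s")
            continue
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}: exit status {result.returncode}")
    return False
```

Usage would look like `run_with_retry(["wget", url])`. A timed-out attempt can leave a partial file behind, which the transcript above removes by hand before retrying, so a real version would need to do the same. wget's own `--read-timeout` option is another possible way to make a stalled transfer fail sooner.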

avzero07 commented 3 years ago

Got one month's worth of data out!!

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/22

real    18m32.506s
user    0m36.783s
sys     1m17.554s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/20

real    10m43.929s
user    0m31.583s
sys     1m9.629s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/24

real    2m49.964s
user    0m17.103s
sys     0m37.873s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/25

real    9m42.624s
user    0m29.289s
sys     0m58.937s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/27

real    9m51.417s
user    0m32.348s
sys     1m9.883s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/28

real    4m41.941s
user    0m27.773s
sys     1m2.841s
avzero07 commented 3 years ago

Most of Apr 2020 was downloaded without issue.

akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/
03/ 06/ 09/ 10/ 17/ 19/ 22/ 24/
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/03

real    2m4.077s
user    0m15.211s
sys     0m33.882s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/06

real    2m4.943s
user    0m14.896s
sys     0m32.729s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/09

real    2m2.222s
user    0m14.935s
sys     0m33.334s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/10

real    2m5.683s
user    0m15.064s
sys     0m32.472s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/17

real    2m24.683s
user    0m15.926s
sys     0m34.608s
avzero07 commented 3 years ago

All of the 2020 data has been downloaded and backed up.

avzero07 commented 3 years ago

Fixed batch processing. Logging is now separate too, at the batch (month) level and at the day level.
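
A sketch of one way to split logging across the two levels with the standard logging module. The day-level file name file_retriever.log matches the one tailed earlier in this thread; the month-level name `batch.log`, the logger names, and the helper functions are assumptions for illustration.

```python
import logging
from pathlib import Path


def make_logger(name, log_path):
    """Return a logger that appends timestamped lines to `log_path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False        # keep lines out of parent loggers
    if not logger.handlers:         # avoid duplicate handlers on re-use
        Path(log_path).parent.mkdir(parents=True, exist_ok=True)
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def loggers_for(root, year, month, day):
    """Build one month-level (batch) logger and one day-level logger."""
    month_dir = Path(root) / f"{year:04d}" / f"{month:02d}"
    batch = make_logger(f"batch.{year:04d}{month:02d}",
                        month_dir / "batch.log")
    daily = make_logger(f"day.{year:04d}{month:02d}{day:02d}",
                        month_dir / f"{day:02d}" / "file_retriever.log")
    return batch, daily
```

With `batch, daily = loggers_for("dt_root", 2020, 1, 20)`, batch progress goes to dt_root/2020/01/batch.log while per-file results, such as failed downloads, go to dt_root/2020/01/20/file_retriever.log.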

avzero07 commented 3 years ago

All of the 2019 data has been downloaded and backed up.