Open avzero07 opened 3 years ago
File retriever works.
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dummy.test.2.txt dt_root
Current Local Dir : /mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/11/28
Downloaded 20201128.CN.WSLR..HNN.mseed to .
Downloaded 20201128.CN.HOPB..HHZ.mseed to .
Downloaded 20201128.CN.WSLR..HHE.mseed to .
Downloaded 20201128.CN.HOPB..HHE.mseed to .
Downloaded 20201128.CN.HOPB..HNZ.mseed to .
Downloaded 20201128.CN.HOPB..HNE.mseed to .
Downloaded 20201128.CN.HOPB..HNN.mseed to .
Downloaded 20201128.CN.HOPB..HHN.mseed to .
Downloaded 20201128.CN.WSLR..HNZ.mseed to .
Downloaded 20201128.CN.WSLR..HHN.mseed to .
Downloaded 20201128.CN.WSLR..HNE.mseed to .
Downloaded 20201128.CN.WSLR..HHZ.mseed to .
Data from 2020-11-28 already present! Skipping
real 0m31.046s
user 0m2.374s
sys 0m1.963s
Currently single-threaded. Needs to run in parallel. Perhaps one thread per station, or one per event (which won't be efficient unless the date set can be shared)?
FTP connections seem to be throttled during the day. Need to test again at night.
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dummy.test.2.txt dt_root
Current Local Dir : /mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/11/28
Downloaded 20201128.CN.NCSB..HHE.mseed to .
Downloaded 20201128.CN.NCSB..HHZ.mseed to .
Downloaded 20201128.CN.NCSB..HHN.mseed to .
Downloaded 20201128.CN.JEDB..HHE.mseed to .
Downloaded 20201128.CN.JEDB..HNZ.mseed to .
Downloaded 20201128.CN.JEDB..HNN.mseed to .
Downloaded 20201128.CN.JEDB..HHZ.mseed to .
Downloaded 20201128.CN.JEDB..HNE.mseed to .
Downloaded 20201128.CN.SNB..HHE.mseed to .
Downloaded 20201128.CN.HSNB..HHN.mseed to .
Downloaded 20201128.CN.HSNB..HHZ.mseed to .
Downloaded 20201128.CN.HSNB..HNN.mseed to .
Downloaded 20201128.CN.JEDB..HHN.mseed to .
....
....
Downloaded 20201128.CN.PACB..HHN.mseed to .
Downloaded 20201128.CN.PACB..HNE.mseed to .
Downloaded 20201128.CN.PACB..HNZ.mseed to .
Downloaded 20201128.CN.PACB..HNN.mseed to .
Downloaded 20201128.CN.PACB..HHZ.mseed to .
Data from 2020-11-28 already present! Skipping
real 21m16.047s
user 0m14.662s
sys 0m19.835s
Parallel execution with ProcessPoolExecutor works well!
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py .
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHN.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHZ.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNE.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNN.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNZ.mseed
real 0m11.859s
user 0m0.774s
sys 0m1.908s
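For reference, a minimal sketch of how one-download-per-process could be wired up with ProcessPoolExecutor. This is not the actual file_retriever.py; the function names `fetch` and `fetch_all`, the `max_workers` value, and the choice of `wget -q -P` are assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor, as_completed


def fetch(url, dest_dir):
    """Download one file with wget into dest_dir; return (url, exit code)."""
    # -q silences wget's progress output, -P sets the target directory.
    result = subprocess.run(["wget", "-q", "-P", dest_dir, url], check=False)
    return url, result.returncode


def fetch_all(urls, dest_dir, max_workers=12):
    """Submit one download per URL to a process pool and collect failures."""
    failed = []
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, url, dest_dir) for url in urls]
        # as_completed yields each future as soon as its download finishes,
        # so a slow file does not delay reporting of the fast ones.
        for future in as_completed(futures):
            url, code = future.result()
            if code == 0:
                print(f"Downloaded {url}")
            else:
                failed.append(url)
    return failed
```

Since the work is network-bound and each worker only waits on a wget child, a ThreadPoolExecutor would probably behave similarly; the process pool is shown because that is what worked here.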
Larger test with ProcessPoolExecutor. Attempt to download all data for target stations from 2020/11/28
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/11/28/
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HHE.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HHN.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HHZ.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HNE.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HNN.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.BFSB..HNZ.mseed
....
....
....
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HHN.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HHZ.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HNE.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HNN.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WPB..HNZ.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHE.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHN.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HHZ.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNE.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNN.mseed
Downloaded /wfdata5/CN/2020/11/28/20201128.CN.WSLR..HNZ.mseed
real 1m31.664s
user 0m15.555s
sys 0m36.213s
115 files in 1m31.664s!!!
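To put that number in context, here is a small helper that turns the `real` values printed by `time` into seconds per file. The 12-file count comes from the first single-threaded run logged in this thread. The two runs used different station sets and happened at different times, so this is a rough comparison rather than a controlled benchmark. The helper names are made up for this sketch.

```python
import re


def parse_real(text):
    """Convert a `time` value such as '1m31.664s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def seconds_per_file(real, n_files):
    """Average wall-clock seconds spent per downloaded file."""
    return parse_real(real) / n_files


serial = seconds_per_file("0m31.046s", 12)      # first single-threaded run
parallel = seconds_per_file("1m31.664s", 115)   # ProcessPoolExecutor run
print(f"serial {serial:.2f} s/file, parallel {parallel:.2f} s/file")
# serial 2.59 s/file, parallel 0.80 s/file
```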
Takes about 16 minutes to create the folder tree and prep manifest files. This is one-time, so I guess it's fine. Log file looks good too.
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever_prep.py Search-Param-EQCanada.txt dt_root
real 15m41.108s
user 0m4.136s
sys 0m1.920s
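A rough sketch of what the prep step produces for a single day, assuming a `dt_root/YYYY/MM/DD` layout (as seen in the paths above) and a plain-text manifest with one file name per line. The manifest name `manifest.txt`, its format, and the function name `prepare_day` are assumptions for illustration; the real file_retriever_prep.py may differ.

```python
from datetime import date
from pathlib import Path


def prepare_day(root, day, file_names, manifest_name="manifest.txt"):
    """Create root/YYYY/MM/DD and write a manifest listing the files to fetch."""
    day_dir = Path(root) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
    # exist_ok makes the prep step safe to re-run on an existing tree.
    day_dir.mkdir(parents=True, exist_ok=True)
    manifest = day_dir / manifest_name
    manifest.write_text("\n".join(file_names) + "\n")
    return manifest
```

For example, `prepare_day("dt_root", date(2020, 11, 28), ["20201128.CN.WSLR..HHE.mseed"])` would create `dt_root/2020/11/28/manifest.txt`.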
Batch scheduling was a bust. Kept running into deadlocks. Sticking to semi-auto for now.
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ ls dt_root/2020/01/
04 06 07 08 18 19 20 22 24 25 27 28
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/07
real 6m1.882s
user 0m21.770s
sys 0m46.059s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/08
real 9m10.658s
user 0m28.806s
sys 1m3.600s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/18
real 5m40.986s
user 0m18.657s
sys 0m42.861s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/19
real 8m32.061s
user 0m19.044s
sys 0m42.889s
Sometimes one transfer stalls and holds up the rest. The best thing to do is to intervene manually and kill the process associated with the stalled download.
Usually restarting the stalled file separately works.
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:11 ? 00:00:00 /init
...
...
...
akshay 4959 43 0 18:51 pts/1 00:00:00 tail -f dt_root/2020/01/20/file_retriever.log
akshay 4988 4944 0 18:52 pts/0 00:00:01 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.HOPB..HHE.mseed
akshay 5025 4937 1 18:55 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.NTKA..HHN.mseed
akshay 5026 4938 1 18:55 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.NTKA..HHZ.mseed
akshay 5030 4934 1 18:55 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HHE.mseed
akshay 5031 4940 1 18:55 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HHN.mseed
akshay 5032 4942 1 18:55 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HHZ.mseed
akshay 5033 4936 1 18:55 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HNE.mseed
akshay 5034 4935 1 18:55 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HNN.mseed
akshay 5035 4939 1 18:55 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PACB..HNZ.mseed
akshay 5036 4933 0 18:55 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PGC..HHE.mseed
akshay 5037 4941 1 18:56 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PGC..HHN.mseed
akshay 5039 4943 1 18:56 pts/0 00:00:00 wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PGC..HHZ.mseed
In this case, CN.HOPB..HHE.mseed is stalling: its start time (18:52) is several minutes earlier than every other wget process. If it is the only one left at the end, it's best to just kill it and try downloading it again. The good thing is that the logs will report the failed download.
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ kill -9 4988
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ kill -9 5056
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ cd dt_root/2020/01/20
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/01/20$ rm 20200120.CN.HOPB..HHE.mseed
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/01/20$ rm 20200120.CN.PTRF..HHE.mseed
rm: cannot remove '20200120.CN.PTRF..HHE.mseed': No such file or directory
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/01/20$ wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.HOPB..HHE.mseed
--2021-03-11 19:02:29-- ftp://ftp.seismo.nrcan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.HOPB..HHE.mseed
=> ‘20200120.CN.HOPB..HHE.mseed’
Resolving ftp.seismo.nrcan.gc.ca (ftp.seismo.nrcan.gc.ca)... 132.156.41.1, 162.219.55.2, 132.246.161.100, ...
Connecting to ftp.seismo.nrcan.gc.ca (ftp.seismo.nrcan.gc.ca)|132.156.41.1|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /wfdata5/CN/2020/01/20 ... done.
==> SIZE 20200120.CN.HOPB..HHE.mseed ... 9830400
==> PASV ... done. ==> RETR 20200120.CN.HOPB..HHE.mseed ... done.
Length: 9830400 (9.4M) (unauthoritative)
20200120.CN.HOPB..HHE.mseed 100%[===================================================================================================================>] 9.38M 140KB/s in 57s
2021-03-11 19:03:27 (168 KB/s) - ‘20200120.CN.HOPB..HHE.mseed’ saved [9830400]
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util/dt_root/2020/01/20$ wget ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PTRF..HHE.mseed
--2021-03-11 19:03:35-- ftp://ftp.seismo.nrcan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.PTRF..HHE.mseed
=> ‘20200120.CN.PTRF..HHE.mseed’
Resolving ftp.seismo.nrcan.gc.ca (ftp.seismo.nrcan.gc.ca)... 132.156.41.1, 162.219.55.2, 132.246.161.100, ...
Connecting to ftp.seismo.nrcan.gc.ca (ftp.seismo.nrcan.gc.ca)|132.156.41.1|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /wfdata5/CN/2020/01/20 ... done.
==> SIZE 20200120.CN.PTRF..HHE.mseed ... 5660672
==> PASV ... done. ==> RETR 20200120.CN.PTRF..HHE.mseed ... done.
Length: 5660672 (5.4M) (unauthoritative)
20200120.CN.PTRF..HHE.mseed 100%[===================================================================================================================>] 5.40M 213KB/s in 29s
2021-03-11 19:04:05 (188 KB/s) - ‘20200120.CN.PTRF..HHE.mseed’ saved [5660672]
Annoying how it starts to just work when attempted again. I wonder if this has to do with WSL2 somehow. :(
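Since a plain retry usually works, the manual kill-and-restart could in principle be automated with a per-download timeout. Below is a sketch of that idea, not something the current script does. The function name `fetch_with_timeout`, the 300-second limit, and the two attempts are all assumptions; the `runner` parameter exists only so the logic can be exercised without touching the network.

```python
import subprocess
from pathlib import Path


def fetch_with_timeout(url, dest_dir, timeout_s=300, attempts=2,
                       runner=subprocess.run):
    """Download url with wget, killing and retrying it if it stalls."""
    target = Path(dest_dir) / url.rsplit("/", 1)[-1]
    for _ in range(attempts):
        # Remove any partial file first, mirroring the manual `rm` step,
        # so wget does not save the retry under a ".1" suffix.
        target.unlink(missing_ok=True)
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # once timeout_s elapses.
            result = runner(["wget", "-q", "-P", str(dest_dir), url],
                            timeout=timeout_s, check=False)
        except subprocess.TimeoutExpired:
            continue  # treat a stall like a failed attempt and retry
        if result.returncode == 0:
            return True
    return False
```

A fixed wall-clock limit is a blunt tool: a large file on a throttled connection could be killed even though it is still making progress, so the limit would need tuning against the slowest legitimate transfer.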
Got 1 month's worth of data out!!
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/22
real 18m32.506s
user 0m36.783s
sys 1m17.554s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/20
real 10m43.929s
user 0m31.583s
sys 1m9.629s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/24
real 2m49.964s
user 0m17.103s
sys 0m37.873s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/25
real 9m42.624s
user 0m29.289s
sys 0m58.937s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/27
real 9m51.417s
user 0m32.348s
sys 1m9.883s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/01/28
real 4m41.941s
user 0m27.773s
sys 1m2.841s
Most of Apr 2020 was downloaded without issue.
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/
03/ 06/ 09/ 10/ 17/ 19/ 22/ 24/
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/03
real 2m4.077s
user 0m15.211s
sys 0m33.882s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/06
real 2m4.943s
user 0m14.896s
sys 0m32.729s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/09
real 2m2.222s
user 0m14.935s
sys 0m33.334s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/10
real 2m5.683s
user 0m15.064s
sys 0m32.472s
akshay@JARVIS-XPS:/mnt/c/Users/aksha/Desktop/eew-nn-project/util$ time python3 file_retriever.py dt_root/2020/04/17
real 2m24.683s
user 0m15.926s
sys 0m34.608s
All of the 2020 data has been downloaded and backed up.
Fixed batch processing. Logging is now separate too: one log at the batch (month) level and one at the day level.
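One way the two logging levels could be kept apart with the standard logging module is sketched below. The day-level file name `file_retriever.log` matches the one tailed earlier in this thread; the month-level name `batch.log`, the logger names, and both function names are assumptions for illustration only.

```python
import logging
from pathlib import Path


def make_logger(name, log_path):
    """Return a logger that writes only to its own file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Stop records from bubbling up so batch and day logs stay separate.
    logger.propagate = False
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def loggers_for(root, year, month, day):
    """Build the month-level (batch) and day-level loggers for one day."""
    month_dir = Path(root) / year / month
    day_dir = month_dir / day
    day_dir.mkdir(parents=True, exist_ok=True)
    batch = make_logger(f"batch.{year}.{month}", month_dir / "batch.log")
    daily = make_logger(f"day.{year}.{month}.{day}",
                        day_dir / "file_retriever.log")
    return batch, daily
```

With this split, the batch log would record which days started, finished, or failed, while each day log would keep the per-file download results.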
All of the 2019 data has been downloaded and backed up.
Email-based retrieval is not possible. It was always inefficient anyway. The public-facing FTP server ftp.seismo.NRCan.gc.ca contains waveform data in miniSEED format, updated daily. The archive goes back to 1975.
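The URLs in the wget output above follow a regular pattern, so they can be built from a date, station, and channel. A minimal sketch follows; the function name `waveform_url` and its defaults are illustrative, and whether the `wfdata5` path and the empty field between station and channel (presumably the location code) hold for all years and stations is an assumption.

```python
from datetime import date

FTP_HOST = "ftp.seismo.NRCan.gc.ca"


def waveform_url(day, station, channel, network="CN", archive="wfdata5"):
    """Build the FTP URL for one day-long miniSEED file."""
    # File names look like YYYYMMDD.NET.STA..CHA.mseed
    name = f"{day:%Y%m%d}.{network}.{station}..{channel}.mseed"
    return f"ftp://{FTP_HOST}/{archive}/{network}/{day:%Y/%m/%d}/{name}"


print(waveform_url(date(2020, 1, 20), "HOPB", "HHE"))
# ftp://ftp.seismo.NRCan.gc.ca/wfdata5/CN/2020/01/20/20200120.CN.HOPB..HHE.mseed
```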