Closed basaks closed 6 years ago
Currently looking at the files received from @alexgorb and analysing them for migration to Seiscomp3. The goal is to interpret the data files and map their schema to the Seiscomp3 database schema as much as possible. Any missing fields, erroneous data or conflicting fields need to be reported and mitigated.
We have 3 types of files: DAT files, HDF files and OUT files. As per discussion with Alexei, the DAT files, which are in the FFB format (Fixed Format Bulletin) need to be parsed and the pick, origin, amplitude, magnitude and event information needs to be extracted. The extracted data needs to be massaged into the FDSNXML format that can be imported into the SC3 DB. The public ID information needs to be synthetically created and added to the resulting FDSNXML file before importing to the SC3 DB.
The FFB format is described at: http://www.isc.ac.uk/standards/ffb/
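One way the synthetic public IDs mentioned above could be produced is to hash the fields that identify each object, so that re-running the conversion yields the same ID. This is only a sketch: the `smi:local/engdahl` prefix, the `make_public_id` name and the use of SHA-1 are assumptions for illustration, not something agreed in this thread.

```python
import hashlib


def make_public_id(kind, *fields, prefix="smi:local/engdahl"):
    """Build a deterministic public ID from the values that identify an object.

    `kind` is a label such as "event", "origin" or "pick"; `fields` are the
    parsed values (time, station, phase, ...) that make the object unique.
    The prefix is a hypothetical placeholder.
    """
    key = "|".join(str(f) for f in fields)
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}/{kind}/{digest}"


pick_id = make_public_id("pick", "2017-10-19T07:14:16.002041Z", "HHC", "P")
```

Because the ID depends only on the input fields, importing the same bulletin twice would not create two differently named copies of the same pick.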
After going through the Project Report pdf file, the HDF and OUT files seem to be downstream files as a result of processing by ENGDAHL.
(Please read this comment in edit mode) Pasting some notes while discussing with Alexei:
origin:
<usedPhaseCount>7</usedPhaseCount>
<associatedStationCount>9</associatedStationCount>
<usedStationCount>7</usedStationCount>
<standardError>3.030084268</standardError>
<azimuthalGap>149.6036987</azimuthalGap>
<maximumDistance>15.91859436</maximumDistance>
<minimumDistance>4.030264378</minimumDistance>
<medianDistance>7.434346199</medianDistance>
evaluationMode(automatic)
<creationInfo>
<author>scautoloc@ip-172-31-30-172</author>
<creationTime>2017-10-19T07:14:16.002041Z</creationTime>
</creationInfo>
(if no residual, it was not used for location but only for association)
pick:
@niketchhajed As discussed, here is an approach that might work.
We should try and create obspy event objects for each event, and obspy has event export functionality into quakeml and also sc3ml.
See how I create picks and amplitude obspy objects in seismic.pickers.PickerMixin class. We need to be able to further create the rest of the objects that are required by the event class, e.g., here. Then we can dump a SC3ML/quakeml that can be ingested into seiscomp3.
We can pursue a similar approach for Earthmon/OracleDB event transformation and ingestion.
For PhasePApy, @sudhirjain may have to pursue a similar approach.
In further discussion with @alexgorb, the relocated data files (.HDF and .OUT) are what need to be considered for importing into the SC3 DB.
In the .DAT files, arrival times for the same station differ by 5 sec between those received by GA stations and those received by ISC from other sources, so importing from .DAT would probably not be the best idea. The .HDF and .OUT files contain relocated origins and relocated arrival times. These will be imported into SC3.
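To make the arrival-time discrepancy concrete, here is a small sketch of how conflicting readings could be flagged. The `find_conflicts` name, the dictionary layout and the 1 s tolerance are assumptions for illustration; the sample times are made up.

```python
from datetime import datetime


def find_conflicts(ga_arrivals, isc_arrivals, tolerance_s=1.0):
    """Return (station, phase, difference_s) where the two sources disagree.

    Both inputs map (station, phase) -> arrival time as a datetime.
    """
    conflicts = []
    for key in sorted(ga_arrivals.keys() & isc_arrivals.keys()):
        diff = (isc_arrivals[key] - ga_arrivals[key]).total_seconds()
        if abs(diff) > tolerance_s:
            conflicts.append((key[0], key[1], diff))
    return conflicts


# Made-up readings for the same station and phase from the two sources.
ga = {("HHC", "P"): datetime(2017, 10, 19, 7, 15, 20)}
isc = {("HHC", "P"): datetime(2017, 10, 19, 7, 15, 25)}
conflicts = find_conflicts(ga, isc)
```

A report like this would show how widespread the 5 sec offsets are before deciding to skip the .DAT files entirely.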
Still there are some grey areas in the .HDF and .OUT files as described below:
I have listed below some clarifications required from @alexgorb:
Do we assume that the station name (and other details) for all S arrivals is the same as the P arrival immediately before?
How should we process arrivals that have the first (and second) phase missing, e.g. the SCP or PCS arrival above?
One of the desired fields as part of the arrival information is distance. Is the delta field the same as the required distance field? It would be better if we can get the meaning of all columns listed below:
delta, dtdd, focal angle, (the missing column names for the 2 phases), scor, wgt (is it a time weight or a backazimuth weight?)
Later phases that come without a station name (blank fields) correspond to the first station located above these lines. In this particular case, that is HHC.
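The rule above amounts to carrying the last seen station name down onto the blank rows. A minimal sketch, assuming the arrival lines have already been parsed into dicts with `station` and `phase` keys (that structure and the `fill_station_names` name are hypothetical):

```python
def fill_station_names(rows):
    """Copy the station code down onto later-phase rows that left it blank.

    `rows` is a list of dicts with at least "station" and "phase" keys; a row
    with an empty station is assumed to belong to the nearest station above.
    """
    filled = []
    current = None
    for row in rows:
        station = (row.get("station") or "").strip()
        if station:
            current = station
        elif current is None:
            raise ValueError("later phase appears before any station line")
        filled.append({**row, "station": current})
    return filled


rows = [
    {"station": "HHC", "phase": "P"},
    {"station": "", "phase": "S"},
    {"station": "", "phase": "ScP"},
]
filled = fill_station_names(rows)
```

The function keys on the station field rather than on the phase name, so a station whose first line is a later phase is still picked up as the current station.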
[arrivals] https://user-images.githubusercontent.com/8789808/32255448-bf20507a-bef3-11e7-9c95-a5c03b81f8c3.jpg
Please note - some stations may have only "later" phases such as PKP, because no P wave is present at long distances.
I checked the list of registered seismic stations. There are fewer than 1600 stations. @niketchhajed how did you calculate more than a million? See attached file. fdsnsta2013.zip
@alexgorb This is the list of unique stations that are involved in the arrival data of ENGDAHL. If you find that something is not right in this list of stations, let me know. I will investigate.
So far I have found that there are many duplicates, and names consisting of a single character (which cannot be a station name). We need to benchmark against the list I sent you.
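A sketch of what such a benchmark could look like, assuming the registered station codes have already been loaded into a Python set (the layout of the attached list is not shown in this thread, so loading is left out). The `check_station_codes` name and the upper-casing and whitespace stripping are assumptions.

```python
from collections import Counter


def check_station_codes(codes, registered):
    """Split station codes into duplicates, too-short codes and unknowns."""
    # Normalise before comparing so "hhc " and "HHC" count as the same code.
    cleaned = [c.strip().upper() for c in codes if c.strip()]
    counts = Counter(cleaned)
    unique = sorted(counts)
    return {
        "duplicates": [c for c in unique if counts[c] > 1],
        "too_short": [c for c in unique if len(c) < 2],
        "unregistered": [c for c in unique if c not in registered],
    }


# Made-up input: one repeated code, one single-character code, one unknown.
report = check_station_codes(["HHC", "hhc ", "A", "ZZZZ"], registered={"HHC"})
```

Running this over the extracted station list would give separate lists to investigate instead of one large count.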
allstations.txt: https://github.com/GeoscienceAustralia/passive-seismic/files/1436123/allstations.txt
Some more feedback from @alexgorb:
The current state of data migration was reviewed with @alexgorb and it seems to be in an acceptable condition. Below are the points to be worked upon but not urgent:
Closing this issue and creating a separate issue for network codes integration.
Jira Task PST-215 https://gajira.atlassian.net/browse/PST-215
The engdahl events are backed up at s3://pyrobots-backup/niket/engdahl-events/
Command to copy s3 dir using awscli: aws s3 cp s3://pyrobots-backup/niket/engdahl-events/ target_dir/ --recursive
All engdahl and isc events from the sc3 bucket are also copied in NCI here with read access for everyone: /g/data/ha3/sudipta/event_xmls
@basaks, just FYI: these isc events do not have a preferred origin set. The ones with a preferred origin set are currently in an AWS instance. I will replace the latest in S3.
These are the events that @alexgorb passed onto us in text files.