GeoscienceAustralia / hiperseis

High Performance Seismologic Data and Metadata Processing System
GNU General Public License v3.0

convert and ingest ENGDAHL events into seiscomp3 #23

Closed basaks closed 6 years ago

basaks commented 6 years ago

These are the events that @alexgorb passed onto us in text files.

niketchhajed commented 6 years ago

I am currently looking at the files received from @alexgorb and analysing them for migration to Seiscomp3. The goal is to interpret the data files and map their schema to the Seiscomp3 database schema as far as possible. Any missing fields, erroneous data or conflicting fields need to be reported and mitigated.

niketchhajed commented 6 years ago

We have 3 types of files: DAT files, HDF files and OUT files. As per discussion with Alexei, the DAT files, which are in the FFB (Fixed Format Bulletin) format, need to be parsed so that the pick, origin, amplitude, magnitude and event information can be extracted. The extracted data then needs to be converted into the FDSNXML format that can be imported into the SC3 DB. The public ID information needs to be synthetically created and added to the resulting FDSNXML file before importing it into the SC3 DB.

The FFB format is described at: http://www.isc.ac.uk/standards/ffb/

After going through the Project Report pdf file, the HDF and OUT files appear to be downstream outputs of the ENGDAHL processing.
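
One possible way to create the synthetic public IDs mentioned above is to derive them deterministically from the parsed fields, so that re-running the conversion yields the same IDs. This is only a sketch: the `smi:local/engdahl` prefix, the function name `make_public_id` and the choice of fields are assumptions for illustration, not something agreed in this thread.

```python
import hashlib


def make_public_id(kind, *fields, prefix="smi:local/engdahl"):
    """Build a repeatable public ID from the values that identify an object.

    `kind` is a label such as "origin" or "pick" (hypothetical naming).
    `prefix` is an assumed placeholder, not a project convention.
    """
    key = "|".join(str(f) for f in fields)
    # Hash the identifying fields and keep a short hexadecimal digest.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}/{kind}/{digest}"


# Illustrative values only.
origin_id = make_public_id("origin", -33.86, 151.21, "2017-10-19T07:14:16Z")
```

The same inputs always give the same ID, which makes repeated imports easier to compare; a random UUID would work as well if repeatability is not needed.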

niketchhajed commented 6 years ago

(Please read this comment in edit mode) Pasting some notes while discussing with Alexei:

origin:

  1. lat lon depth Agency OriginTime Magnitude TP methodID(EHB) earthModelID(ak135) 9
    <usedPhaseCount>7</usedPhaseCount>
    <associatedStationCount>9</associatedStationCount>
    <usedStationCount>7</usedStationCount>
    <standardError>3.030084268</standardError>
    <azimuthalGap>149.6036987</azimuthalGap>
    <maximumDistance>15.91859436</maximumDistance>
    <minimumDistance>4.030264378</minimumDistance>
    <medianDistance>7.434346199</medianDistance>

evaluationMode(automatic)

NA
    <author>scautoloc@ip-172-31-30-172</author>
    <creationTime>2017-10-19T07:14:16.002041Z</creationTime>
  </creationInfo>

(if no residual, it was not used for location but only for association)

pick:

  1. phase Net Sta Channel Res Dis(deg) Az(or backazimuth) Time methodID(default to STA/LTA) evaluationMode(automatic) author(EHB)

basaks commented 6 years ago

@niketchhajed As discussed, here is an approach that might work. We should try to create obspy event objects for each event; obspy can export events to quakeml and also sc3ml.

See how I create picks and amplitude obspy objects in seismic.pickers.PickerMixin class. We need to be able to further create the rest of the objects that are required by the event class, e.g., here. Then we can dump a SC3ML/quakeml that can be ingested into seiscomp3.

We can pursue a similar approach for Earthmon/OracleDB event transformation and ingestion.

For PhasePApy, @sudhirjain may have to pursue a similar approach.

niketchhajed commented 6 years ago

In further discussion with @alexgorb, we agreed that the relocated data files (.HDF and .OUT) are what need to be considered for importing into the SC3 db.

Because arrival times in the .DAT files for the same station differ by 5 sec between those received by GA stations and those received by ISC from other sources, importing from .DAT would probably not be the best idea. The .HDF and .OUT files contain relocated origins and relocated arrival times. These will be imported into SC3.

niketchhajed commented 6 years ago

Still there are some grey areas in the .HDF and .OUT files as described below:

  1. In .OUT files, for the same P arrival, there are multiple phase labels i.e. P, Pn, eP, iP, PP, PnPn, PcP, Pdiff, PKiKP, ePKP, ePKI, PKP, pP, epP, etc. These notations need to be clarified.
  2. In the .HDF file, there are 2 columns: ntot(total number of observations used) and ntel(number of teleseismic observations used - delta > 28 deg). The arrival data in .OUT files needs to be correlated with the values in these 2 columns, to establish which arrivals were used for association and which were used for location. Need more clarification on this.

alexgorb commented 6 years ago

  1. There are many different phases (http://www.isc.ac.uk/standards/phases/) and all are required for our purposes. However, a leading character such as 'e' or 'i' indicates the energy of the arrival: emergent or impulsive. These characters are usually omitted during conversion if they cannot be transferred into a separate field of the new format.
  2. This is not very clear to me. I would say that these numbers are only for model selection purposes and have no relation to association or location.
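
The handling of the 'e' and 'i' prefixes described in point 1 could be sketched as below: the prefix is moved into a separate onset field and the rest of the label is kept whole. This assumes that no real phase name starts with a lowercase 'e' or 'i' (depth phases such as pP start with 'p' or 's'); the function name `split_onset` is made up for illustration.

```python
# Onset qualifiers that may precede the phase name in the bulletin.
ONSET_BY_PREFIX = {"e": "emergent", "i": "impulsive"}


def split_onset(label):
    """Return (phase, onset) for a raw phase label such as 'eP' or 'PKiKP'.

    Assumption: a leading lowercase 'e' or 'i' is always an onset
    qualifier and never part of the phase name itself.
    """
    label = label.strip()
    if len(label) > 1 and label[0] in ONSET_BY_PREFIX:
        return label[1:], ONSET_BY_PREFIX[label[0]]
    # No qualifier: keep the entire label, onset unknown.
    return label, None


for raw in ["P", "eP", "iP", "epP", "ePKP", "PKiKP"]:
    print(raw, split_onset(raw))
```

Only the first character is inspected, so an 'i' inside a name such as PKiKP is left untouched.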

niketchhajed commented 6 years ago

I have listed below some clarifications required from @alexgorb:

  1. There are more than a million unique stations involved in arrivals in the ENGDAHL files. However, the network code information is missing. Is there an easy way to uniquely determine the network code for a given station name?
  2. In the .out file, most of the S arrivals have the station name missing, as in the screenshot below:

Do we assume that the station name (and other details) for all S arrivals is the same as the P arrival immediately before?

  3. How should we process arrivals that have the first (and second) phase missing, e.g. the SCP or PCS arrival above?

  4. One of the desired fields in the arrival information is distance. Is the delta field the same as the required distance field? It would help to have the meaning of all the columns listed below:

delta, dtdd, focal angle, (the missing column names for the 2 phases), scor, wgt (is it the time weight or the backazimuth weight?)

  5. What do the *s after the residual values mean?

niketchhajed commented 6 years ago

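arrivals (screenshot): https://user-images.githubusercontent.com/8789808/32255448-bf20507a-bef3-11e7-9c95-a5c03b81f8c3.jpg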

alexgorb commented 6 years ago

Later phases that come without station name (blank fields) correspond to the first station located above these lines. In this particular case to HHC

From: Niket Chhajed [mailto:notifications@github.com] Sent: Wednesday, 1 November 2017 12:00 PM To: GeoscienceAustralia/passive-seismic Cc: Gorbatov Alexei; Mention Subject: [DKIM] Re: [GeoscienceAustralia/passive-seismic] convert and ingest ENGDAHL events into seiscomp3 (#23)

Geoscience Australia Disclaimer: This e-mail (and files transmitted with it) is intended only for the person or entity to which it is addressed. If you are not the intended recipient, then you have received this e-mail by mistake and any use, dissemination, forwarding, printing or copying of this e-mail and its file attachments is prohibited. The security of emails transmitted cannot be guaranteed; by forwarding or replying to this email, you acknowledge and accept these risks.

alexgorb commented 6 years ago

Please note - some stations may have only "later" phases such as PKP because no P wave present at long distances.
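
The rule from the comments above (a later phase with a blank station field belongs to the nearest named station above it) could be applied with a simple fill-down pass. This is a sketch under assumptions: the rows are taken to be already parsed into dicts with a `station` key, which is a hypothetical layout and not the actual .OUT parser. The carried station is taken from any named row, not only from P rows, because some stations have only later phases such as PKP.

```python
def fill_down_station(rows):
    """Copy the last seen station name into rows whose station is blank."""
    filled = []
    current = None
    for row in rows:
        station = (row.get("station") or "").strip()
        if station:
            current = station
        elif current is None:
            # A blank station before any named one cannot be resolved.
            raise ValueError("blank station before any named station")
        # Build a new dict so the input rows are not modified.
        filled.append({**row, "station": current})
    return filled


# Illustrative rows; "ABC" is a made-up station code.
arrivals = [
    {"station": "HHC", "phase": "P"},
    {"station": "", "phase": "S"},
    {"station": "ABC", "phase": "PKP"},
]
print(fill_down_station(arrivals))
```

Other fields that are blank on later-phase rows could be carried forward in the same loop if needed.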

alexgorb commented 6 years ago

I checked the list of registered seismic stations. There are fewer than 1600 stations. @niketchhajed, how did you arrive at more than a million? See attached file. fdsnsta2013.zip

niketchhajed commented 6 years ago

allstations.txt

@alexgorb This is the list of unique stations that are involved in the arrival data of ENGDAHL. If you find that something is not right in this list of stations, let me know. I will investigate.

alexgorb commented 6 years ago

So far I have found that there are many duplicates, as well as names consisting of a single character (which cannot be a station name). We need to benchmark against the list I sent you.
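
The benchmark suggested above could look roughly like this: station codes extracted from the arrivals are compared with the registered list, and duplicates and one-character names are reported separately. It is a sketch only; it assumes both lists have been loaded as plain strings with one code each, and the function name and report keys are made up for illustration.

```python
def check_stations(candidates, registered):
    """Sort candidate station codes into known, unknown, too_short, duplicates."""
    registered = {code.strip().upper() for code in registered}
    report = {"known": [], "unknown": [], "too_short": [], "duplicates": []}
    seen = set()
    for raw in candidates:
        code = raw.strip().upper()
        if not code:
            continue  # skip empty lines
        if code in seen:
            report["duplicates"].append(code)
            continue
        seen.add(code)
        if len(code) < 2:
            # One-character names cannot be station names.
            report["too_short"].append(code)
        elif code in registered:
            report["known"].append(code)
        else:
            report["unknown"].append(code)
    return report


# Illustrative input; "ZZZZ" is a made-up code.
print(check_stations(["HHC", "hhc ", "A", "ZZZZ", ""], ["HHC"]))
```

Codes are compared after trimming whitespace and upper-casing, which is an assumption about how the two lists differ.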

From: Niket Chhajed [mailto:notifications@github.com] Sent: Thursday, 2 November 2017 11:55 AM To: GeoscienceAustralia/passive-seismic Cc: Gorbatov Alexei; Mention Subject: Re: [GeoscienceAustralia/passive-seismic] convert and ingest ENGDAHL events into seiscomp3 (#23)

niketchhajed commented 6 years ago

Some more feedback from @alexgorb:

  1. magnitude integration
  2. distance (and other fields) for an S phase are the same as for the immediately preceding P phase
  3. include the entire phase label, not just the first character

niketchhajed commented 6 years ago

The current state of the data migration was reviewed with @alexgorb and it seems to be in an acceptable condition. The point below still needs to be worked on but is not urgent:

  1. Integrate the network codes with the data.

Closing this issue and creating a separate issue for network codes integration.

Zephyrpony commented 6 years ago

Jira Task PST-215 https://gajira.atlassian.net/browse/PST-215

niketchhajed commented 6 years ago

The engdahl events are backed up at s3://pyrobots-backup/niket/engdahl-events/

basaks commented 6 years ago

Command to copy s3 dir using awscli: aws s3 cp s3://pyrobots-backup/niket/engdahl-events/ target_dir/ --recursive.

basaks commented 6 years ago

All engdahl and isc events from the sc3 bucket are also copied in NCI here with read access for everyone: /g/data/ha3/sudipta/event_xmls.

niketchhajed commented 6 years ago

@basaks, just FYI: these ISC events do not have a preferred origin set. The ones with a preferred origin set are currently on an AWS instance. I will replace the copy in S3 with the latest.