INGV / instance

Creative Commons Attribution 4.0 International

checking waveforms against pick labels #3

Closed filefolder closed 8 months ago

filefolder commented 9 months ago

Hi there, I have a few questions/comments about the data in the events_counts collection. For the first few examples I have examined, it seems like there are multiple events in each 2 minute segment, but only 1 pair of P/S picks in the metadata? This will confuse the ML trainer, so I am wondering how typical this is and if these multiple event instances are labelled or otherwise able to be filtered out.

For example, looking at trace_name 10000541.IV.CAMP..HH

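[Image: 10000541.IV.CAMP.HH.png, https://github.com/INGV/instance/assets/22922540/ffdd8d77-a984-44f1-b18e-95e8cab655a1]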

The corresponding metadata is:

source_id,station_network_code,station_code,station_location_code,station_channels,station_latitude_deg,station_longitude_deg,station_elevation_m,station_vs_30_mps,station_vs_30_detail,source_origin_time,source_latitude_deg,source_longitude_deg,source_depth_km,source_origin_uncertainty_s,source_latitude_uncertainty_deg,source_longitude_uncertainty_deg,source_depth_uncertainty_km,source_stderror_s,source_gap_deg,source_horizontal_uncertainty_km,source_magnitude,source_magnitude_type,source_mt_eval_mode,source_mt_status,source_mt_scalar_moment,source_mechanism_strike_dip_rake,source_mechanism_moment_tensor,path_travel_time_P_s,path_travel_time_S_s,path_residual_P_s,path_residual_S_s,path_ep_distance_km,path_hyp_distance_km,path_azimuth_deg,path_backazimuth_deg,path_weight_phase_location_P,path_weight_phase_location_S,trace_start_time,trace_dt_s,trace_npts,trace_eval_P,trace_P_uncertainty_s,trace_P_arrival_time,trace_polarity,trace_eval_S,trace_S_uncertainty_s,trace_S_arrival_time,trace_P_arrival_sample,trace_S_arrival_sample,trace_E_median_counts,trace_N_median_counts,trace_Z_median_counts,trace_E_mean_counts,trace_N_mean_counts,trace_Z_mean_counts,trace_E_min_counts,trace_N_min_counts,trace_Z_min_counts,trace_E_max_counts,trace_N_max_counts,trace_Z_max_counts,trace_E_rms_counts,trace_N_rms_counts,trace_Z_rms_counts,trace_E_lower_quartile_counts,trace_N_lower_quartile_counts,trace_Z_lower_quartile_counts,trace_E_upper_quartile_counts,trace_N_upper_quartile_counts,trace_Z_upper_quartile_counts,trace_E_spikes,trace_N_spikes,trace_Z_spikes,trace_E_snr_db,trace_N_snr_db,trace_Z_snr_db,trace_E_pga_cmps2,trace_E_pgv_cmps,trace_E_pga_perc,trace_E_pga_time,trace_E_pgv_time,trace_E_sa03_cmps2,trace_E_sa10_cmps2,trace_E_sa30_cmps2,trace_N_pga_cmps2,trace_N_pgv_cmps,trace_N_pga_perc,trace_N_pga_time,trace_N_pgv_time,trace_N_sa03_cmps2,trace_N_sa10_cmps2,trace_N_sa30_cmps2,trace_Z_pga_cmps2,trace_Z_pgv_cmps,trace_Z_pga_perc,trace_Z_pga_time,trace_Z_pgv_time,trace_Z_sa03_cmps2,trace_Z_sa10_cmps2,trace_Z_sa30_cmps2,trace_pga_cmps2,trace_pgv_cmps,trace_pga_perc,trace_sa03_cmps2,trace_sa10_cmps2,trace_sa30_cmps2,trace_name,trace_GPD_P_number,trace_GPD_S_number,trace_EQT_number_detections,trace_EQT_P_number,trace_EQT_S_number,trace_deconvolved_units,source_type
10000541,IV,CAMP,,HH,42.53578,13.409,1283.0,517.0,Vs30 extracted from ShakeMap,2016-11-16T01:35:56.21Z,42.7592,13.1932,10.3,0.03,0.0018,0.0024,0.2,0.19,44.0,0.24,2.1,ML,,,,,,6.19,10.53,-0.02,-0.22,30.482,32.175,144.4,324.6,99.0,71.0,2016-11-16T01:35:45.14Z,0.01,12000,manual,0.1,2016-11-16T01:36:02.40Z,positive,manual,0.3,2016-11-16T01:36:06.74Z,1726,2160.0,3.0,-4.0,2.0,-0.02208,0.02483,-0.01783,-1880.0,-893.0,-1259.0,1850.0,1516.0,1294.0,104.27,97.017,85.736,-42.0,-51.0,-33.0,47.0,43.0,33.0,0.0,0.0,0.0,17.329,16.827,20.449,0.02146025,0.00045726,0.00218834,2016-11-16T01:36:07.190200Z,2016-11-16T01:36:07.230200Z,0.0017980065,7.41901e-05,2.27114e-05,0.0142861,0.00035812,0.00145678,2016-11-16T01:36:07.140100Z,2016-11-16T01:36:07.170100Z,0.0009974314,8.14522e-05,1.68142e-05,0.01503256,0.00029328,0.00153289,2016-11-16T01:36:07.180100Z,2016-11-16T01:36:07.580100Z,0.0007128434,5.55609e-05,1.60743e-05,0.02146025,0.00045726,0.00218834,0.0017980064,8.14522e-05,2.27114e-05,10000541.IV.CAMP..HH,1.0,4.0,1.0,1.0,1.0,mps,earthquake

The trace_EQT_P_number, trace_EQT_S_number, and trace_EQT_number_detections values are all 1.0.
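One possible way to filter on these columns is sketched below. It assumes, from the column names only, that the three `trace_EQT_*` fields count automatic detections within the trace window; that reading is not confirmed in this thread. The helper `single_event_rows` and the second CSV row are hypothetical, and the in-memory CSV stands in for metadata_Instance_events.csv.

```python
import csv
import io


def single_event_rows(rows):
    """Keep rows whose three EQT count columns are all exactly 1.

    Assumption: these columns count automatic detections per trace window.
    """
    count_columns = (
        "trace_EQT_number_detections",
        "trace_EQT_P_number",
        "trace_EQT_S_number",
    )
    kept = []
    for row in rows:
        try:
            counts = [float(row[name]) for name in count_columns]
        except (KeyError, TypeError, ValueError):
            # Missing or empty value: a single event cannot be confirmed.
            continue
        if all(count == 1.0 for count in counts):
            kept.append(row)
    return kept


# Tiny stand-in for the metadata CSV. The first row uses the values quoted
# above; the second row is invented purely for illustration.
sample_csv = io.StringIO(
    "trace_name,trace_EQT_number_detections,trace_EQT_P_number,trace_EQT_S_number\n"
    "10000541.IV.CAMP..HH,1.0,1.0,1.0\n"
    "hypothetical.trace,2.0,2.0,2.0\n"
)
kept_names = [row["trace_name"] for row in single_event_rows(csv.DictReader(sample_csv))]
print(kept_names)
```

A filter like this only reflects what the automatic picker reported, so a trace that passes it could still contain a second event that was not detected.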

Another observation (which possibly explains the above) is that the P/S pick times in the CSV don't seem to correspond to the waveform data. The metadata above lists only one P pick and one S pick, at samples 1726 and 2160, but from the image I don't think those correspond to any of the three events. The P and S times (2016-11-16T01:36:02.40Z & 2016-11-16T01:36:06.74Z) are consistent with those sample numbers given the trace start time of 2016-11-16T01:35:45.14Z and the trace_dt_s of 0.01, so I am wondering whether the data plotted above is correct for this trace or could have been mixed up, or whether I am doing something wrong elsewhere.
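The consistency check described above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the values from the metadata row; the function name `arrival_sample` is hypothetical, and the timestamp format is assumed to match the two-decimal strings shown in the CSV.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed from the timestamps quoted above


def arrival_sample(trace_start, arrival, dt_s):
    """Convert an absolute arrival time into a sample index within the trace."""
    elapsed = datetime.strptime(arrival, TIME_FORMAT) - datetime.strptime(trace_start, TIME_FORMAT)
    # Round rather than truncate so floating-point noise cannot shift the index.
    return round(elapsed.total_seconds() / dt_s)


trace_start_time = "2016-11-16T01:35:45.14Z"
trace_dt_s = 0.01

p_sample = arrival_sample(trace_start_time, "2016-11-16T01:36:02.40Z", trace_dt_s)
s_sample = arrival_sample(trace_start_time, "2016-11-16T01:36:06.74Z", trace_dt_s)
print(p_sample, s_sample)  # 1726 2160
```

The P pick falls 17.26 s and the S pick 21.60 s after the trace start, which at 100 samples per second matches trace_P_arrival_sample and trace_S_arrival_sample.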

The files I am using are instance_events_counts.hdf5 (April 28 2021) and metadata_Instance_events.csv (Aug 22 2021). Is it possible to double check this example to see if this is consistent for you as well?

Thanks for your help!

amichelini commented 9 months ago

Dear Robert, thank you for pointing out the problem. We will take a closer look at what you sent and let you know. Please keep in mind that the presence of more than one earthquake in the 120 s window cannot be excluded, although we tried to avoid it. In our experience, and after all the verifications we made, this should be a very isolated case. More to come in the next few days about the sample numbers that you mentioned. Kind regards, Alberto


Alberto Michelini Istituto Nazionale di Geofisica e Vulcanologia (INGV) Via di Vigna Murata, 605 00143 ROMA, Italy Ph. +39 06 51860611, e-mail: @.*** Skype: amichelini https://orcid.org/0000-0001-6716-8551



SpinaCianetti commented 8 months ago

Dear Robert, here is the plot of the trace you indicated that exactly matches the metadata values you listed above. Please check your plot.

Kind regards Spina

[Image: test_10000541 IV CAMP HH]
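Besides plotting, a quick numerical sanity check on whether a labelled pick lines up with an onset is to compare signal levels just before and just after the pick sample. The sketch below is only an illustration on synthetic data, not the procedure used by the dataset authors; `pick_rms_ratio` is a hypothetical helper, and a real check would read the trace from instance_events_counts.hdf5 instead.

```python
import math


def rms(values):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def pick_rms_ratio(samples, pick_index, window):
    """Ratio of RMS amplitude after the pick to RMS amplitude before it."""
    before = samples[max(0, pick_index - window):pick_index]
    after = samples[pick_index:pick_index + window]
    if not before or not after:
        raise ValueError("pick is too close to the edge of the trace")
    noise = rms(before)
    if noise == 0:
        return math.inf
    return rms(after) / noise


# Synthetic trace: unit-amplitude "noise" up to the pick, then a signal that
# is 100 times larger. These values are invented for illustration.
pick = 1726
trace = [(-1) ** i * 1.0 for i in range(pick)] + [(-1) ** i * 100.0 for i in range(500)]

ratio_at_pick = pick_rms_ratio(trace, pick, 200)
ratio_off_pick = pick_rms_ratio(trace, pick - 500, 200)
print(ratio_at_pick, ratio_off_pick)
```

A ratio well above 1 at the labelled sample and close to 1 elsewhere suggests the pick and the waveform belong together; a ratio near 1 at the pick would be a hint that the wrong trace was loaded or plotted.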

filefolder commented 8 months ago

Thank you very much for your reply. You are correct: when I look now, the data looks fine. I have no idea what I might have done.

Very sorry for the trouble and I look forward to training with this dataset!