GeoNet / help

An issues repo for technical help questions.

Duplicate picks in events from FDSN (fixed from May 2023 onward) #105

Open calum-chamberlain opened 1 year ago

calum-chamberlain commented 1 year ago

Hi all, I wondered if there was a good reason for duplicate picks appearing in events downloaded from the FDSN service? For instance event 2022p788472 has duplicate picks for multiple stations including duplicate P-picks for BFZ. There are two P-picks sharing the same pick id (smi:nz.org.geonet/20221019.173832.28-AIC-NZ.BFZ.10.HHZ) that, to my eyes, have all the same information.

If there isn't a good reason, would it be possible to fix this?

Thanks, Calum.

salichon commented 1 year ago

Hello @calum-chamberlain, that's indeed a feature I noticed a while ago. I reckon it is very occasional and a direct output from SeisComP. I had a few cases and had no chance to get clarification around the processing, e.g. to reproduce the feature and report it to gempa (a glitch in the streams from the station, or in the picking in SeisComP). I'll resurrect this topic in the forum.

With regards to fixing this, the answer is yes and it's easy, although the process to do it operationally is much less so, for now.

To me this is an example of event-level catalogue data curation (FYI @elidana @JonoHanson) that should be added to the to-do list. We can expect an environment that allows this to be done immediately to be implemented in the coming year.

NB: Please keep posting events here to keep building the case.

Thanks a lot, Jerome

calum-chamberlain commented 1 year ago

This isn't rare at all in my experience. Consider the example below, which assumes that duplicate pick resource ids represent duplicated picks without checking other information explicitly:

from obspy.clients.fdsn import Client
from obspy import UTCDateTime
from collections import Counter

client = Client("GEONET")
# Download one month of events from the GeoNet FDSN event service
cat = client.get_events(starttime=UTCDateTime(2022, 1, 1), endtime=UTCDateTime(2022, 2, 1))

duplicate_pick_count = 0  # number of distinct pick IDs that occur more than once
events_with_duplicates = 0
for ev in cat:
    # Count how often each pick resource ID occurs within this event
    pid_counts = Counter(p.resource_id for p in ev.picks)
    # Any ID seen more than once is treated as a duplicated pick
    duplicate_picks = [pid for pid, count in pid_counts.items() if count > 1]
    duplicate_pick_count += len(duplicate_picks)
    if duplicate_picks:
        events_with_duplicates += 1

print(f"From {len(cat)} events, {events_with_duplicates} had duplicate picks.")
print(f"Total duplicate picks: {duplicate_pick_count}")

For this month, 2,101 events were returned, of which 1,877 had duplicate picks; there were 15,395 duplicated pick IDs in total. Note that not all picks are duplicated, and it isn't obvious from my end which picks are affected. I haven't checked extensively whether only automatic picks are duplicated, but so far I haven't seen any manual picks duplicated.
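Until affected events are fixed at the source, one client-side workaround is to drop repeated picks before further processing. The sketch below makes the same assumption as the counting script above (picks sharing a resource ID are true duplicates) and keeps the first occurrence of each ID. `dedupe_picks` is a hypothetical helper, not part of ObsPy.

```python
from types import SimpleNamespace


def dedupe_picks(picks):
    """Return picks with repeated resource IDs removed, keeping the first.

    Assumption: picks sharing a resource ID are true duplicates.
    Works on ObsPy Pick objects or anything with a ``resource_id``
    attribute; str() yields the ID string of an ObsPy ResourceIdentifier.
    """
    seen = set()
    unique = []
    for pick in picks:
        key = str(pick.resource_id)
        if key not in seen:
            seen.add(key)
            unique.append(pick)
    return unique


if __name__ == "__main__":
    # Stand-in objects; with ObsPy you would pass ev.picks instead.
    picks = [
        SimpleNamespace(resource_id="smi:nz.org.geonet/pick-a"),
        SimpleNamespace(resource_id="smi:nz.org.geonet/pick-b"),
        SimpleNamespace(resource_id="smi:nz.org.geonet/pick-a"),
    ]
    print([p.resource_id for p in dedupe_picks(picks)])
```

With ObsPy this would be applied per event, e.g. `ev.picks = dedupe_picks(ev.picks)`. Keeping the first copy is an arbitrary choice that only makes sense if the copies really are identical.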

salichon commented 1 year ago

@calum-chamberlain, thanks a lot for making the case. I went through this observation and would amend my comments above:

1- This is not the same odd, random feature I was thinking of, where two picks for the same station are set in the SeisComPML file solutions within a very short delay (without exactly the same times).

2- Indeed, 2,101 events returned, of which 1,877 had duplicate picks, is definitely more than I was expecting (see above)!

3- So, with regards to https://github.com/GeoNet/help/issues/105#issue-1620600818, event file verification gives:

"Arrival" for BFZ is unique in both XML files.
NB, for clarity (and for other users): Picks are not Arrivals. Arrivals are used in the event location; Picks may or may not be associated with that solution.
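To make the Picks/Arrivals distinction concrete: each arrival of an origin points back to a pick through a pick ID, and a pick that no arrival references is not used in that location. The sketch below (hypothetical helper name; in ObsPy the inputs would be `event.picks` and `origin.arrivals`, where `Arrival.pick_id` holds the reference) splits pick IDs into associated and unassociated, and flags IDs repeated among picks versus among arrival references, which is the check that separates a duplicated pick from a duplicated arrival.

```python
from collections import Counter
from types import SimpleNamespace


def classify_picks(picks, arrivals):
    """Split picks by whether an arrival references them.

    Returns (associated_ids, unassociated_ids, duplicated_pick_ids,
    duplicated_arrival_refs). IDs are compared as strings so this works
    with ObsPy ResourceIdentifier objects as well as plain strings.
    """
    pick_ids = [str(p.resource_id) for p in picks]
    arrival_refs = [str(a.pick_id) for a in arrivals]
    referenced = set(arrival_refs)

    # Order-preserving list of unique pick IDs
    unique_ids = list(dict.fromkeys(pick_ids))
    associated = [pid for pid in unique_ids if pid in referenced]
    unassociated = [pid for pid in unique_ids if pid not in referenced]

    # IDs that occur more than once among picks / among arrival references
    dup_picks = sorted(pid for pid, n in Counter(pick_ids).items() if n > 1)
    dup_refs = sorted(ref for ref, n in Counter(arrival_refs).items() if n > 1)
    return associated, unassociated, dup_picks, dup_refs


if __name__ == "__main__":
    # Hypothetical IDs: BFZ pick duplicated, but its arrival is unique
    picks = [
        SimpleNamespace(resource_id="BFZ-P"),
        SimpleNamespace(resource_id="BFZ-P"),
        SimpleNamespace(resource_id="WEL-P"),
    ]
    arrivals = [SimpleNamespace(pick_id="BFZ-P")]
    print(classify_picks(picks, arrivals))
```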

First conclusion/assumption: the SeisComPML to FDSN QuakeML event file conversion looks buggy.

Actions:

dup2022p788472.xml.txt dup2022p788472_fdsn.xml.txt

salichon commented 1 year ago

This may relate to this: https://github.com/GeoNet/fdsn/tree/main/cmd/fdsn-quake-consumer

elidana commented 1 year ago

@calum-chamberlain, thanks a lot from my end as well for raising this. And good catch, @salichon, about the issue being only on the FDSN service side of things! Definitely agree that this is reassuring. @calum-chamberlain, have you noticed this issue only recently, or is it something you have encountered for a while? We upgraded the event service about a month ago (the main change from the user perspective was the addition of event_type), so pinpointing the timeframe might help with troubleshooting.

calum-chamberlain commented 1 year ago

@elidana, I am pretty sure this is a recent thing, probably within the one-month timeframe, but I'm not certain. At least, I haven't had to cope with it until this year.

elidana commented 1 year ago

thanks @calum-chamberlain , that's very useful to know!

junghao commented 1 year ago

@salichon @elidana The QuakeML files are generated by the xsltproc tool (http://xmlsoft.org/xslt/xsltproc.html), not much by our own code.

You can try installing xsltproc on your computer and running the command:

xsltproc sc3ml_0.11__quakeml_1.2.xsl 2022p788472.xml 

where the sc3ml_0.11__quakeml_1.2.xsl is provided by SeisComP3 here https://github.com/SeisComP3/seiscomp3/blob/master/src/trunk/libs/xml/0.11/sc3ml_0.11__quakeml_1.2.xsl

I have tested the command above and got the same result (duplicated picks). I'm not sure whether it's due to a bug (in xsltproc or the XSL file), or whether the source data (sc3ml) causes it.

I guess we could file a bug with SeisComP3, but it seems to need some elaboration about the event, so it's probably not something I can do?

salichon commented 1 year ago

Super, @junghao, thanks for that. Note 1: the SeisComP3 repository "has been archived by the owner on Oct 14, 2022. It is now read-only." We should now point to https://github.com/SeisComP/common/tree/master/libs/xml/0.11 as the reference, currently maintained repository, and more generally to https://github.com/SeisComP/common/tree/master/libs/xml (0.12 is on the horizon :)

junghao commented 1 year ago

Thanks @salichon .

Based on this commit https://github.com/SeisComP/common/commit/132fc95c68352548de1cd6871ad08109da3d0ad3#diff-e60a95631541fd9f0ff3a585e41fc7d493aec710d6f75c3e0edc3d7891d5ff71R262, I'm not sure whether the comment ("we exclude picks already referenced in amplitudes") applies to our case. (If it does, it seems it didn't fix it.)

salichon commented 1 year ago

@junghao, I confirm the conversion produces the same output, thanks. The only difference so far (with xsltproc) is the "agency name": from the repo it is "org.gfz-potsdam.de/geofon/", so I suppose we would have to use the option "--stringparam ID_PREFIX smi:nz.org.geonet" instead.

The pick ID output for event 2022p788472 gives 40 pick IDs (including duplicated ones such as BFZ), as opposed to the original 29 picks.
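A sketch of how the conversion could be scripted with GeoNet's ID prefix in place of the GFZ default. It relies only on xsltproc's standard `--stringparam NAME VALUE` option and the `ID_PREFIX` parameter mentioned above. The trailing slash in `smi:nz.org.geonet/` is an assumption, inferred from pick IDs such as `smi:nz.org.geonet/20221019.173832.28-AIC-NZ.BFZ.10.HHZ` in the original report; `xsltproc_command` and `convert` are hypothetical helper names.

```python
import shutil
import subprocess

# Assumption: trailing slash matches pick IDs such as
# smi:nz.org.geonet/20221019.173832.28-AIC-NZ.BFZ.10.HHZ
GEONET_ID_PREFIX = "smi:nz.org.geonet/"


def xsltproc_command(stylesheet, sc3ml_file, id_prefix=GEONET_ID_PREFIX):
    """Build the xsltproc argument list for an SC3ML -> QuakeML conversion."""
    return [
        "xsltproc",
        "--stringparam", "ID_PREFIX", id_prefix,
        stylesheet,
        sc3ml_file,
    ]


def convert(stylesheet, sc3ml_file, id_prefix=GEONET_ID_PREFIX):
    """Run xsltproc and return the QuakeML output as text."""
    if shutil.which("xsltproc") is None:
        raise RuntimeError("xsltproc not found on PATH")
    result = subprocess.run(
        xsltproc_command(stylesheet, sc3ml_file, id_prefix),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    cmd = xsltproc_command("sc3ml_0.11__quakeml_1.2.xsl", "2022p788472.xml")
    print(" ".join(cmd))
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with the colon and slash in the prefix.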
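The pick ID comparison can be reproduced with the Python standard library alone. This sketch assumes picks are elements whose local name is `pick` with a `publicID` attribute, which holds for both SC3ML and QuakeML 1.2; namespaces are stripped so one function serves both files. The helper names are hypothetical, and the file names in the usage comment are placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def pick_id_counts(xml_data):
    """Count publicID occurrences of <pick> elements (str or bytes input)."""
    root = ET.fromstring(xml_data)
    return Counter(
        el.get("publicID")
        for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == "pick"
    )


def summarize(xml_data):
    """Return (total picks, unique pick IDs, sorted duplicated IDs)."""
    counts = pick_id_counts(xml_data)
    total = sum(counts.values())
    duplicated = sorted(pid for pid, n in counts.items() if n > 1)
    return total, len(counts), duplicated


if __name__ == "__main__":
    import sys

    # e.g. python pick_ids.py 2022p788472.xml 2022p788472_quakeml.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            total, unique, dups = summarize(f.read())
        print(f"{path}: {total} picks, {unique} unique IDs, duplicated: {dups}")
```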

Duplication as shown in that screenshot image

This shows a "sort of" random duplication

Pick ID files are attached: qml1.txt for the QuakeML and scml1.txt for the original.

qml1.txt scml1.txt Early comments: what on earth!

salichon commented 1 year ago

Can we confirm that it's an XSL template bug?

junghao commented 1 year ago

I also used Xalan with the same XSL template to do the transform and still got the duplicated picks. I'm pretty sure it's an XSL template bug.

salichon commented 1 year ago

@junghao Now

See attached files test0.12-RT.txt sc3ml_0.11__quakeml_1.2-RT.xsl.txt

so questions:

Ref source: https://github.com/SeisComP/common/tree/master/libs/xml

Now provided some additional context given the questions above

salichon commented 1 year ago

Hi @junghao, no feedback from gempa/users through the community channel yet, so here is an assumption from the docs (https://www.seiscomp.de/doc/apps/sccnv.html):

I'll confirm with Stephan et al. to get a solid answer. Cheers, J

salichon commented 1 year ago

@calum-chamberlain @junghao Stephan is working on a solution

Picks are included in the resulting QuakeML file if they are either referenced by an Arrival or by an Amplitude. 
The XSLT already handles the case where the same Pick is referenced by an Amplitude and an Arrival, 
however it falls short in case the same Pick is referenced by different Amplitudes.
The RT version of the XSLT does not produce duplicated picks because in QuakeML-RT,
similar to SeisComP, Picks are top level elements independent of Events.
In QuakeML (non-RT) the Picks must be moved below the Event element 
and since the SeisComPML may contain multiple Events, references to the Picks via Event/OriginReference
and corresponding Origin/Arrival and Origin/StationMagnitude/Amplitude must be evaluated. 

I’ll try to improve the converter in this regard.

The QuakeML-RT converter is not an appropriate solution for your use case since the FDSNWS event standard dictates QuakeML (non-RT).

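The failure described above can be illustrated outside XSLT. In this hypothetical sketch (plain Python, not the actual stylesheet logic), arrivals and amplitudes each carry a pick reference. Emitting one pick per reference while skipping only the arrival/amplitude overlap reproduces the duplication when two amplitudes share a pick; collecting all referenced IDs into an ordered set first emits each pick exactly once.

```python
from types import SimpleNamespace


def picks_naive(picks_by_id, arrivals, amplitudes):
    """Emit one pick per reference, skipping only arrival/amplitude overlap.

    Illustrates the described shortcoming: a pick referenced by an arrival
    and an amplitude is emitted once, but a pick referenced by two
    amplitudes (and no arrival) is emitted twice.
    """
    arrival_refs = [a.pick_id for a in arrivals]
    out = [picks_by_id[ref] for ref in arrival_refs]
    for amp in amplitudes:
        if amp.pick_id not in arrival_refs:
            out.append(picks_by_id[amp.pick_id])
    return out


def picks_deduplicated(picks_by_id, arrivals, amplitudes):
    """Emit each referenced pick exactly once, in first-reference order."""
    refs = [a.pick_id for a in arrivals] + [a.pick_id for a in amplitudes]
    # dict.fromkeys keeps insertion order and drops repeated keys
    return [picks_by_id[ref] for ref in dict.fromkeys(refs)]


if __name__ == "__main__":
    picks_by_id = {"P1": "pick P1", "P2": "pick P2"}
    arrivals = [SimpleNamespace(pick_id="P1")]
    amplitudes = [
        SimpleNamespace(pick_id="P1"),
        SimpleNamespace(pick_id="P2"),
        SimpleNamespace(pick_id="P2"),
    ]
    print(picks_naive(picks_by_id, arrivals, amplitudes))
    print(picks_deduplicated(picks_by_id, arrivals, amplitudes))
```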
salichon commented 1 year ago

Now this will rely on the acceptance and deployment of the XML stylesheet into the SeisComP sources and GeoNet services.

salichon commented 1 year ago

@calum-chamberlain @junghao the duplication issue is resolved with an update of the XSL (XML stylesheet) templates: https://github.com/SeisComP/common/tree/master/libs/xml

This stylesheet fix needs to be propagated to GeoNet services to fix the GeoNet FDSN event service.

junghao commented 1 year ago

@salichon They have sc3ml_0.12 files, do we want to add them to our FDSN as well?

salichon commented 1 year ago

At the moment we're about to move to version 4 and, I reckon, pretty quickly to the above versions after that, @junghao.

salichon commented 1 year ago

Kia ora @junghao, thanks. This can most likely be closed soon once deployed (may need more testing?). Thanks a lot, and sorry for the delay.

Verdict: Dev looks good

salichon commented 1 year ago

@calum-chamberlain Kia ora, the updated template stylesheet has been deployed. We will monitor this over the next few days.

Please keep us informed whether this is going okay for you, and send any feedback.

Once the happiness level is reached, we'll close this ticket.

elidana commented 10 months ago

Enough time has passed, and it looks like the issue is now fixed, so I think the happiness level has been reached.

Closing this. @salichon, please reopen if that's not the case!

salichon commented 10 months ago

@elidana, this is not entirely resolved. Although the problem is fixed, the QuakeML database needs to be recomputed to "fix" event XML content from before May 2023. AFAIK, ALL XSL templates were corrected with that patch. :)