EIDA / eidaws

EIDA NG webservice implementations
GNU General Public License v3.0
2 stars 3 forks source link

Routing harvesting duplicates channelepochs (and related epochs) in the routing DB #33

Open kaestli opened 1 year ago

kaestli commented 1 year ago

pre-existing channel-epochs are not recognized - each harvesting of the routing declaration introduces all channel epochs again as new ones, rather than updating the last-seen dates of the pre-existing ones. (This behaviour is not observed with network epochs and station epochs; however those still have the issue that the lastseen date is not updated; cfr #31

kaestli commented 10 months ago

affected are the entities channelepoch, epoch (if referred to by a channel epoch), and routing (if referring to such an epoch, however, this is always the case...)

evidence: Select network.code as net, station.code as sta, channelepoch.id, channelepoch.locationcode as loc, channelepoch.code as chan, epoch.starttime as start, epoch.endtime as end, epoch.lastseen as ep_lastseen, routing.lastseen as rout_lastseen, service.name from network inner join channelepoch on network.id = channelepoch.network_ref inner join station on station.id = channelepoch.station_ref inner join epoch on channelepoch.epoch_ref = epoch.id left join routing on routing.epoch_ref = epoch.id left join endpoint on routing.endpoint_ref = endpoint.id left join service on endpoint.service_ref = service.id where station.code = 'HAMIK' and channelepoch.code = 'HGZ' order by epoch.starttime, epoch.lastseen, channelepoch.id; -- 5000 -- note: only channelepochs have routing, -- but all - channelepoch, epoch and routing are multiplied

Select station.code as sta, epoch.starttime as start, epoch.endtime as end, epoch.lastseen as ep_lastseen, routing.lastseen as rout_lastseen, service.name from station inner join stationepoch on station.id = stationepoch.station_ref inner join epoch on stationepoch.epoch_ref = epoch.id left join routing on routing.epoch_ref = epoch.id left join endpoint on routing.endpoint_ref = endpoint.id left join service on endpoint.service_ref = service.id where station.code = 'HAMIK' order by epoch.starttime, epoch.lastseen, stationepoch.id; -- 1 epoch, no routing

Select network.code as sta, epoch.starttime as start, epoch.endtime as end, epoch.lastseen as ep_lastseen, routing.lastseen as rout_lastseen, service.name from network inner join networkepoch on network.id = networkepoch.network_ref inner join epoch on networkepoch.epoch_ref = epoch.id left join routing on routing.epoch_ref = epoch.id left join endpoint on routing.endpoint_ref = endpoint.id left join service on endpoint.service_ref = service.id where network.code = 'CH' order by epoch.starttime, epoch.lastseen, networkepoch.id -- 1 epoch, no routing.

kaestli commented 10 months ago

(note that this issue would widely be mediated by the cleanup mechanism for routes which disappeared from EIDA (old lastseen dates), however that process is not automated.

kaestli commented 10 months ago

While channelepochs and routes are duplicated rather than the lastseen date updated, for network epochs and station epochs the lastseen date is not updated (which is probably fine as long as it is not used).

The issue does not prevent the federator from "functioning", however it bloats the database and makes queries slow in the long-run.

kaestli commented 8 months ago

Until the code is fixed, the database can be kept under control as follows:

-- do this only once:

CREATE INDEX IF NOT EXISTS epoch_epochtype_ref_idx
    ON public.epoch USING btree
    (epochtype_ref ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE UNIQUE INDEX IF NOT EXISTS epochtype_id_idx
    ON public.epochtype USING btree
    (id ASC NULLS LAST)
    TABLESPACE pg_default;

-- do the following regularly:

delete from channelepoch where epoch_ref in
(select id from epoch where lastseen < (select ((now() - '3 days'::interval)::date)::timestamp));                   

delete from routing where epoch_ref in
(select id from epoch where lastseen < (select ((now() - '3 days'::interval)::date)::timestamp));

delete from epoch where lastseen < (select ((now() - '3 days'::interval)::date)::timestamp)
and id not in (select epoch_ref from stationepoch)
and id not in (select epoch_ref from networkepoch);

delete from epochtype where id not in (select epochtype_ref from epoch);