Closed job closed 5 years ago
I can give ssh access to the server that runs this software if RIPE NCC can't replicate
You are not alone. Well, I got all except RIPE up after some time, but RIPE remains down on one box. IIRC I once had similar issues on other boxes and got that fixed by wiping out the DB and rsync directory, but this time it doesn't seem to help (and for now I'm not touching the others).
Mar 19 12:01:05 rpki-rtr-server.sh: 2019-03-19 12:01:05.881 WARN 20200 --- [eduler_Worker-2] n.r.r.r.a.v.RefreshCacheController : validator http://localhost:8080/api/objects/validated not ready yet, will retry later
Mar 19 12:01:25 rpki-validator-3.sh: 2019-03-19 12:01:25,425 #011 WARN #011 quartzScheduler_Worker-9 #011#011#011 o.h.e.j.s.SqlExceptionHelper #011#011 SQL Error: 23505, SQLState: 23505
Mar 19 12:01:25 rpki-validator-3.sh: 2019-03-19 12:01:25,425 #011 ERROR #011 quartzScheduler_Worker-9 #011#011#011 o.h.e.j.s.SqlExceptionHelper #011#011 Unique index or primary key violation: "RPKI_OBJECT__SHA256_IDX ON PUBLIC.RPKI_OBJECT(SHA256) VALUES (X'020218fa6a03578506d1338e39474bc4a8c8ddf5530853d01730182d1953787f', 260064)"; SQL statement:
Mar 19 12:01:25 rpki-validator-3.sh: /* insert net.ripe.rpki.validator3.domain.RpkiObject */ insert into rpki_object (created_at, updated_at, version, authority_key_identifier, last_marked_reachable_at, serial_number, sha256, signing_time, type, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [23505-197]
Mar 19 12:01:40 rpki-validator-3.sh: 2019-03-19 12:01:40,438 #011 WARN #011 quartzScheduler_Worker-4 #011#011#011 o.h.e.j.s.SqlExceptionHelper #011#011 SQL Error: 23505, SQLState: 23505
Mar 19 12:01:40 rpki-validator-3.sh: 2019-03-19 12:01:40,438 #011 ERROR #011 quartzScheduler_Worker-4 #011#011#011 o.h.e.j.s.SqlExceptionHelper #011#011 Unique index or primary key violation: "RPKI_OBJECT__SHA256_IDX ON PUBLIC.RPKI_OBJECT(SHA256) VALUES (X'b48be5a31f6c8ac276d7f70f31f9a22d0a44580b40583a606983255ced6525c6', 215964)"; SQL statement:
Mar 19 12:01:40 rpki-validator-3.sh: /* insert net.ripe.rpki.validator3.domain.RpkiObject */ insert into rpki_object (created_at, updated_at, version, authority_key_identifier, last_marked_reachable_at, serial_number, sha256, signing_time, type, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [23505-197]
I vague recall someone once mentioned that the SQL primary key violation errors could be safely ignored … can't they?
Nusenu also reported on routing-wg, that >370 slowed down rsync operations (esp. with RIPE). Not sure if that anyhow links to this.
It's been a performance regression, fixed in build 397. The unique key violation problem depends on the state of a repository and in principle can be ignored, but we are working on eliminating it completely.
I loaded rpki-validator-3.0-397
- similar issue. After 30 minutes only APNIC & RIPE are loaded. Seems to have stalled somewhere in processing the other 3 repositories.
I can confirm this, after DB cleanup (https://github.com/RIPE-NCC/rpki-validator-3/issues/61#issuecomment-475523608) I observe similar issue with 3.0-397, after clean start only RIPE and APNIC TALs are loaded after more than 30 minutes, other repositories are stuck in pending state.
In logs, I also see following errors (repeatedly):
Mar 22 09:05:11 rpki rpki-validator-3.sh: 2019-03-22 09:05:11,152 #011 WARN #011 quartzScheduler_Worker-8 #011#011#011 o.h.e.j.s.SqlExceptionHelper #011#011 SQL Error: 23505, SQLState: 23505 Mar 22 09:05:11 rpki rpki-validator-3.sh: 2019-03-22 09:05:11,152 #011 ERROR #011 quartzScheduler_Worker-8 #011#011#011 o.h.e.j.s.SqlExceptionHelper #011#011 Unique index or primary key violation: "RPKI_OBJECT__SHA256_IDX ON PUBLIC.RPKI_OBJECT(SHA256) VALUES (X'e2c6bd586f389e206a8ad6748943982ff8018106da4e4443d1d2a62c949e9ba5', 185199)"; SQL statement: Mar 22 09:05:11 rpki rpki-validator-3.sh: /* insert net.ripe.rpki.validator3.domain.RpkiObject */ insert into rpki_object (created_at, updated_at, version, authority_key_identifier, last_marked_reachable_at, serial_number, sha256, signing_time, type, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [23505-197] Mar 22 09:05:11 rpki rpki-validator-3.sh: 2019-03-22 09:05:11,152 #011 ERROR #011 quartzScheduler_Worker-8 #011#011#011 o.h.i.ExceptionMapperStandardImpl #011#011 HHH000346: Error during managed flush [org.hibernate.exception.ConstraintViolationException: could not execute statement] Mar 22 09:05:16 rpki rpki-rtr-server.sh: 2019-03-22 09:05:16.461 WARN 3410 --- [eduler_Worker-1] n.r.r.r.a.v.RefreshCacheController : validator http://localhost:8080/api/objects/validated not ready yet, will retry later
Reported primary key violations were there also before upgrade from build 386 to 397.
We are trying to be a bit more careful on releasing, but there is a release candidate version 406 if you wanted to have help us for early testing: https://ftp.ripe.net/tools/rpki/validator3/rc/centos7/repo/
Main changes:
On Sat, Mar 30, 2019 at 3:06 PM Job Snijders notifications@github.com wrote:
Almost two weeks have passed since reporting this issue. It appears that at this point in time there no longer is any version of the RPKI v3 validator that runs stable.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/RIPE-NCC/rpki-validator-3/issues/77#issuecomment-478248967, or mute the thread https://github.com/notifications/unsubscribe-auth/AAijg1MlrhvReoUGOr5U7PfHV05n_sMxks5vb29cgaJpZM4b76Kk .
3.1 (_2019.07.0406.41.46) startup with clean (empty) database was pretty fast. Database backend change from H2 to Xodus probably solves this issue.
Using
rpki-validator-3.0-394-dist.tar.gz
(md5sum fafb95e829a5d5f5905b748bbd95e2dc).Steps to reproduce:
Within a few minutes it'll have rsynced most of the data:
The web interface now shows one TAL operational (APNIC): https://rpki.meerval.net/trust-anchors
The log spits out lots of this:
2 hours later the RPKI validator still hasn't booted beyond the APNIC tal