enova / pg_fact_loader

Build fact tables asynchronously with Postgres
MIT License
11 stars 6 forks source link

07_launch_worker regression diff on armhf #12

Open df7cb opened 3 years ago

df7cb commented 3 years ago

Hi,

on Debian's armhf architecture (32-bit, hardware floating point) this regression diff keeps showing up. The other architectures seem fine.

*** regression.diffs ****
diff -U3 /tmp/autopkgtest-lxc.wiownlp4/downtmp/build.U2W/src/expected/07_launch_worker.out /tmp/autopkgtest-lxc.wiownlp4/downtmp/build.U2W/src/results/07_launch_worker.out
--- /tmp/autopkgtest-lxc.wiownlp4/downtmp/build.U2W/src/expected/07_launch_worker.out   2020-09-03 20:59:13.000000000 +0000
+++ /tmp/autopkgtest-lxc.wiownlp4/downtmp/build.U2W/src/results/07_launch_worker.out    2020-10-20 14:41:09.498708198 +0000
@@ -35,7 +35,7 @@
            7 | 0001234567 |  35 |               |                   0 | 
            8 | 0001234568 |  35 |               |                   0 | 
            9 | 0001234569 |  35 |               |                   0 | 
-          10 | 0001234577 |  35 |               |                   0 | 
+          10 | 0001234560 |  35 |               |                   0 | 
 (10 rows)

 UPDATE fact_loader.fact_tables SET force_worker_priority = FALSE WHERE fact_table_relid = 'test_fact.customers_fact'::REGCLASS;
### End 13 installcheck (FAILED with exit code 1) ###

https://ci.debian.net/data/autopkgtest/testing/armhf/p/pg-fact-loader/7623900/log.gz https://ci.debian.net/packages/p/pg-fact-loader/testing/armhf/

Is that some bug that is worth fixing? (Since that's not a mainstream architecture, we could also opt to ignore the error.)

df7cb commented 3 years ago

I'll ignore the error on armhf for now, but it would be nice to know if that diff exhibits a bug.

jfinzel commented 2 years ago

@df7cb I apologize for never getting back to you on this. I believe it just a race condition for the background worker taking too long to start, and if I increased the pg_sleep to 5 seconds from 3, the issue would be resolved. I could do that if you think it's worth it and better than ignoring the error?

df7cb commented 2 years ago

Ignoring tests is always bad since it means we don't care about quality. So yes, if you think that fixes the problem, please change it, and tag a new release so I can update the packages. Thanks!

jfinzel commented 2 years ago

@df7cb PR ready at https://github.com/enova/pg_fact_loader/pull/16 - should I just tag it or is there a way you can test on armhf before I release? Thanks.

df7cb commented 2 years ago

Hi @jfinzel, thanks for the update!

The failure on 07 is gone, but there's a new one on 16/17:

**** build-14/regression.diffs ****
diff -U3 /home/myon/pg_fact_loader/expected/16_1_2_features.out /home/myon/pg_fact_loader/build-14/results/16_1_2_features.out
--- /home/myon/pg_fact_loader/expected/16_1_2_features.out      2022-09-08 11:15:55.175171953 +0200
+++ /home/myon/pg_fact_loader/build-14/results/16_1_2_features.out      2022-09-08 11:30:24.956673138 +0200
@@ -226,7 +226,8 @@
 --This is NOT a new feature but a new test coverage - testing concurrency.
 \! psql contrib_regression -c 'BEGIN; SELECT fact_loader.worker() INTO try1; SELECT pg_sleep(2); COMMIT;' &
 SELECT pg_sleep(1);
-NOTICE:  table "process_queue" does not exist, skipping
+NOTICE:  00000: table "process_queue" does not exist, skipping
+LOCATION:  DropErrorMsgNonExistent, tablecmds.c:1272
  pg_sleep 
 ----------

@@ -234,8 +235,10 @@

 \! psql contrib_regression -c ' SELECT fact_loader.worker() INTO try2;'
 SELECT 1
+Time: 14,398 ms
 SELECT pg_sleep(2);
 COMMIT
+Time: 2209,096 ms (00:02,209)
  pg_sleep 
 ----------

diff -U3 /home/myon/pg_fact_loader/expected/17_1_3_features.out /home/myon/pg_fact_loader/build-14/results/17_1_3_features.out
--- /home/myon/pg_fact_loader/expected/17_1_3_features.out      2022-09-08 11:15:55.175171953 +0200
+++ /home/myon/pg_fact_loader/build-14/results/17_1_3_features.out      2022-09-08 11:30:33.856811277 +0200
@@ -15,7 +15,7 @@
 FROM fact_loader.safely_terminate_workers();
  number_terminated | number_still_live | pids_still_live_ct 
 -------------------+-------------------+--------------------
-                 0 |                 2 |                  2
+                   |                   |                   
 (1 row)

 --Should work because the processes should have been idle now for at least 5 seconds
@@ -31,7 +31,7 @@
 FROM fact_loader.safely_terminate_workers();
  number_terminated | number_still_live | pids_still_live_ct 
 -------------------+-------------------+--------------------
-                 2 |                 0 |                   
+                   |                   |                   
 (1 row)

 INSERT INTO test.orders (order_id, customer_id, order_date, total)

The 16 diffs ("00000: ... LOCATION", "Time:") are harmless and just means it picked up my ~/.psqlrc. (Ideally all psql calls in the tests should be using psql -X.)

But the 17 diff reappear even when running the test on that armhf box a second time.

jfinzel commented 2 years ago

@df7cb I pushed another commit to address those two issues. Could you give it another try? Thanks.

df7cb commented 2 years ago

Hi Jeremy, sorry for taking so long to get back to this.

I've just uploaded the package to apt.postgresql.org and Debian, test results from there should be available over the next day or so.

https://ci.debian.net/packages/p/pg-fact-loader/unstable/armel/ https://ci.debian.net/packages/p/pg-fact-loader/testing/armel/ (not sure which of the two links is faster)

On a side note, the package builds fine against PG15, but cannot be tested yet since pglogical isn't available for PG15 yet.

df7cb commented 2 years ago

Bad news, 1.7.0 is now even failing on amd64: https://ci.debian.net/data/autopkgtest/unstable/amd64/p/pg-fact-loader/26861681/log.gz https://ci.debian.net/packages/p/pg-fact-loader/

jfinzel commented 1 year ago

@df7cb sorry again for long delay. I have a different plan which may be to deprecate this feature (the background worker), thus remove this test. We do not use it anymore ourselves but instead run the worker externally. I need to make other changes anyway so I will advise soon on this.... thanks for your patience.