yazun closed this issue 2 years ago
It actually also happens for regular queries.
select sum(length) from ts;
ERROR: node:datanode12, backend_pid:24061, nodename:datanode5,backend_pid:34335,message:global xid is corrupted given len 48 gxid len9
Time: 52.768 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sum(length) from ts;
ERROR: node:datanode8, backend_pid:13942, nodename:datanode2,backend_pid:37623,message:global xid is corrupted given len 48 gxid len9
Time: 36.913 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sum(length) from ts;
ERROR: node:datanode8, backend_pid:13942, nodename:datanode3,backend_pid:27589,message:global xid is corrupted given len 48 gxid len9
Time: 37.058 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sum(length) from ts;
ERROR: node:datanode8, backend_pid:13942, nodename:datanode3,backend_pid:27589,message:global xid is corrupted given len 48 gxid len9
Time: 36.571 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sum(length) from ts;
ERROR: node:datanode8, backend_pid:13942, nodename:datanode3,backend_pid:27589,message:global xid is corrupted given len 48 gxid len9
Time: 35.325 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sum(length) from ts;
ERROR: node:datanode8, backend_pid:13942, nodename:datanode3,backend_pid:27589,message:global xid is corrupted given len 48 gxid len9
Time: 37.105 ms
Connecting to a different coordinator then helps. It looks like a regression in v2.2.0.
Unfortunately, this problem occurs more and more often, even for quite innocent queries. Any ideas? For example:
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode6, backend_pid:9908, nodename:datanode4,backend_pid:7535,message:global xid is corrupted given len 53 gxid len9
Time: 32.294 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 27.912 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 26.062 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 26.927 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 25.664 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode3, backend_pid:18974, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 24.320 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 27.258 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 27.624 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 27.683 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 26.231 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode3, backend_pid:18974, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 24.936 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len9
Time: 26.833 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode3,backend_pid:18974,message:global xid is corrupted given len 53 gxid len9
Time: 25.697 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode3,backend_pid:18974,message:global xid is corrupted given len 53 gxid len9
Time: 25.583 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode3,backend_pid:18974,message:global xid is corrupted given len 53 gxid len9
Time: 26.417 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode5, backend_pid:34913, nodename:datanode3,backend_pid:18974,message:global xid is corrupted given len 53 gxid len9
Time: 24.796 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode3,backend_pid:18974,message:global xid is corrupted given len 53 gxid len10
Time: 25.981 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > select sourceid,classification_types,sostypes from dr3_ops_cs36_mv.final_dr3_export_helper where inClassification and inSosAgn and not('{AGN}' <@ classification_types);
ERROR: node:datanode4, backend_pid:7535, nodename:datanode2,backend_pid:9042,message:global xid is corrupted given len 53 gxid len10
Changing the coordinator helps.
It seems to be a problem with parallel workers. Try:
set max_parallel_workers_per_gather to 0;
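If turning parallelism off everywhere is too costly, the setting can probably be scoped to just the affected statements. A minimal sketch, assuming standard PostgreSQL `SET LOCAL` semantics carry over to this cluster (not verified here); the query is the one from the report above:

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction,
-- so other sessions and later transactions keep the configured value.
SET LOCAL max_parallel_workers_per_gather = 0;
select sum(length) from ts;
COMMIT;
```

This keeps the workaround next to the queries that fail, rather than changing the value for the whole session or cluster.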
We have it set to 10, so setting it to 0 will impact performance quite a lot. Is it understood why this happens?
I can confirm that switching off parallel workers stops the crashing (at the cost of speed, of course).
I'll stay on this issue. Please try different values of the GUC to see what happens.
Oh, and set enable_distri_debug_print to on, and send us the DN log from around the error message.
It should be fixed now.
Yes, I confirm we can no longer reproduce it. Thanks a lot.
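The two requests above could be combined in one psql session roughly as follows. This is only a sketch: it assumes both GUCs can be changed with a session-level `SET`, which holds for max_parallel_workers_per_gather in stock PostgreSQL but is an assumption for enable_distri_debug_print. If the latter is rejected, it would have to be set in the node configuration instead. The values 1, 2 and 4 are arbitrary sample points.

```sql
-- Assumption: enable_distri_debug_print is settable per session.
SET enable_distri_debug_print = on;

-- Step through a few values below the configured 10 and note which ones still fail.
SET max_parallel_workers_per_gather = 1;
select sum(length) from ts;

SET max_parallel_workers_per_gather = 2;
select sum(length) from ts;

SET max_parallel_workers_per_gather = 4;
select sum(length) from ts;

-- Restore the configured values when done.
RESET max_parallel_workers_per_gather;
RESET enable_distri_debug_print;
```

After each failing run, the datanode named in the error message is the one whose log is worth collecting.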
With v2.2.0 we started to see worrying errors; the query always works fine when issued without COPY:
Any idea what could be happening? What other info could we provide? Thanks.