cybertec-postgresql / pg_squeeze

A PostgreSQL extension for automatic bloat cleanup
Other
470 stars 31 forks source link

Database crash #71

Closed ramkly closed 1 month ago

ramkly commented 6 months ago

I installed pg_squeeze from the master branch, it caused the database to go into recovery mode when I ran squeeze.squeeze_table. I tried reinstalling it with REL1_6 by following these steps:

Dropped the extension from my database.
Deleted pg_squeeze.so from the PostgreSQL lib folder.
Installed pg_squeeze-REL1_6.
Reinstalled the pg_squeeze extension in my database.

However, the issue persists, and squeeze.squeeze_table still causes the database to go into recovery mode. Did I miss something, or should I consider installing an older version?

ahouska commented 6 months ago

To "send database into recovery mode" probably means "to crash it". How is this issue different from https://github.com/cybertec-postgresql/pg_squeeze/issues/70 ?

Any progress in getting the core dump and/or the backtrace<

ahouska commented 6 months ago

I'm still curious if I can reproduce the problem. Can you please share the definition (CREATE TABLE) of the table which fails to get processed? And also the output of SELECT version(). (Besides the actual server version, it should report the OS and architecture.)

ramkly commented 6 months ago

OS details:

NAME="openSUSE Leap"
VERSION="15.3"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.3"
PRETTY_NAME="openSUSE Leap 15.3"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.3"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/" 

db version:

PostgreSQL 14.11 on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 7.5.0, 64-bit

Table structure:

 CREATE TABLE "vehicle"."last_status" (
  "vehicle_id" int8 NOT NULL,
  "status_id" int4,
  "status_updated" timestamptz(6),
  "longitude" float8,
  "latitude" float8,
  "location_updated" timestamptz(6),
  "speed" int2,
  "ignition" int2,
  "terminal_event_id" int8,
  "pnd_online" bool,
  "event_ts" timestamptz(6),
  "position_description_id" int8,
  "odometer" int8,
  "tracker_vehicle_id" int8,
  "idle_start" timestamptz(6),
  "last_valid_gps_ts" timestamptz(6),
  "rpm" int4,
  "third_party_sequence" int8 DEFAULT 0,
  "altitude" int4,
  "gps_fix_type" int2,
  "clock" int4,
  "last_identification_tag_id" varchar,
  "last_tag_ts" timestamptz(6),
  "bearing" int2,
  "last_ign_on_ts" timestamptz(6),
  "last_ign_off_ts" timestamptz(6),
  "terminal_event_type_id" int8,
  "event_description" varchar(20),
  "gps_accuracy" int2,
  "exceptions" int4,
  "input_state" int4,
  "temp1" float8,
  "temp2" float8,
  "temp3" float8,
  "temp4" float8,
  "event_geom" geometry,
  "driver_duty_state" varchar(2),
  "event_ts_last_visible" timestamptz(6),
  "driver_name" varchar,
  "input_state2" int4,
  "dynamic1" int8,
  "dynamic2" int8,
  "dynamic3" int8,
  "dynamic4" int8,
  "road_speed" int2,
  "input_state2_diagnostic_mask" int4,
  "input_state2_diagnostic_mask_counter" int4 DEFAULT 0,
  "oil_temp" int4,
  "water_temp" int4,
  "privacy_identification_tag" bool DEFAULT false,
  "fuel_level" int4,
  "vext" numeric(6,3),
  "vgsm" numeric(6,3),
  "user_id" int8,
  "enter_tile_ts" timestamptz(6),
  "tile_id" int8,
  "fuel_ts" timestamptz(6),
  "fuel_precentage_left" int4,
  "last_valid_latitude" float8,
  "last_valid_longitude" float8,
  "last_valid_position" int8,
  "battery_level" int4,
  "fuel_use" int4,
  "booking_status_id" int8,
  "booking_id" int8,
  "ev_battery_percentage_left" int4,
  "ev_battery_ts" timestamptz(6),
  "ev_charging_status" varchar,
  "ev_charging_status_ts" timestamptz(6),
  "is_charging" bool,
  "charging_start_ts" timestamptz(6),
  "brake_position" int4,
  "recieved_ts" timestamptz(6),
  "original_vehicle_id" int8,
  "input_state3" int4,
  "enter_geofence_ts" timestamptz(6),
  "geofence_id" varchar,
  "input_state2_event_ts" timestamptz(6),
  "client_driver_id" char(1) ,
  "malfuncation_state" int4,
  "terminal_id" int8,
  "brake_pressure" int4,
  "brake_bits" int4,
  "battery_charge" int8,
  CONSTRAINT "terminal_stataus_pkey" PRIMARY KEY ("vehicle_id")
);

After I downgraded to REL1_5, pg_squeeze started working. The only issue I faced with REL1_5 was related to the function squeeze.get_heap_freespace(), which doesn't return a valid amount of table-free space. However, with REL1-6, I encountered the same issue on some other servers. So, I have a plan to downgrade pg_squeeze on all servers.

ahouska commented 6 months ago

One more idea though: can you please share the REL1_6 library (pg_squeeze.so)? I'd like to try test it myself on opensuse 15.3.

And also, please send me the contents of shared_preload_libraries configuration parameter. And if there are other extensions there than pg_squeeze, please tell me their versions.

ramkly commented 5 months ago

Please find attached zip file. shared_preload is as follow shared_preload_libraries = 'pg_squeeze,pg_cron,pg_stat_statements,plprofiler'

pg_squeeze.zip

kovmir commented 1 month ago

@ramkly, you have opened the second segmentation fault issue with vague details and no stack trace, which makes fixing the issues practically impossible. If you need this settled, you have to provide a stack trace or a way to reproduce the crash.

You used master build, then switched to release without providing the exact steps taken. This procedure alone might have caused the crash. Like I said, without the required details there is no way to tell.

ramkly commented 1 month ago

Hi again, finally I was able to collect core dump. please find attached screenshot of core dump. please let me know if you need more information. image

ahouska commented 1 month ago

Thanks for the backtrace. Nevertheless, it looks weird because release_task() should not really call palloc().

As I probably said earlier, something might be wrong about your binary. Can you please replace it wit the one attached here? That's what I built for PG 14 on OpenSUSE 15.3. (I had to gzip that because github does not accept files with the .so suffix.) pg_squeeze.so.gz

ramkly commented 1 month ago

Could you please share the source file, I tried to compile REL1_6 but size of pg_squeeze.so you shared is 359K, the one I compiled is 95K. Could you confirm this is the right file I download? wget https://github.com/cybertec-postgresql/pg_squeeze/archive/refs/heads/REL1_6.zip And also could you please share the steps to change the library? I did folowing steps: 1- delete pg_squeeze.so from postgresql lib directory 2- copy the library you shared to postgresql lib directory 3- restart database 4- run alter extension pg_squeeze update Please let me know if I missed anything

kovmir commented 1 month ago

please find attached screenshot of core dump

@ramkly, thank you! Next time type set logging enabled on in GDB prompt to collect the output into a text file gdb.txt.

Could you confirm this is the right file I download? wget https://github.com/cybertec-postgresql/pg_squeeze/archive/refs/heads/REL1_6.zip

No it is not. You have to use a tag, not the latest commit of an arbitrary Git branch.

I did folowing steps: 1- delete pg_squeeze.so from postgresql lib directory 2- copy the library you shared to postgresql lib directory 3- restart database 4- run alter extension pg_squeeze update Please let me know if I missed anything

Right, but did it work?

Could you please share the source file, I tried to compile REL1_6 but size of pg_squeeze.so you shared is 359K, the one I compiled is 95K.

How do you check the size?

ahouska commented 1 month ago

@kovmir the 1.6 head and the last tag should be the same, but I agree that the tag is what should be downloaded.

@ramkly I think my file is bigger because it contains debugging symbols. Does it still crash? If so, what does the backtrace look like?

kovmir commented 1 month ago

I think my file is bigger because it contains debugging symbols.

Yes. On my system -g compilation flag gives 340K library, which becomes 94K if I strip it.

ramkly commented 1 month ago

So Far it worked without crashing but it didn’t help to remove table bloatsOn 6 Sep 2024, at 7:24 PM, ahouska @.***> wrote: @kovmir the 1.6 head and the last tag should be the same, but I agree that the tag is what should be downloaded. @ramkly I think my file is bigger because it contains debugging symbols. Does it still crash? If so, what does the backtrace look like?

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

ahouska commented 1 month ago

So Far it worked without crashing but it didn’t help to remove table bloatsOn

Can you find the related record in the squeeze.log table?

ramkly commented 1 month ago

I found logs in the squeeze.error table showing that the squeeze process is sometimes unsuccessful due to an 'initial slot snapshot too large' error, while at other times, it succeeds. For instance, today I attempted to squeeze the table manually; at 9:13, it failed, but when I tried again 3 minutes later, it was successful. image and image

ahouska commented 1 month ago
  1. If there are no crashes anymore, something was probably wrong with your build, but I have no clue what it was. Please consider getting the extension binary from the openSUSE Build Service, according to https://www.postgresql.org/download/linux/suse/ (The latest OS version I see there is 15.5, so you might need to upgrade).

  2. The "initial slot snapshot too large" error is produced by PostgreSQL core, not by the extension. If the reason is too many running transactions, perhaps try to squeeze the table at a different time, when the system is not that busy.

  3. Regarding the bloat not being removed: how do you check that? Can't the problem be that the table gets bloated immediately after being squeezed? The logs you provided above show that, before 26k rows gets copied, 32k UPDATEs are performed.

kovmir commented 1 month ago

@ramkly, any news?

ramkly commented 1 month ago

Hi guys.After I compiled that version you shared, I didn’t face any issue, everything works well. RegardsAbbasOn 20 Sep 2024, at 2:02 PM, Ivan Kovmir @.***> wrote: @ramkly, any news?

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

kovmir commented 1 month ago

@ramkly, glad to hear it!