Icinga / icinga-doc

Icinga 1.x documentation in Docbook (EOL)
https://www.icinga.org/download/

[dev.icinga.com #9818] status.dat gets lost when filesystem is too small for two copies of that file #465

Open icinga-migration opened 9 years ago

icinga-migration commented 9 years ago

This issue has been migrated from Redmine: https://dev.icinga.com/issues/9818

Created by tolimar on 2015-08-03 11:29:59 +00:00

Assignee: Wolfgang Status: Assigned Target Version: (none) Last Update: 2015-10-30 08:45:13 +00:00 (in Redmine)

Icinga Version: 1.13.3

Hi!

For performance reasons we keep status.dat on a 100MB ramdisk. This file system had 35MB of space available, so we didn't expect any problems until we noticed that the Icinga GUI started to complain about a missing status.dat file. It turned out that, with our status.dat file growing beyond 35MB, the file system had become too small to hold a second, updated copy of that file.

In the process of not being able to write a temporary status.dat file in that directory, the original status.dat file was also removed.

While it certainly makes sense to create status.dat as a temporary file and copy it to the target directory in a safe way (I assume it is first copied over under a temporary name and then moved to the actual filename), I think the old status.dat should not have been removed in this situation.

Icinga already shows when it is working on an outdated status.dat file, and that is what I would have preferred in this situation.
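
The pattern assumed above (write under a temporary name in the target directory, then rename over the real file) can be sketched roughly as follows. This is a minimal Python illustration and not Icinga's actual C code; the function name `replace_status_file` and the `.status.` prefix are made up for the example. The point is that the old file is only replaced by the final rename, so a failed write leaves it in place.

```python
import os
import tempfile


def replace_status_file(new_content, status_path):
    """Replace status_path with new_content without removing the old file first.

    Returns True on success. Returns False if the new copy could not be
    written (for example ENOSPC); the previous file is then left untouched.
    """
    directory = os.path.dirname(os.path.abspath(status_path))
    try:
        # The temporary file lives in the SAME directory (same file system),
        # so the final rename does not have to copy any data.
        fd, tmp_path = tempfile.mkstemp(prefix=".status.", dir=directory)
    except OSError:
        return False
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_content)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic on POSIX when source and destination share a file system.
        os.replace(tmp_path, status_path)
        return True
    except OSError:
        # Writing or renaming failed: drop the partial temp file, keep the old one.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        return False
```

Note that this still needs room for two copies on the target file system while the update is in progress; it only changes what happens when that room is missing.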

As a temporary solution, please update the documentation at http://docs.icinga.org/latest/en/temp_data.html to reflect the need for enough space to hold two copies of status.dat.

For reference: The error messages in icinga.log were the following:

    [1438594555] Error: my_fcopy() failed to write to '/var/spool/icinga/ramdisk/status.dat': No space left on device
    [1438594555] Error: Unable to rename file '/dev/shm/icinga.tmpCQ6MOG' to '/var/spool/icinga/ramdisk/status.dat': No space left on device
    [1438594555] Error: Unable to update status data file '/var/spool/icinga/ramdisk/status.dat': No space left on device

We use the following configuration entries, if that matters:

/etc/icinga/icinga.cfg:

    status_file=/var/spool/icinga/ramdisk/status.dat
    temp_file=/dev/shm/icinga.tmp
    temp_path=/dev/shm

/etc/fstab:

    tmpfs /var/spool/icinga/ramdisk tmpfs size=100M 0 0
    tmpfs /var/spool/icinga/checkresults tmpfs size=250M 0 0

    # ls -l /var/spool/icinga/ramdisk
    total 66428
    -rw-r--r-- 1 icinga icinga 29168960 Aug 3 11:58 objects.cache
    -rw-r--r-- 1 icinga icinga 38704068 Aug 3 13:27 status.dat
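
The numbers in the listing and the fstab entry above show why the update could not fit. The following is a small worked calculation, assuming that `size=100M` means 100 MiB and ignoring tmpfs block rounding and metadata; the function name `fits_second_copy` is only illustrative.

```python
MIB = 1024 * 1024


def fits_second_copy(fs_size, other_files, status_size):
    """Return (free_bytes, fits) for a second copy of the status file."""
    free_bytes = fs_size - other_files - status_size
    return free_bytes, free_bytes >= status_size


free_bytes, fits = fits_second_copy(
    fs_size=100 * MIB,       # size=100M from /etc/fstab
    other_files=29168960,    # objects.cache, from ls -l
    status_size=38704068,    # status.dat, from ls -l
)
print(free_bytes)  # 36984572 bytes, roughly the 35MB mentioned above
print(fits)        # False: a second 38704068-byte copy does not fit
```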

Attachments

icinga-migration commented 9 years ago

Updated by tolimar on 2015-08-07 13:00:07 +00:00

Should anyone else stumble over this problem: We are now using the attached check to get notified should the ramdisk get too small.
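
The attached check itself is not reproduced in this thread. As a rough idea of what such a check could look like, here is a hypothetical Python sketch (not the attached plugin): it goes critical when the file system holding status.dat has less free space than the current size of that file. The function names are invented for the example; the exit codes follow the usual plugin convention of 0 for OK, 2 for CRITICAL and 3 for UNKNOWN.

```python
import os
import shutil

# Exit codes follow the usual Nagios/Icinga plugin convention.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def evaluate(status_size, free_bytes):
    """Return (exit_code, message) for a given file size and free space."""
    if free_bytes < status_size:
        return CRITICAL, (
            "CRITICAL - %d bytes free, but a second copy of status.dat needs %d"
            % (free_bytes, status_size)
        )
    return OK, "OK - %d bytes free, status.dat is %d bytes" % (free_bytes, status_size)


def check_room_for_copy(status_path):
    """Inspect the file system that holds status_path and evaluate it."""
    try:
        status_size = os.path.getsize(status_path)
        directory = os.path.dirname(os.path.abspath(status_path))
        free_bytes = shutil.disk_usage(directory).free
    except OSError as err:
        return UNKNOWN, "UNKNOWN - cannot inspect %s: %s" % (status_path, err)
    return evaluate(status_size, free_bytes)
```

A wrapper script would call `check_room_for_copy("/var/spool/icinga/ramdisk/status.dat")`, print the message and exit with the returned code. A real check would probably also want a warning threshold with some headroom, since status.dat keeps growing.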

icinga-migration commented 8 years ago

Updated by mfriedrich on 2015-10-26 09:15:49 +00:00

Imho this is nothing the core could detect itself. Maybe a documentation update helps ... @Wolfgang what do you think?

icinga-migration commented 8 years ago

Updated by Wolfgang on 2015-10-28 20:15:12 +00:00

I'll try to fix that soon.

icinga-migration commented 8 years ago

Updated by tolimar on 2015-10-30 08:41:53 +00:00

dnsmichi wrote:

Imho this is nothing the core could detect itself. Maybe a documentation update helps ... @Wolfgang what do you think?

Thinking about it, I see two different issues:

  1. The missing documentation, that the ramdisk should be large enough.
  2. That the old status.dat file got lost when the attempt to replace it failed. IMHO it would have been okay to leave the old status.dat in place and let the GUI show a warning that it is outdated. But somehow it got removed completely.

icinga-migration commented 8 years ago

Updated by tolimar on 2015-10-30 08:45:13 +00:00

tolimar wrote: [..]

Thinking about it, I see two different issues:

  1. The missing documentation, that the ramdisk should be large enough.

While checking the docs to propose a change, I noticed that this is already documented, and I missed it.

Chapter 8.8. "Temporary Data" already states at the very beginning "Add the size of the status file for temporary data"... Sorry for missing that.

Still leaves the removal of the old status.dat file...