OPCFoundation / UA-LDS

Local Discovery Server

ualds.ini becomes empty or invalid #33

Closed sina-rastegar closed 6 years ago

sina-rastegar commented 6 years ago

I've seen this many times, but I cannot provide steps to reproduce it. I think it happens when opcualds.exe crashes while accessing its config file. After a system crash or an internal crash of the opcualds service, the config file (ualds.ini) becomes corrupt, containing just a few empty section headers. The original config file is about 5 KB, while this stub is only 1 KB. After this happens, trying to start the service fails with an "Incorrect function" error. I usually keep a copy of ualds.ini, put it back, and then start the OPC UA Local Discovery Service. However, this is the dirtiest possible approach to the problem. Regards

erhardgrishaber commented 6 years ago

Please provide the UA-LDS version you are using.

sina-rastegar commented 6 years ago

I'm using UALDS 1.03.367. A sample corrupted file is attached: ualds_corrupt.txt

erhardgrishaber commented 6 years ago

A similar issue (#22) was already reported for the same LDS version, and an improvement was made for it. An LDS v1.03.370 (Release Candidate) is available on the OPC Foundation website. Please try that version. If the problem is still reproducible, let me know.

sina-rastegar commented 6 years ago

I searched for similar issues but didn't notice that one. Sorry for the duplicate :( I'll try 370 and see what happens (I hadn't tried it initially since it was an RC). Thanks

sina-rastegar commented 6 years ago

I updated UALDS to 1.03.370 and let the system run for a week. After a few system crashes (power outages, etc.) it happened again :( The worst part is that I tried to reproduce the situation but couldn't.

sina-rastegar commented 6 years ago

Hi, I finally found out how to reproduce it: while you have an active connection to UALDS (e.g. checking registered servers), hard reset the OS (e.g. by pulling the power plug! or pressing the reset button), and the file becomes corrupt. As a workaround, I set up a batch file that copies ualds.ini back and configured it to run on UALDS service failure in the Windows Services applet. However, this is clearly not the answer. I also do not want to change the UALDS source code, since I'd like to stick with the official release.
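
A common reason a hard reset leaves a truncated file is that the file is being rewritten in place at the moment power is lost. A standard mitigation (not necessarily what UA-LDS does) is to write the new contents to a temporary file, flush them to disk, and only then rename the temporary file over the original. The sketch below assumes a POSIX system, and `write_file_atomically` is a hypothetical helper; on Windows the analogous calls would be `FlushFileBuffers` and `MoveFileExW` with `MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from UA-LDS): replace `path` with `contents` so that
 * a power loss leaves either the complete old file or the complete new file. */
int write_file_atomically(const char *path, const char *contents)
{
    char tmp_path[4096];
    assert(path != NULL && contents != NULL);

    /* 1. Write the full new contents to a sibling temporary file. */
    int n = snprintf(tmp_path, sizeof tmp_path, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;
    FILE *f = fopen(tmp_path, "wb");
    if (f == NULL)
        return -1;
    size_t len = strlen(contents);
    if (fwrite(contents, 1, len, f) != len || fflush(f) != 0) {
        fclose(f);
        remove(tmp_path);
        return -1;
    }

    /* 2. Force the data to stable storage before the new name becomes visible. */
    if (fsync(fileno(f)) != 0) {
        fclose(f);
        remove(tmp_path);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp_path);
        return -1;
    }

    /* 3. On POSIX file systems rename() replaces the target atomically. */
    if (rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

With this pattern a reader sees either the old ualds.ini or the complete new one, never a half-written stub. For stronger durability the containing directory is usually fsync'ed after the rename as well; that step is omitted here.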

erhardgrishaber commented 6 years ago

Thank you for the investigation. It will be fixed in the next release.

JochenKalmbach commented 6 years ago

@erhardgrishaber: Just a small question: how will this be fixed? By separating the storage of the registrations from the configuration?

erhardgrishaber commented 6 years ago

It is not decided yet, but your suggestion is probably the best solution.

sina-rastegar commented 6 years ago

@JochenKalmbach: good suggestion! For now, I wrote a C# console application that runs with admin privileges and is set to run when the UALDS service crashes (fails). It basically does this:

  1. Stops OPC UA & Bonjour services (if running)
  2. Copies an original ualds.ini from its resources to the ...\UA\Discovery\ folder.
  3. Tries to restart the services with a time delay.
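
The helper above was written in C#; here is a minimal C sketch of step 2 only, assuming a known-good copy of ualds.ini is kept somewhere the helper can read. `restore_config` and both paths are hypothetical, and stopping and restarting the Windows services (steps 1 and 3) is platform-specific and not shown.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical step 2: overwrite the live config with a known-good copy.
 * Returns 0 on success, -1 if either file cannot be opened or the copy fails. */
int restore_config(const char *known_good_path, const char *live_path)
{
    assert(known_good_path != NULL && live_path != NULL);
    FILE *in = fopen(known_good_path, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(live_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[4096];
    size_t n;
    int rc = 0;
    /* Copy in fixed-size chunks until end of file. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```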

sina-rastegar commented 6 years ago

Sorry to bother you, but could you give an approximate date for the next release? Right now, I've upgraded to 1.03.370.37 (the latest release) and this issue still exists. It seems my workaround (mentioned in my previous post) doesn't always work correctly, especially after system failures.

Regards

erhardgrishaber commented 6 years ago

I don't know about the official release date, but I am starting to work on it today.

erhardgrishaber commented 6 years ago

@entitification Can you help with a beta test of the fix?

sina-rastegar commented 6 years ago

@erhardgrishaber: Sure :) Thanks for the fix. Meanwhile, could you point me to an LDS version from before this issue appeared? I need to deploy it on our current systems until the fix arrives (I remember version 1.01 was OK, but I can't find the latest 1.01.x).

erhardgrishaber commented 6 years ago

@entitification I recommend 1.02.335.1. It is available on the OPC Foundation website.

sina-rastegar commented 6 years ago

@erhardgrishaber: I tried 1.02 and still had the same problem, so I went back to 1.01.329 from my archive and it works like a charm :) It has some good properties:

  1. Its configuration is based on an XML file inside ProgramData\Opc Foundation\Config. This is a handy location, and the configuration is much easier to understand and change in XML format. Also, I've seen this config file (probably as a stub) in the 1.03 and 1.02 versions. It would be great to go back to that kind of configuration.

  2. The config file and the endpoint cache files are separate in that version, as suggested by @JochenKalmbach. So whatever you do (reset the computer, unplug the power, crash the OS, etc.), you cannot break the service. This means the system operators will never call you to say "OPC is disconnected".

  3. Its certificate paths follow the legacy OPC layout (e.g. ProgramData\Certificate Stores\UA Applications), so there is no need for multiple certificate copies to make legacy OPC services work. Moreover, IMO this directory structure is cleaner: ProgramData -> Certificate Stores -> Machine Default -> UA Applications -> Rejected Certificates

sina-rastegar commented 6 years ago

Hi, I got "OPC UA Local Discovery Server 1.03.371.424" and the problem still exists. Is this the corrected version? There is one difference, though: when ualds.ini becomes corrupt, you cannot restore it, and the OPC UA Local Discovery Server goes into a state in which it doesn't accept messages (that is, you cannot stop/start the service), so my OPC Recovery Helper service doesn't work.

erhardgrishaber commented 6 years ago

Hi @entitification. Yes, you are using the correct version. Your original description was that you hard reset your OS while the LDS is running, then the config file becomes corrupt (or empty) and the service cannot start. The fix works as follows: at startup, the LDS checks whether the config file is correct (i.e. whether it can start with it); if not, it tries to use the information from a backup config file; if that also fails, it generates default values for the config file (as for a new install). This was tested by manually corrupting (or emptying) the config file and starting the service. Is there another use case?
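
The startup check described above can be sketched as a fallback chain: primary file, then backup file, then built-in defaults. This is a minimal illustration, not the LDS source: the function names, the backup path, and the validity test (a `[General]` section plus a `LogLevel=` key) are assumptions, and the built-in defaults reuse the values quoted later in this thread (LogLevel=error, LogFileSize=100, LogRotateCount=2).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Built-in defaults: values from the thread, section layout assumed. */
static const char DEFAULT_CONFIG[] =
    "[General]\nLogLevel=error\nLogFileSize=100\nLogRotateCount=2\n";

/* Read a whole file into a NUL-terminated heap buffer, or return NULL. */
static char *read_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    char *buf = NULL;
    long size = -1;
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) >= 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);
        if (buf != NULL) {
            size_t n = fread(buf, 1, (size_t)size, f);
            buf[n] = '\0';
        }
    }
    fclose(f);
    return buf;
}

/* Hypothetical validity check; the real LDS may test other keys. */
static int config_is_valid(const char *text)
{
    return text != NULL && strstr(text, "[General]") != NULL
        && strstr(text, "LogLevel=") != NULL;
}

/* Returns a heap copy of the first usable configuration and reports where it
 * came from in *source: 0 = primary, 1 = backup, 2 = built-in defaults. */
char *load_config_with_fallback(const char *primary, const char *backup, int *source)
{
    const char *candidates[2] = { primary, backup };
    assert(primary != NULL && backup != NULL);
    for (int i = 0; i < 2; i++) {
        char *text = read_file(candidates[i]);
        if (config_is_valid(text)) {
            if (source) *source = i;
            return text;
        }
        free(text); /* free(NULL) is a no-op */
    }
    if (source) *source = 2;
    char *copy = malloc(sizeof DEFAULT_CONFIG);
    if (copy != NULL)
        memcpy(copy, DEFAULT_CONFIG, sizeof DEFAULT_CONFIG);
    return copy;
}
```

A headers-only file, like the corrupt stub from the original report, fails this check and falls through to the backup or the defaults.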

JochenKalmbach commented 6 years ago

@erhardgrishaber, @entitification The easiest solution is: set the read-only attribute of ualds.ini ;) I do not see any reason why the LDS should write the registered servers to the cfg ini file... each server will, by default, register itself every 30 seconds... so there is NO NEED to ever write to ualds.ini. After looking into the source, setting the read-only flag is enough, so the file will never be written. It also has no side effects if the file cannot be written (the failure is ignored).

The main problem is: each server registration flushes ualds.ini, and each registration is processed by a thread pool. If several servers register at the same time, the file gets corrupt, because the writing is not synchronized. The newly implemented solution will not solve the problem either... it might also be the case that both files (ini and bak) are corrupt... so the correct solution would be to synchronize the calls to the write operation.

But an even better solution would be: never write to a configuration file... I really do not see a need to store the registered servers...

I created a pull request that adds an optional config entry to disable writing of ualds.ini: #38

erhardgrishaber commented 6 years ago

@JochenKalmbach You are partially correct. UA servers do register periodically with the LDS (the interval depends on each server's configuration), so even if nothing is written to the config file, the LDS will eventually reach the same state. According to @entitification, it was reproduced with LDS version 1.02, which was single-threaded. Also, synchronization (in the current multithreaded use) is ensured by the OpcUa_Mutex_Lock(g_mutex); / OpcUa_Mutex_Unlock(g_mutex); pair. If someone can identify a use case where this mutex is not enough, please let me know. The pull request is welcome and will be discussed.

JochenKalmbach commented 6 years ago

@erhardgrishaber Hi Erhard, yes, you are correct... The writing should be protected by the mutex. Only if the LDS exits at the same time is there a chance that it is written from two different threads (so it can happen when you shut down the computer: each server re-registers after a couple of minutes, and a re-registration might coincide with the LDS shutting down... then we have a corrupt file). From my point of view, we should have a separate mutex/critical section in “ualds_settings_flush” and NOT rely on the mutex of RegisterServer. Even then, it would be great if we could disable writing to the cfg…
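
A dedicated lock owned by the flush routine, as suggested here, could look like the following POSIX-threads sketch. The names (`settings_flush`, `settings_begin_shutdown`, `g_flush_mutex`) are hypothetical stand-ins, not the actual `ualds_settings_flush` code, and the shutdown flag is one possible way to keep a late re-registration from starting a write while the service is exiting.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names; not the UA-LDS source. */
static pthread_mutex_t g_flush_mutex = PTHREAD_MUTEX_INITIALIZER;
static int g_shutting_down = 0; /* guarded by g_flush_mutex */
static long g_flush_count = 0;  /* guarded by g_flush_mutex; counts completed flushes */

/* Called by any registration thread after it changed the in-memory settings.
 * Returns 1 if a flush ran, 0 if it was skipped because shutdown had begun. */
int settings_flush(void)
{
    int flushed = 0;
    pthread_mutex_lock(&g_flush_mutex);
    if (!g_shutting_down) {
        /* Only one thread at a time reaches this point, so two writes can
         * never interleave. The actual file write would go here. */
        g_flush_count++;
        flushed = 1;
    }
    pthread_mutex_unlock(&g_flush_mutex);
    return flushed;
}

/* Called once on the shutdown path: waits for an in-progress flush to finish,
 * then prevents new flushes from starting. */
void settings_begin_shutdown(void)
{
    pthread_mutex_lock(&g_flush_mutex);
    g_shutting_down = 1;
    pthread_mutex_unlock(&g_flush_mutex);
}

long settings_flush_count(void)
{
    pthread_mutex_lock(&g_flush_mutex);
    long n = g_flush_count;
    pthread_mutex_unlock(&g_flush_mutex);
    return n;
}

static void *registration_thread(void *arg)
{
    int flushes = *(const int *)arg;
    for (int i = 0; i < flushes; i++)
        settings_flush();
    return NULL;
}

/* Simulates `threads` concurrent registrations, each triggering `per_thread` flushes. */
int simulate_registrations(int threads, int per_thread)
{
    pthread_t ids[16];
    int created = 0;
    assert(threads > 0 && threads <= 16);
    while (created < threads &&
           pthread_create(&ids[created], NULL, registration_thread, &per_thread) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(ids[i], NULL);
    return created == threads ? 0 : -1;
}
```

A lock only serializes writers inside the process; it cannot protect a file that is half-written when the machine loses power, so it is usually paired with a write-to-temp-then-rename scheme or with not writing at all.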

erhardgrishaber commented 6 years ago

@entitification Just to clarify: does it happen only at shutdown, or also during normal operation of the LDS? If it is the second case, then the flush might not be the problem; instead, the internal (in-memory) representation of the configuration that is flushed to disk gets corrupt, and in that case preventing the write (as in PR #38) will not help.

JochenKalmbach commented 6 years ago

From my experience: the LDS only has a problem after a restart... if the LDS was stopped and restarted, the ini file is corrupt and the LDS cannot start... so if the ini file is never changed, the LDS can start again and the problem is solved... so #38 will at least fix the issue that the LDS does not start...

erhardgrishaber commented 6 years ago

@JochenKalmbach With the latest release (1.03.371), a failsafe mechanism was added that checks the config file at startup and, if it is corrupt, empty, or missing, creates default settings to ensure that the LDS can start. If this is not happening on your computer, can you give me a config file that prevents the LDS from starting?

JochenKalmbach commented 6 years ago

@erhardgrishaber If the LDS creates a default configuration, it is even worse... we need special settings in the configuration to be able to use the LDS at all, like file logging restrictions... by default the LDS will spam the whole drive without restriction... if the LDS silently uses a different configuration, it will create other problems. From my point of view: if we never write to the configuration file, we will always be able to read it; there is no need for bak files or default settings. Also, we had some setup issues with the newest version, so we still use an older one; we are still investigating these.

erhardgrishaber commented 6 years ago

@JochenKalmbach 1) The requirement is: regardless of the state of the config file, the LDS must be able to start. I have had situations where people manually edited the ini file and then complained that the LDS would not start. 2) It will not spam the whole drive: the default settings are LogLevel=error; LogFileSize=100; LogRotateCount=2;
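
Size-based rotation is what keeps such defaults from filling a disk: with at most `LogRotateCount` archived files plus the active one, each capped near `LogFileSize`, total log usage stays around (LogRotateCount + 1) × LogFileSize. The sketch below assumes that scheme and a byte-based limit; the thread does not say what unit `LogFileSize` uses or exactly how the LDS rotates, and `rotate_logs`/`rotate_if_needed` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rotation: base -> base.1 -> ... -> base.<rotate_count>;
 * the oldest archive is deleted, so at most rotate_count + 1 files exist. */
int rotate_logs(const char *base, int rotate_count)
{
    char from[512], to[512];
    assert(base != NULL && strlen(base) < 480 && rotate_count >= 1);

    snprintf(to, sizeof to, "%s.%d", base, rotate_count);
    remove(to);                        /* drop the oldest archive if present */
    for (int i = rotate_count; i >= 2; i--) {
        snprintf(from, sizeof from, "%s.%d", base, i - 1);
        snprintf(to, sizeof to, "%s.%d", base, i);
        rename(from, to);              /* a missing archive is not an error */
    }
    snprintf(to, sizeof to, "%s.1", base);
    return rename(base, to);           /* active log becomes the newest archive */
}

/* Rotates only when the active log has reached max_bytes.
 * Returns 1 if rotated, 0 if not needed (or no log yet), -1 on error. */
int rotate_if_needed(const char *base, long max_bytes, int rotate_count)
{
    FILE *f = fopen(base, "rb");
    if (f == NULL)
        return 0;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    if (size < max_bytes)
        return 0;
    return rotate_logs(base, rotate_count) == 0 ? 1 : -1;
}
```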

JochenKalmbach commented 6 years ago

@erhardgrishaber: Sounds good ;) It would be great if we could store a default configuration file in the same folder as the current one (like "ualds.ini.def"). If the current one is not readable, we should read the def file; only if that file also has problems should we create a "default" file in code... Nevertheless, it would be great if we could optionally prevent writing to the file at all. This would solve all current problems with the configuration file (besides manual editing).

erhardgrishaber commented 6 years ago

The configuration file can be set to read-only mode, so nothing is written that could corrupt the file at shutdown: set ReadOnlyCfgs = 1 under the "General" section. The default value (if the setting is absent) is read-only (> 0). Please use LDS release 1.03.400.
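
How a flush might honor this setting is sketched below. The INI layout (a `[General]` section containing `ReadOnlyCfgs = 1`) follows the description above; the minimal parser and the `settings_may_write` gate are hypothetical illustrations, not the LDS implementation. As described, a missing setting is treated as read-only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns the ReadOnlyCfgs value from the [General] section of `ini`, or
 * `default_value` if the section or key is absent. Minimal line-based parser
 * for illustration only; it is not the LDS's own INI reader. */
long read_only_cfgs(const char *ini, long default_value)
{
    int in_general = 0;
    const char *line = ini;
    assert(ini != NULL);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *p = line;
        /* Skip leading whitespace (also tolerates "\r\n" line endings). */
        while (p < line + len && isspace((unsigned char)*p))
            p++;
        if (p < line + len && *p == '[') {
            in_general = strncmp(p, "[General]", 9) == 0;
        } else if (in_general && strncmp(p, "ReadOnlyCfgs", 12) == 0) {
            const char *q = p + 12;
            while (q < line + len && (*q == ' ' || *q == '\t'))
                q++;
            if (q < line + len && *q == '=')
                return strtol(q + 1, NULL, 10);
        }
        if (end == NULL)
            break;
        line = end + 1;
    }
    return default_value;
}

/* Hypothetical gate called before any flush: write only when ReadOnlyCfgs <= 0.
 * An absent setting defaults to read-only (1). */
int settings_may_write(const char *ini)
{
    return read_only_cfgs(ini, 1) <= 0;
}
```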