ADLINK-IST / opensplice

This is the Vortex OpenSplice Community Edition source repository. For our commercial offering see
https://www.adlinktech.com/en/vortex-opensplice-data-distribution-service

The periodic disappearance of the file with historical data from pstore #84

Open WOLFY888 opened 5 years ago

WOLFY888 commented 5 years ago

While using the durability service we noticed a peculiarity in its behaviour. The experiment is organized as follows: one process runs the DataReader, the other runs a DataWriter that sends each data instance. At random moments the DataReader finishes its work; after some time it starts again in a new process and receives the data that was received on this computer earlier and written by the durability service to the file pstore//.xml. Sometimes, for unknown reasons, after the DataReader has accumulated about 10-15 data instances, shut down and started again, it stops receiving this data from the durability service. The file pstore//.xml disappears from the file system and all the data instances stored in it are of course no longer available. This behaviour is observed with both local readers and readers located on other computers on the network. The Topic QoS, DataWriter QoS and DataReader QoS, as well as the contents of the ospl.conf file, are attached: ospl_content, qos_topic_reader, qos_topic_writer

hansvanthag commented 5 years ago

I suspect you're seeing an 'artefact' of (trying to exploit) durability in a 'sensible way' in combination with 'single-process' deployment of applications (which is the only supported mode in the Community Edition).

The concept of 'durability' basically relies on having 1 or more durability-services in your system that provide late-joiners with historical data (such as persistent data that was stored on disk by these durability-services). 'Normally' these replicated durability-services run as part of one or more federations on one or more machines.

So what to do if you don't have a federated deployment with 'selective' durability-services running? What you can do is have a 'dedicated' single-process application that only creates a DomainParticipant (and nothing more) and have that application's configuration specify that its 'built-in' durability-service should maintain persistent data on disk. Of course you can have multiple of these running on multiple machines and they will keep each other aligned (automatically). So as long as one of these is running, it will provide historical data to late-joining applications (which don't have persistence configured for their 'built-in' durability-service).
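
To make that concrete, below is a minimal sketch of such a 'dedicated' application, assuming the classic DCPS C++ API that ships with OpenSplice (names and error handling kept to a minimum). It creates nothing but a DomainParticipant and then stays alive, so the durability service embedded in this process (configured for persistence via the configuration it was started with) keeps maintaining the persistent store for late joiners:

```cpp
// Minimal "dedicated" persistence application (classic OpenSplice DCPS C++
// API assumed). It only creates a DomainParticipant and stays alive, so the
// durability service embedded in this process keeps the persistent store
// available for late-joining applications.
#include <iostream>
#include <unistd.h>

#include "ccpp_dds_dcps.h"

int main()
{
    DDS::DomainParticipantFactory_var factory =
        DDS::DomainParticipantFactory::get_instance();

    DDS::DomainParticipant_var participant =
        factory->create_participant(DDS::DOMAIN_ID_DEFAULT,
                                    PARTICIPANT_QOS_DEFAULT,
                                    NULL,
                                    DDS::STATUS_MASK_NONE);
    if (participant.in() == NULL) {
        std::cerr << "create_participant failed" << std::endl;
        return 1;
    }

    // No readers or writers here: this process exists only to host a
    // persistence-enabled durability service for the rest of the system.
    while (true) {
        sleep(60);
    }

    // Never reached; shown for completeness.
    factory->delete_participant(participant.in());
    return 0;
}
```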

What you're seeing is likely the 'alignment' between the 'built-in' durability-services of all your applications (of which some or all have persistence configured). This alignment process implies that IF yours is not the first-started (application with a bundled) durability-service, you (i.e. your built-in durability-service) MUST 'commit' to the data-set that is already maintained in the system and provided by a transparently/dynamically chosen 'master' durability-service. So if that 'master' happens to be a durability-service that hasn't got persistence configured (and no such persistent data was published during that 'session'), the result is that the reader's durability-service removes its 'pstore' (as that is apparently data from an 'earlier run' of the system).

So that's why the advice is to run one (or more) 'dedicated' applications whose built-in durability-service maintains the persistent data, and to keep them running over the system's lifetime. Then when you restart the system (and these 'dedicated' durability applications), the data will be re-injected into the system by one of these 'dedicated' applications, dynamically chosen with respect to 'who has the best-quality data' (typically the most recent set).

WOLFY888 commented 5 years ago

@hansvanthag

have that application's configuration specify that its 'built-in' durability-service should maintain persistent data on disk

What does "application configuration" mean? As far as I understand, the application reads the configuration file at startup and, having created only a DomainParticipant, has no other way to configure the stability service. Thus the settings of the resiliency service will be the same for all applications.

hansvanthag commented 5 years ago

Sorry for not being clear enough. Each/any application reads 'its' configuration as specified in its environment, as pointed to by the OSPL_URI environment variable. So each application can exploit a different configuration. What I was suggesting is that there's a 'dedicated' application (that only creates a DomainParticipant) that has persistency configured in 'its' configuration (so 'its' OSPL_URI environment variable should point to an appropriate configuration), whereas 'regular' applications (that 'capture' the application/system's business logic) exploit a (likely 'default') configuration (pointed to by 'their' OSPL_URI environment variable) that configures a durability-service that doesn't provide persistent stores but instead relies on these one-or-more 'dedicated' applications. PS> the 'term' is 'durability-service', so not 'stability service' nor 'resilience service' :)
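
As an illustration only (a sketch: the element and attribute names follow the OpenSplice deployment guide, but the namespace name, the partition expression and the store directory are placeholders), the 'dedicated' application's OSPL_URI could point at a configuration whose durability-service section looks roughly like this, while the 'regular' applications keep pointing at a configuration without a persistent store:

```xml
<!-- Illustrative fragment only: a durability-service section with a
     persistent namespace and an on-disk store. The namespace name, the
     partition expression and the store directory are placeholders. -->
<OpenSplice>
  <DurabilityService name="durability">
    <NameSpaces>
      <NameSpace name="persistentData">
        <Partition>*</Partition>
      </NameSpace>
      <Policy nameSpace="persistentData"
              durability="Persistent"
              alignee="Initial"
              aligner="true"/>
    </NameSpaces>
    <Persistent>
      <StoreDirectory>/var/opensplice/pstore</StoreDirectory>
    </Persistent>
  </DurabilityService>
</OpenSplice>
```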

WOLFY888 commented 5 years ago

@hansvanthag

the 'term' is 'durability-service' so not 'stability service' nor 'resilience service' :)

Sorry, the term got changed in translation)). Of course it is "durability-service".

So if that 'master' happens to be a durability-service that hasn't got persistence configured

All our durability-services are configured with OpenSplice/DurabilityService/NameSpaces/Policy[@durability] = "Persistent" (from the same ospl.xml). The writers, readers and the topic also have the DurabilityQosPolicyKind = PERSISTENT_DURABILITY_QOS setting. Therefore we believe we cannot have an instance of a durability-service configured differently. If we keep a process with a DomainParticipant running continuously, with the same durability-service settings that we are using now, will that be enough to solve the problem? I'm asking because we still haven't been able to come up with a scenario that is guaranteed to cause the data to disappear; this problem does not always occur.
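
For reference, a sketch of how this QoS setting is typically applied (classic DCPS C++ API assumed; the participant, publisher and subscriber are assumed to have been created elsewhere):

```cpp
#include "ccpp_dds_dcps.h"

// Sketch only (classic DCPS C++ API assumed): applying
// PERSISTENT_DURABILITY_QOS to the topic, writer and reader QoS as described
// above. The participant, publisher and subscriber arguments are assumed to
// have been created elsewhere.
void apply_persistent_durability(DDS::DomainParticipant_ptr participant,
                                 DDS::Publisher_ptr publisher,
                                 DDS::Subscriber_ptr subscriber,
                                 DDS::TopicQos& topic_qos,
                                 DDS::DataWriterQos& writer_qos,
                                 DDS::DataReaderQos& reader_qos)
{
    participant->get_default_topic_qos(topic_qos);
    topic_qos.durability.kind = DDS::PERSISTENT_DURABILITY_QOS;

    publisher->get_default_datawriter_qos(writer_qos);
    writer_qos.durability.kind = DDS::PERSISTENT_DURABILITY_QOS;

    subscriber->get_default_datareader_qos(reader_qos);
    reader_qos.durability.kind = DDS::PERSISTENT_DURABILITY_QOS;
}
```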

hansvanthag commented 5 years ago

Well .. that's what I would do: only have 1 (or perhaps a few) 'dedicated' persistence apps that are always running, and let the 'normal' (late-joining) apps get their historical data from these dedicated apps.