EarthScope / ringserver

Behaviour on old data arrival #42

Closed jschaeff closed 1 year ago

jschaeff commented 1 year ago

Hello,

I need to understand a weird behaviour we see in our slarchive processes.

We have a ringserver configured to serve data from an existing directory. Let's consider the following scenarios:

  1. Old data (let's say from 3 days ago) is added to the scanned repository as a new file. Does ringserver stream the old data? Will the slarchive clients get the data eventually?

  2. Old data (let's say from 3 days ago) is added to the scanned repository in an existing file. Does ringserver stream the old data? Will the slarchive clients get the data?

Thanks for your insights!

chad-earthscope commented 1 year ago

Hi @jschaeff.

First, a clarification for context: ringserver does not serve data from the file system. Data are served only from the internal buffer, which is a First In, First Out (FIFO) style buffer. Data scanned from the file system with the built-in scanning facility are inserted into the buffer as the files are discovered. The data time of the records and the time or name of the files do not matter; records are simply inserted as they are discovered. The scanner keeps track of the files it has read, so if a file grows (is appended to) after it has been read, the new data will be read on the scanner's next pass.
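To make the scanning behaviour described above concrete, here is a minimal Python sketch. It is not ringserver's implementation: the `Scanner` class, the fixed `RECORD_SIZE`, and the use of a bounded `deque` as the FIFO buffer are all assumptions made for illustration. It shows only that records are inserted in discovery order, regardless of data time, and that an appended file is picked up from where the previous pass stopped.

```python
import os
from collections import deque

RECORD_SIZE = 512  # assumed fixed record length, for illustration only


class Scanner:
    """Toy scanner (hypothetical): remembers how far each file has been read."""

    def __init__(self, base_dir, ring_capacity):
        self.base_dir = base_dir
        self.offsets = {}  # path -> number of bytes already read
        # Bounded FIFO: when full, the oldest inserted record is dropped first.
        self.ring = deque(maxlen=ring_capacity)

    def scan_once(self):
        """Walk the directory tree and insert unread complete records."""
        inserted = 0
        for root, _dirs, files in os.walk(self.base_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                offset = self.offsets.get(path, 0)
                with open(path, "rb") as fh:
                    fh.seek(offset)
                    while True:
                        record = fh.read(RECORD_SIZE)
                        if len(record) < RECORD_SIZE:
                            break  # nothing more (or a partial record): retry next pass
                        self.ring.append(record)  # data time is never inspected
                        offset += RECORD_SIZE
                        inserted += 1
                self.offsets[path] = offset
        return inserted
```

In this sketch, a new file holding three-day-old data and an existing file that has just been appended to are treated the same way: whatever has not been read yet is inserted at the head of the buffer on the next pass.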

By default all files in the specified base directory, and sub-directories, are read. Which files are read can be controlled with the Match and Reject sub-options. In special cases existing files may be skipped depending on the setting of InitCurrentState and StateFile.
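As a rough illustration of how Match and Reject could narrow the set of files, the sketch below treats both as regular expressions tested against the file path. That interpretation, the `should_read` helper, and the example paths are assumptions; the exact pattern semantics should be checked in the ringserver documentation.

```python
import re


def should_read(path, match=None, reject=None):
    """Hypothetical filter: return True if a path passes the optional patterns."""
    if match is not None and not re.search(match, path):
        return False  # a match pattern is set and the path does not satisfy it
    if reject is not None and re.search(reject, path):
        return False  # a reject pattern is set and the path satisfies it
    return True  # no patterns set, or the path passed both checks


# With no patterns, every file under the base directory is read.
print(should_read("archive/2023/FR.CIEL.HHZ.mseed"))
# A reject pattern skips temporary files, whatever their data time.
print(should_read("archive/2023/upload.tmp", reject=r"\.tmp$"))
```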

> Old data (let's say from 3 days ago) is added to the scanned repository as a new file. Does ringserver stream the old data?

Yes.

> Will the slarchive clients get the data eventually?

Yes, they should, depending on their stream selections.

> Old data (let's say from 3 days ago) is added to the scanned repository in an existing file. Does ringserver stream the old data?

Yes.

> Will the slarchive clients get the data?

Yes, they should, depending on their stream selections.

Fundamentally, ringserver does not much care about the time of the data, either in files that it scans or the buffer used to stream data to clients. Data is inserted into the buffer (via scanning or a client connection) and if clients have a matching subscription it will be transmitted to them. The only exceptions are that it can a) report the extents, earliest and latest, for similarly named data (streams/channels) and b) filter/skip data based on data time for "time window" requests.
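The two exceptions can be sketched as follows. This is an illustrative model only, not ringserver code: the `Packet` type, the glob-style subscription pattern, and the stream identifiers are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Packet:
    stream_id: str  # hypothetical naming, e.g. "FR_CIEL_00_HHZ"
    start: float    # data start time, epoch seconds
    end: float      # data end time, epoch seconds


def deliver(ring, pattern, window=None):
    """Yield matching packets in insertion order, not in data-time order."""
    for pkt in ring:
        if not fnmatchcase(pkt.stream_id, pattern):
            continue  # no matching subscription
        if window is not None:
            w_start, w_end = window
            if pkt.end < w_start or pkt.start > w_end:
                continue  # exception (b): outside the requested time window
        yield pkt


def extents(ring, stream_id):
    """Exception (a): earliest and latest data time for one stream."""
    times = [(p.start, p.end) for p in ring if p.stream_id == stream_id]
    if not times:
        return None
    return min(s for s, _ in times), max(e for _, e in times)
```

Under this model, an old packet inserted after a newer one is still delivered to a client with a matching selection, just later in the sequence. Only a client that asked for a time window excluding the old data would not receive it.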

chad-earthscope commented 1 year ago

Oh, but now I remember another characteristic regarding the modification time of files being scanned with the built-in scanner. A description is in the code as:

> The file scanning logic employs the concept of active, idle and quiet files. Files that have not been modified for specified amounts of time are considered idle or quiet. The idle files will not be checked on every scan, instead a specified number of scans will be skipped for each idle file. The quiet files will never be checked ever again.

Files are considered idle if they have not been modified in the last 7200 seconds (2 hours), and are then scanned less often for efficiency. Perusing the code, I do not see that the quiet classification is ever used in ringserver; this option may be exposed in a future release.
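A minimal sketch of the active/idle decision, assuming a per-file counter of skipped scans. The 7200-second threshold comes from the description above. The `should_check` helper and the `IDLE_SKIP_SCANS` value are hypothetical and do not reflect ringserver's actual settings.

```python
IDLE_AFTER_SECONDS = 7200  # from the description above: 2 hours without modification
IDLE_SKIP_SCANS = 60       # hypothetical: scans to skip between checks of an idle file


def should_check(mtime, now, scans_skipped,
                 idle_after=IDLE_AFTER_SECONDS, idle_skip=IDLE_SKIP_SCANS):
    """Return (check_now, new_scans_skipped) for one file on one scan pass."""
    if now - mtime < idle_after:
        return True, 0  # active file: checked on every scan
    if scans_skipped >= idle_skip:
        return True, 0  # idle file whose turn has come: check it, reset the counter
    return False, scans_skipped + 1  # idle file: skip this pass
```

Under these assumptions an idle file is still checked, only less frequently, so data appended to it would be picked up after a delay rather than on the very next pass.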

jschaeff commented 1 year ago

Thank you for those explanations.

Cheers