BxCppDev / Bayeux

Core Persistency, Geometry and Data Processing C++ Library for Particle and Nuclear Physics Experiments
GNU General Public License v3.0

Allow properties/multi_properties to be created from non-filesystem sources #68

Open drbenmorgan opened 4 years ago

drbenmorgan commented 4 years ago

A use case identified in SuperNEMO's use of Bayeux is the ability to store properties/multi_properties (I'll just use properties from now on to refer to both for simplicity) in non-filesystem stores such as databases. As implemented, properties, the underlying Bayeux kernel, and the associated tools always assume that paths (where to find the data) and data (the content to be read) are files on the local filesystem.

The issue is not directly with the lowest level interfaces such as properties::config::_read_, which use istream as the data source:

https://github.com/BxCppDev/Bayeux/blob/b9395b40be0b6382d9fd4a57d4eca65866bf3eab/source/bxdatatools/src/properties.cc#L3191

so they have no hard requirement to use ifstream/FILE (for example). However, the higher level read/write interfaces do make this assumption:

https://github.com/BxCppDev/Bayeux/blob/b9395b40be0b6382d9fd4a57d4eca65866bf3eab/source/bxdatatools/src/properties.cc#L3142-L3160

as do the helper classes like file_include:

https://github.com/BxCppDev/Bayeux/blob/b9395b40be0b6382d9fd4a57d4eca65866bf3eab/source/bxdatatools/src/properties.cc#L3621-L3629

The problem is thus to try and hide this assumption of paths representing files on the filesystem, and that the data/content will come as an ifstream. Pull Requests #65, #66, and #67 provide a first step for investigating this by decoupling the reading/writing of properties from the in-memory representation. That's generally useful, but the key thing here is that across Bayeux it reduces the construction of properties from a file to a canonical form:

datatools::fetch_path_with_env(someFilePath);
datatools::properties someProps;
datatools::read_config(someFilePath, someProps);

I think what's needed is a "content_resolver" object that would take a path and return the content at that path:

class content_resolver {
...
  content_type get(path_type path) const;
};

with a datatools::read_config implementation then looking something like:

namespace datatools {
  void read_config(path_type& path, content_resolver& resolver, properties& props)
  {
    properties_config reader(resolver); // hands down resolver to other things that need it
    reader.read(path, props); // populate as needed
  }
}

and properties_config (as shown in #67) could then implement read with little change as:

void properties_config::read(path_type& path, properties& props)
{
  content_type cont = resolver.get(path); // uses the content_resolver this reader was constructed with
  // make an istream from cont, e.g. istringstream (from <sstream>) if it's a string or similar
  std::istringstream in(cont);
  this->_read_(in, props); // reuse existing implementation!
}
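To make the idea above concrete, here is a minimal, self-contained sketch of the resolve-then-parse flow. Everything in it is hypothetical and lives in a `sketch` namespace: `memory_resolver` stands in for a database or Git backend, `read_stream` is a toy "key = value" parser standing in for the existing istream-based `_read_` (it is not the Bayeux properties format), and a plain `std::map` stands in for `datatools::properties`. It assumes `content_type` is `std::string`.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

namespace sketch {

using path_type = std::string;
using content_type = std::string;  // assumption: properties are always text
using props_type = std::map<std::string, std::string>;  // stand-in for properties

// Hypothetical interface: maps a path to the content stored at that path.
class content_resolver {
public:
  virtual ~content_resolver() = default;
  virtual content_type get(const path_type& path) const = 0;
};

// Toy backend holding content in memory; stands in for a database or Git store.
class memory_resolver : public content_resolver {
public:
  void put(const path_type& path, const content_type& content) {
    store_[path] = content;
  }
  content_type get(const path_type& path) const override {
    auto it = store_.find(path);
    if (it == store_.end()) {
      throw std::runtime_error("no content at '" + path + "'");
    }
    return it->second;
  }

private:
  std::map<path_type, content_type> store_;
};

// Placeholder for the existing stream-based parser: reads "key = value" lines.
inline void read_stream(std::istream& in, props_type& props) {
  auto trim = [](const std::string& s) {
    auto b = s.find_first_not_of(" \t");
    auto e = s.find_last_not_of(" \t");
    return b == std::string::npos ? std::string() : s.substr(b, e - b + 1);
  };
  std::string line;
  while (std::getline(in, line)) {
    auto eq = line.find('=');
    if (eq == std::string::npos) continue;  // skip lines without '='
    props[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
  }
}

// Resolve the path to text, wrap it in an istringstream, reuse the stream parser.
inline void read_config(const path_type& path, const content_resolver& resolver,
                        props_type& props) {
  std::istringstream in(resolver.get(path));
  read_stream(in, props);
}

}  // namespace sketch
```

The point of the sketch is that only `read_config` knows about the resolver; the stream parser is untouched, which is why the existing `_read_` implementation could be reused as-is.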

Different implementations of the resolver would handle any needed resolution mechanism, e.g. local filesystem, remote web, Git, SQL etc. They could also be organised in a PATH-like structure to allow "overlays", e.g. use a database, but overlay the local filesystem to allow testing. The content_type could be as simple as std::string for properties, as these are always text data. A basic example of a Git backend is shown in https://github.com/SuperNEMO-DBD/SNGitCondDB, and that example suggests that extension to, e.g., SQL or similar would be straightforward.
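The PATH-like "overlay" idea could be sketched as a resolver that holds an ordered list of other resolvers and returns the first hit. This is only an illustration under stated assumptions: the `find` method returning `std::optional` (C++17) is a hypothetical variant of the `get` shown earlier, chosen so that a layer can report "not here" without throwing, and `memory_resolver` again stands in for real backends such as a database or the local filesystem.

```cpp
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

namespace sketch_overlay {

using path_type = std::string;
using content_type = std::string;  // assumption: properties are always text

// Hypothetical interface: std::nullopt means this backend has nothing at the path.
class content_resolver {
public:
  virtual ~content_resolver() = default;
  virtual std::optional<content_type> find(const path_type& path) const = 0;
};

// Toy backend; a real one might query SQL, Git, or the local filesystem.
class memory_resolver : public content_resolver {
public:
  void put(const path_type& path, const content_type& content) {
    store_[path] = content;
  }
  std::optional<content_type> find(const path_type& path) const override {
    auto it = store_.find(path);
    if (it == store_.end()) return std::nullopt;
    return it->second;
  }

private:
  std::map<path_type, content_type> store_;
};

// Tries each layer in insertion order, like a PATH search; the first hit wins.
class overlay_resolver : public content_resolver {
public:
  void push_back(std::shared_ptr<const content_resolver> layer) {
    layers_.push_back(std::move(layer));
  }
  std::optional<content_type> find(const path_type& path) const override {
    for (const auto& layer : layers_) {
      if (auto hit = layer->find(path)) return hit;
    }
    return std::nullopt;
  }

private:
  std::vector<std::shared_ptr<const content_resolver>> layers_;
};

}  // namespace sketch_overlay
```

For testing, a local layer would be pushed before the database layer, so that any path present locally shadows the database copy while everything else still falls through to the database.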

There's quite a bit of inspiration here from Fermilab's cetlib and fhicl-cpp libraries, from which code could be used if needed. See

This is just a quick sketch though, so I wanted to raise the issue to start discussion on design/implementation, and bring in @robobre, @pfranchini, @lemiere, @emchauve, @cherylepatrick from the SuperNEMO AB as this will impact/benefit the experiment more broadly. There are also more locations throughout Bayeux to consider, so this obviously needs discussion and thought on the design.