Add support of nested namespace for the generated code

ujos commented 4 months ago

Would be great to have an option to generate the SBE parser inside another namespace. I've checked the source code of the generator and it seems there is no way to pass a namespace delimiter (i.e bob.application.scheme). Only numbers, underscore and digits are accepted.

OleksandrKvl commented 4 months ago

Can you provide a better motivation for this? In my experience, a top-level namespace for schema parser is enough and one usually doesn't want to put anything else inside it. Like if you have my_market_schema, why would you want to have parser in my_market_schema::nested_ns? What else can you put inside my_market_schema then?

ujos commented 4 months ago

Let's assume I'm working for some company "Abc" and I'm implementing the Xyz product. Then I want to generate the parser and put it into namespace Abc::Xyz::Shema.

Then in scope of works for Abc, I'm implementing the product Jkl. Then I want to have a parser in the Abc::Jkl::Schema

OleksandrKvl commented 4 months ago

Why can't you consider generated code as a third-party library (actually, it is a third-party library, just generated on-the-fly instead of being downloaded as is)? You don't ask, for example, boost to have a custom namespace, don't you?

ujos commented 4 months ago

That is exactly the reason of why namespaces are introduced - to avoid naming conflicts. You have products A and B, both have some internal communication and for the communication you need a protocol and parser

abc::A::parser
abc::B::parser

ujos commented 4 months ago

There is a workaround to use names like abc_A_parser and abc_B_parser. I'm not sure I like it.

OleksandrKvl commented 4 months ago

OK, you didn't convince me but I'll check how hard it is to implement.

OleksandrKvl commented 4 months ago

need a protocol and parser

I don't understand why you separate them. I mean, you have a schema and its code representation (a parser) so you never interact with the original schema (XML file), only with its representation. So why do you need to have 2 different schemas (because you want different namespace for each) but with the same name in the same project?

ujos commented 4 months ago

So why do you need to have 2 different schemas (because you want different namespace for each) but with the same name in the same project?

My vision is that

Protocol defines how the information is exchanged between parties (set of messages and their fields)
SBE schema enriches the protocol with the technical information (templateIDs, encoding)
SBE Parser (C++, Java, Python, etc) implements the schema

ujos commented 4 months ago

I can give you an example.

There is the FIX Protocol. Most people think of it as a string of the tag/values. In fact they miss two things: protocol and representation (schema). FIX defines the set of messages (D, G, F). Those messages can be transferred as a string of tag/values. Also they can be transferred as FIXML. Or SBE. Or FAST. So in fact FIX protocol can have multiple schemas.

OleksandrKvl commented 4 months ago

I asked not about your vision of terminology but about a concrete example when you can have 2 different schemas with the same name within the same project. Or, in terms of your previous message, how's it possible that abc_A_parser is different from abc_B_parser?

OleksandrKvl commented 4 months ago

Anyway, there’s another question. Currently, files are generated into output_dir/schema_name/ directory, in this way, file structure nicely mimics namespace organization:

// file on disk: output_dir/schema_name/messages/msg1.hpp

// include
#include <schema_name/messages/msg1.hpp>

// usage
schema_name::messages::msg1<char> m;

Now, what should happen if we add enclosing namespace around it? Should this be reflected in file structure? If answer is "No" then, again, you can't have 2 schemas with the same name because by default their output_dir is the same, you'll need to have a directory per schema.

// file on disk: output_dir/schema_name/messages/msg1.hpp

// include
#include <schema_name/messages/msg1.hpp>

// usage, now the class name is not the same as include path
company::project::schema_name::messages::msg1<char> m;

It's also not clear if client should include <schema_name/messages/msg1.hpp> or <company::project::schema_name::messages::msg1.hpp>.

ujos commented 4 months ago

Or, in terms of your previous message, how's it possible that abc_A_parser is different from abc_B_parser?

Because inside the abc_A_parser I may need messages A1, A2 and A3, while in project B I need the messages B1, B2 and B3. That is why there are two protocols. And both of the protocols have the same name while implemented in the different namespaces, so they do not conflict,

Now, what should happen if we add enclosing namespace around it? Should this be reflected in file structure?

Of course namespaces should reflect the files structure, or opposite the file structure should reflect namespaces:

#include <abc/A/protocol/protocol.hpp>

using namespace abc::A;

auto msg = sbepp::make_view<protocol::messages::msg1>{};

OleksandrKvl commented 4 months ago

And both of the protocols have the same name while implemented in the different namespaces

But there's no such thing as "namespace" in SBE, it only has schema name. Having the same schema name for the scenario you described is simply a bad naming because these are 2 different schemas, there's no reason they should have the same name.

ujos commented 4 months ago

Maybe you are right....

What if two exchanges deploy their SBE schemas with the same package name then. In order to use the market data from the two sources in one application and to resolve the names conflict, should I use "--schema-name" parameter?

For instance CME in their schema has package=mktdata. Let's assume CBOE also use SBE and they also have package=mktdata. Then I have to provide unique names like:

cme_mktdata
cboe_mktdata

So here are the namespaces.

OleksandrKvl commented 4 months ago

should I use "--schema-name" parameter?

Yes but in my experience the only case when you need it is when schema name in XML is not a valid C++ namespace name or the name is not descriptive.

For markets the common naming usually looks like [market name][gateway type]["schema"/"messages"], for example euronext_mdg, cme_mktdata, etc. Even if it's not, like in mktdata case that you described above, it's easy to invent a proper name using this template. Note that naming a schema mktdata is clearly a mistake made by schema authors, it's like naming your project a project, just a bad name that goes against good software development practices.

Where I work, we use SBE as a main internal messaging protocol and we never needed to have a deeper namespace than schema_name. For example, we have schemas like CompanyNameMessages that's used across the projects and ProjectNameMessages for a project-specific purpose.

Of course namespaces should reflect the files structure

It's a good practice but it's not so widely used. Often people have top-level project namespace but don't have corresponding include paths, especially for internal files. Even boost has <boost/optional.hpp> but the actual implementation is located inside <boost/optional/optional.hpp> and that's not reflected by namespaces. That being said, I don't think that enforcing file structure reflection is the right approach and this should done by client manually if needed. Now, here's what the potential implementation of this feature would require:

adding new optional --enclosing-namespace parameter to the sbeppc and CMake helper
adding new parameter like INCLUDE_DIR to CMake helper because by default it passes OUTPUT_DIR to the target_include_directories() call. To have file structure reflection it would be called like:

sbeppc_compile_schema(
    TARGET_NAME compiled_schemas
    SCHEMA_FILE "schema1.xml"
    ENCLOSING_NAMESPACE "company::project"
    OUTPUT_DIR "${CMAKE_BINARY_DIR}/sbeppc_generated/company/project"
    INCLUDE_DIR "${CMAKE_BINARY_DIR}/sbeppc_generated"
)

or, instead of INCLUDE_DIR, sbeppc can have yet another, bool-like, option --create-enclosing-namespace-dir. This is probably a better way to do it. But anyway, I don't like the fact that now a single feature is controlled by two options.

At the moment, I don't think all this additional complexity really worth the effort because there's no real-world problem it solves. I'm closing the issue until you (or anyone else) provide a real case with real names to justify the implementation.

OleksandrKvl / sbepp

Add support of nested namespace for the generated code #20