Closed AliTajeldin closed 5 years ago
@ninjapapa @laneb please take a look at the above before I start the impl.
Couple questions:
jdbc -> SmvJdbcConnectionInfo)?
After some discussions with @laneb and @ninjapapa, we will utilize class hierarchy and code inspection to allow users to declare providers of various types rather than use the decorator pattern.
Each provider class in the hierarchy must provide a provider_type method that returns the provider type. The fully qualified name (fqn) of a provider is the dot-separated concatenation of the provider types of all the classes in its parent hierarchy, followed by its own provider type.
Users will be able to call SmvProvider.get_providers_by_prefix() to get all known providers whose provider type fqn matches the prefix. For example, to get the list of all connection providers, a user would call get_providers_by_prefix("conn."). If we are interested only in Spark model providers, we would query for get_providers_by_prefix("model.spark_ml."). get_providers_by_prefix will return a dictionary of <fqn, provider_klass> containing all providers that match the prefix.
Source code introspection will be used to discover known providers. Since all provider classes are directly or indirectly derived from SmvProvider, they will all have the IS_PROVIDER marker attribute.
The following directories will be scanned for providers:
- smv.providers: parent directory of all smv defined providers.
- {project}/library: all project specific providers must be defined in the project library directory.
Note: To avoid dynamic loading issues, the code scan is performed every time get_providers_by_prefix is called.
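The scan described above could look roughly like the sketch below. It is not SMV's implementation: the SmvProvider class is a stand-in, provider_fqn and scan_module are hypothetical helper names, and the function takes a list of already imported modules instead of walking smv.providers and {project}/library on disk.

```python
import inspect
import types


class SmvProvider(object):
    """Stand-in for the real base class; only the marker and type matter here."""
    IS_PROVIDER = True

    @staticmethod
    def provider_type():
        return ""


def provider_fqn(klass):
    """Hypothetical helper: join the provider types of the hierarchy, root first."""
    parts = [k.__dict__["provider_type"].__func__()
             for k in reversed(klass.__mro__)
             if "provider_type" in k.__dict__]
    return ".".join(p for p in parts if p)


def scan_module(module):
    """Return {fqn: class} for every class in `module` carrying IS_PROVIDER."""
    found = {}
    for _, klass in inspect.getmembers(module, inspect.isclass):
        if getattr(klass, "IS_PROVIDER", False):
            fqn = provider_fqn(klass)
            if fqn:  # skip the untyped root class itself
                found[fqn] = klass
    return found


def get_providers_by_prefix(modules, fqn_prefix):
    """Rescan on every call, then keep only fqns that start with the prefix."""
    providers = {}
    for module in modules:
        providers.update(scan_module(module))
    return {fqn: k for fqn, k in providers.items() if fqn.startswith(fqn_prefix)}


# Demo: a throwaway module holding two providers and one unrelated class.
class ConnProvider(SmvProvider):
    @staticmethod
    def provider_type():
        return "conn"


class JdbcProvider(ConnProvider):
    @staticmethod
    def provider_type():
        return "jdbc"


class NotAProvider(object):
    pass


demo = types.ModuleType("demo_library")
demo.ConnProvider = ConnProvider
demo.JdbcProvider = JdbcProvider
demo.NotAProvider = NotAProvider

conn_providers = get_providers_by_prefix([demo], "conn.")
```

Because the prefix "conn." ends with a dot, the intermediate ConnProvider (fqn "conn") is not returned; only concrete connection providers under it match.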
class SmvConnInfoProvider(SmvProvider):
    @staticmethod
    def provider_type(): return "conn"

class SmvJdbcConnInfoProvider(SmvConnInfoProvider):
    @staticmethod
    def provider_type(): return "jdbc"

    @staticmethod
    def attrs(): return [ "url", "driver", ... ]
Note: the provider fqn for SmvJdbcConnInfoProvider will be conn.jdbc.
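One way the fqn could be derived from the class hierarchy is sketched below. The base class here is a stand-in, and the method name provider_type_fqn is an assumption made for illustration; the proposal only states that the fqn is the concatenation of the provider types.

```python
class SmvProvider(object):
    """Stand-in base class for this sketch."""
    IS_PROVIDER = True

    @staticmethod
    def provider_type():
        return ""  # the root contributes nothing to the fqn

    @classmethod
    def provider_type_fqn(cls):  # hypothetical method name
        parts = []
        # Walk the hierarchy from the root down to cls.
        for klass in reversed(cls.__mro__):
            # Look only at provider_type declared directly on this class,
            # so an inherited definition is not counted twice.
            declared = klass.__dict__.get("provider_type")
            if declared is None:
                continue
            part = declared.__func__()  # unwrap the staticmethod and call it
            if part:
                parts.append(part)
        return ".".join(parts)


class SmvConnInfoProvider(SmvProvider):
    @staticmethod
    def provider_type():
        return "conn"


class SmvJdbcConnInfoProvider(SmvConnInfoProvider):
    @staticmethod
    def provider_type():
        return "jdbc"


jdbc_fqn = SmvJdbcConnInfoProvider.provider_type_fqn()
```

A subclass that does not declare its own provider_type adds no segment in this sketch, so it shares the fqn of its parent.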
The provider api is just for the discovery of classes. It attaches no semantic meaning to the classes it finds. In the example above, it is the responsibility of the SmvConnInfoProvider to understand that connection classes provide a static attrs method for determining the config parameters of the connection.
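As an illustration of that division of responsibility, a connection class might use attrs() to pull its own settings out of a config dictionary. Everything below is a sketch: the class is a stand-in, the attribute list is trimmed to the two names given in the example, and the "conn.<name>.<attr>" key layout is a made-up assumption, not SMV's actual config format.

```python
class SmvJdbcConnInfoProvider(object):
    """Stand-in connection class; not derived from the real SmvProvider."""

    @staticmethod
    def attrs():
        # Trimmed to the two attributes named in the example.
        return ["url", "driver"]

    def __init__(self, name, props):
        self.name = name
        # Hypothetical key layout: "conn.<name>.<attr>".
        for attr in self.attrs():
            setattr(self, attr, props.get("conn.{}.{}".format(name, attr)))


props = {
    "conn.mydb.url": "jdbc:postgresql://localhost/db",
    "conn.mydb.driver": "org.postgresql.Driver",
    "conn.mydb.unused": "ignored",  # not listed in attrs(), so never read
}
info = SmvJdbcConnInfoProvider("mydb", props)
```

The discovery api never calls attrs(); only code that knows it is dealing with a connection class does.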
In the case of model discovery, the discovered class is just another SmvGenericModule or some derivative thereof. All the normal run, requiresDS and such would be provided by derived classes.
@ninjapapa @laneb ready for re-review.
depends on #1504
LGTM. Just need to add some sample client code.
@ninjapapa added example code @laneb :
@AliTajeldin will the jdbc attributes (such as url, driver, etc.) be attributes of SmvJdbcConnInfoProvider or a subclass of it?
@ninjapapa I updated the "example providers" section to make the example a bit clearer. In summary, the provider interface doesn't know anything about the classes it finds, it is only a discovery api. It is up to the discovered classes to implement whatever additional api they need.
In the case of connection providers, they will use the attrs() method to find out what attributes they need to read from config.
In the case of model classes, they are just another SmvGenericModule from which the user can create derived classes (or have them generated by a tool) based on attributes or info provided by the provider.
Let's say we introduce a method called list_data_in_conn. Where should that method belong? More specifically, should SmvJdbcConnInfoProvider implement that method?
Assuming list_data_in_conn needs connection-info-specific knowledge, then yes, it has to be implemented by SmvJdbcConnInfoProvider. It can be either a static or an instance method, depending on what information it depends on.
Sounds good.
Only the naming convention may still have some room for discussion. On the connection side, since ConnInfos are relatively simple, naming the direct user interface class (e.g. SmvJdbcConnInfoProvider) as a "Provider" is ok. However, on the module side, it will be confusing to name the user interface classes as providers. If we keep some in the old convention and call others "Provider", it will be even more confusing. I suggest making "provider" the name of the mixins only, and never giving user interface classes a "provider" postfix.
@ninjapapa makes sense 👍
Problem
We have run into multiple issues now with solutions that require a way to "register" a set of classes that provide a type of service and then query for the list of all known such classes. For example, for connection classes, we would like to know all the known available connection classes so we can map the user specified type to the actual class type.
Solution
Need to create a lightweight solution (so no distributed consul, zookeeper, or other large scale external solutions) that is embedded within SMV. The proposal is to use python decorators (https://realpython.com/primer-on-python-decorators/#decorating-classes) to allow users to easily declare providers and their associated metadata.
Decorator
SMV will provide the SmvProvider class decorator that users can utilize to declare their provider classes/interfaces. For example:
The SmvProvider decorator takes two and only two arguments: the type of the service being provided and the name of this provider. In the example above, the type is conn and the name is jdbc. Users would be able to ask SMV for the list of all known conn providers and so on.
Note: for the registration to work, the above code must be "executed". The enclosing python module must be imported someplace. The easiest way to do this is to enumerate all the provider files in a single "index" file and import the index file to force the import of all the provider files. SMV will do that automatically for all its known providers (e.g. SmvHdfsDirConnection).
Metadata
Rather than burden the provider registry with metadata, we should create a base provider class for each known provider type and declare abstract methods in the base class to force the concrete classes to provide the required metadata. The provider interface should be kept simple and minimal.
For example, assume we have a provider type X that needs a foo attribute for each concrete provider. Instead of adding an extra foo parameter to the SmvProvider decorator, we should create a BaseX class that declares the foo() method as abstract. So:
WRONG
CORRECT
TBD
sys.modules cache and always reloaded or should we assume they don't change.