Problem
We need to give users of this library a way to provide their own persistency managers, so they can use custom storage implementations. This is especially important for deploying the library in angostura.
Solution
Now you can specify, through environment variables, a Python file containing a get_persistency_manager :: () -> BasePersistencyManager function, which provides the persistency manager instance the app uses to store and retrieve data. This way, users get access to the whole scraping, crawling and classifying architecture for their data, no matter how they store it. Users should provide the following environment variables:
PERSISTENCY_MANAGER = Tells the app which persistency manager to use. For now, the available options are:
    USER = a user-defined persistency manager
    SQLITE = the default SQLite manager; this option is used when no value is provided for this variable
USER_PERSISTENCY_MANAGER_PATH = Path to the Python file containing the get_persistency_manager function
USER_PERSISTENCY_MANAGER_MODULE = Module name to use for the file that contains the function
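The selection rules above (an optional manager kind that falls back to SQLITE, plus a path and module name that only matter for USER) can be illustrated with a small sketch. In the library these variables are parsed by dynaconf in src/c4v/config.py; this plain os.environ version, including the read_persistency_settings and PersistencySettings names and the validation behavior, is hypothetical and only meant to show the logic, assuming the C4V_ prefix that every variable of this project carries.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "C4V_"  # every environment variable of this project carries this prefix
VALID_MANAGERS = {"USER", "SQLITE"}


@dataclass
class PersistencySettings:
    # Hypothetical container for the three persistency-related settings.
    manager: str
    user_path: Optional[str] = None
    user_module: Optional[str] = None


def read_persistency_settings(env: Mapping[str, str] = os.environ) -> PersistencySettings:
    """Hypothetical reader: SQLITE is the fallback when no manager is given."""
    manager = env.get(PREFIX + "PERSISTENCY_MANAGER", "SQLITE")
    if manager not in VALID_MANAGERS:
        raise ValueError(f"Unknown persistency manager: {manager!r}")

    settings = PersistencySettings(
        manager=manager,
        user_path=env.get(PREFIX + "USER_PERSISTENCY_MANAGER_PATH"),
        user_module=env.get(PREFIX + "USER_PERSISTENCY_MANAGER_MODULE"),
    )
    # A user-defined manager cannot be loaded without knowing where it lives.
    if manager == "USER" and not (settings.user_path and settings.user_module):
        raise ValueError(
            "USER requires USER_PERSISTENCY_MANAGER_PATH and USER_PERSISTENCY_MANAGER_MODULE"
        )
    return settings


print(read_persistency_settings({}).manager)  # SQLITE
```

Passing a plain dict instead of os.environ keeps the sketch easy to exercise without touching the real process environment.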
Relevant files
src/c4v/microscope/metadata.py = Additional fields for the metadata configuration object; you can also provide these values as part of the JSON config file in the .c4v folder
src/c4v/config.py = The environment variables are parsed by dynaconf from here
src/c4v/microscope/manager.py = Refactored the from_default factory function so it sets up the persistency manager according to the given configuration
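For the USER option, the factory has to import an arbitrary file by path and call its get_persistency_manager function. The sketch below shows one common way to do that with the standard importlib machinery; it is not the code in manager.py, and the load_user_persistency_manager name is hypothetical.

```python
import importlib.util
import sys
from types import ModuleType
from typing import Any


def load_user_persistency_manager(path: str, module_name: str) -> Any:
    """Hypothetical helper: import the file at `path` as `module_name`
    and return the result of its get_persistency_manager() function."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a Python module from {path!r}")

    module: ModuleType = importlib.util.module_from_spec(spec)
    # Register the module so imports inside the user's file resolve consistently.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)  # runs the user's file

    factory = getattr(module, "get_persistency_manager", None)
    if not callable(factory):
        raise AttributeError(f"{path!r} does not define get_persistency_manager()")
    return factory()
```

Here path and module_name play the roles of USER_PERSISTENCY_MANAGER_PATH and USER_PERSISTENCY_MANAGER_MODULE; since the module is loaded from an explicit path, it does not need to be on sys.path.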
Example
The Custom Manager
Suppose we have a my_db_manager.py file that defines the get_persistency_manager function. To build it, you can borrow the example dict manager from the docs and add a get_persistency_manager function that returns an instance of that manager.
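To make the shape of such a file concrete, here is an illustrative stand-in for my_db_manager.py. The DictManager class and its save and get_all methods are hypothetical: the real file would use the dict manager from the docs, which subclasses BasePersistencyManager, and whose method names may differ. What the sketch does show is the contract of get_persistency_manager and why the storage starts empty on every run.

```python
# my_db_manager.py: hypothetical stand-in, not the dict manager from the docs
from typing import Dict, Iterator, List


class DictManager:
    """In-memory store keyed by URL (hypothetical API, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def save(self, items: List[dict]) -> None:
        # Later items with the same URL overwrite earlier ones.
        for item in items:
            self._data[item["url"]] = item

    def get_all(self) -> Iterator[dict]:
        yield from self._data.values()


def get_persistency_manager() -> DictManager:
    # A fresh dict on every call, so nothing survives between runs of the app.
    return DictManager()
```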
The environment variables
Set the environment variables described above in your terminal. Mind the C4V_ prefix: it is required for every environment variable in this project.
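If you drive the app from a Python script rather than the CLI, the same variables can be set through os.environ. This is a minimal sketch; the path and the module name my_db_manager are example values, and it assumes that setting the variables before c4v is imported is early enough for dynaconf to see them. For the c4v CLI the variables still have to be exported in the shell, because a Python process cannot change its parent shell's environment.

```python
import os

# Example values: point the path at wherever your my_db_manager.py lives.
os.environ["C4V_PERSISTENCY_MANAGER"] = "USER"
os.environ["C4V_USER_PERSISTENCY_MANAGER_PATH"] = os.path.abspath("my_db_manager.py")
os.environ["C4V_USER_PERSISTENCY_MANAGER_MODULE"] = "my_db_manager"

# Only after this point import c4v and build the manager, e.g.:
# from c4v.microscope.manager import Manager
# m = Manager.from_default()
```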
Test
Run the following command and check that the DB is empty; this is expected, since the dict is created anew on every run of the app:
c4v list
You can also run the following code to check that the new DB starts empty, then fill it with some data and retrieve it:
from c4v.microscope.manager import Manager
m = Manager.from_default()
print(list(m.get_all())) # empty list
m.crawl_new_urls_for(["primicia"], limit=10)
print(list(m.get_all())) # some data if everything went ok
Further work