code-for-venezuela / c4v-py


Luis/env based persistency manager #97

Closed LDiazN closed 2 years ago

LDiazN commented 2 years ago

Problem

We need to give users of this library a way to supply their own persistency managers so that they can use custom storage implementations. This is especially important for deploying the library in angostura.

Solution

You can now use environment variables to point the library at a file containing a get_persistency_manager :: () -> BasePersistencyManager function, which returns the persistency manager instance used to retrieve data. This gives the user access to all the scraping, crawling and classifying architecture for their data, no matter how they store it. The user must set the environment variables shown in the Example section below.
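The lookup described above could be sketched roughly as follows. This is not the library's actual implementation, only an illustration of the idea using the standard importlib machinery. The function name load_user_manager is hypothetical, and returning None for any value other than USER is an assumption made for the sketch.

```python
import importlib.util
import os


def load_user_manager(env=os.environ):
    """Hypothetical helper: build a manager from a user-supplied module file."""
    # Assumption: any value other than "USER" means "use a built-in manager".
    if env.get("C4V_PERSISTENCY_MANAGER") != "USER":
        return None

    path = env["C4V_USER_PERSISTENCY_MANAGER_PATH"]
    module_name = env["C4V_USER_PERSISTENCY_MANAGER_MODULE"]

    # Import the module directly from its file path.
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module {module_name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # The user module is expected to expose get_persistency_manager().
    return module.get_persistency_manager()
```

Taking the environment as a parameter keeps the sketch easy to try with a plain dict instead of the real process environment.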

Relevant files

Example

The Custom Manager

Suppose we have a my_db_manager.py file that defines the get_persistency_manager function. You can borrow the example dict manager from the docs and add the following function:

# DictManager is the example dict manager from the docs, defined earlier in this file.
def get_persistency_manager() -> DictManager:
    return DictManager()

The environment variables

Run the following code in your terminal:

export C4V_PERSISTENCY_MANAGER=USER
export C4V_USER_PERSISTENCY_MANAGER_PATH=/path/to/my_db_manager.py
export C4V_USER_PERSISTENCY_MANAGER_MODULE=my_db_manager

Mind the C4V_ prefix: it is required on every environment variable for this project.
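Since a missing prefix is an easy mistake to make, a small sanity check before running the app may help. The sketch below is not part of the library. The helper name check_c4v_env and the wording of its messages are hypothetical, and it only looks at the three variables from this example.

```python
import os

# The three variables used in this example.
REQUIRED = (
    "C4V_PERSISTENCY_MANAGER",
    "C4V_USER_PERSISTENCY_MANAGER_PATH",
    "C4V_USER_PERSISTENCY_MANAGER_MODULE",
)


def check_c4v_env(env=os.environ):
    """Hypothetical helper: return a list of problems, empty if all looks fine."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    # Catch the common mistake of exporting a variable without the C4V_ prefix.
    for name in REQUIRED:
        bare = name[len("C4V_"):]
        if bare in env and name not in env:
            problems.append(f"{bare} is set without the C4V_ prefix")

    # The path variable should point at an existing file.
    path = env.get("C4V_USER_PERSISTENCY_MANAGER_PATH")
    if path and not os.path.isfile(path):
        problems.append(f"{path} is not a file")

    return problems


if __name__ == "__main__":
    for problem in check_c4v_env():
        print(problem)
```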

Test

You can run the following command and check that the DB is empty, since a new dict is created on each run of the app:

c4v list

You can also run the following code to check that the new DB starts out empty, fill it with a bit of data, and then retrieve it:

from c4v.microscope.manager import Manager
m = Manager.from_default()
print(list(m.get_all())) # empty list
m.crawl_new_urls_for(["primicia"], limit=10)
print(list(m.get_all())) # some data if everything went ok

Further work