tdipisa closed this issue 5 years ago
Deleted @cezio's comment by mistake:
Asynchronous mechanism to manage the incoming records and grant high availability
I'd strongly suggest moving this functionality to the webhook web application, since it is the one responsible for flow control. Adding such a dependency to the library is very bad practice.
@cezio as a requirement, this Python module must be independent of the web application. The web application specified in our proposal is only needed to demonstrate usage of the Python module (it is not a real part of the required work). In addition, the reason we need an asynchronous mechanism in the Python module is to properly manage multiple incoming requests, in a context where each request must be validated before it is backed up: for each incoming request we need to perform HTTP requests to the Fulcrum APIs for validation, I/O operations against the Pg DB, and so on.
The goal of the project is to implement the backup logic. I proposed including an asynchronous mechanism to increase the throughput of the app, but it wasn't meant to replace or implement a distributed mechanism such as Celery + brokers, or similar. Their initial reference was a simple, basic PHP script!
The main reason I thought about this was to release the webhook request as soon as possible, so that we don't trigger protection mechanisms in the Fulcrum platform's webhook system.
Thinking twice, I agree that the core module probably shouldn't implement the async / queue concerns... I would leave this aside for the moment and possibly consider a mechanism at an upper level.
A key point is not to spend most of the time budget on this and not to invest too much in the web app (no Celery, brokers, external servers, etc.).
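To make the "release the webhook request as soon as possible" idea concrete without Celery or brokers, here is a minimal sketch of what an upper-level mechanism could look like, using only the standard library. All names (`handle_webhook`, `process_payload`, `start_worker`, `payloads`, `processed`) are hypothetical and not part of pyfulcrum; the slow validation and database work is replaced by a placeholder.

```python
import queue
import threading

# Hypothetical in-process buffer between the webhook endpoint and the backup logic.
payloads = queue.Queue()
processed = []  # stands in for "validated and saved to the database"


def process_payload(payload):
    """Placeholder for the slow part: validate against Fulcrum, write to the db."""
    processed.append(payload["id"])


def worker():
    """Drain the queue until a None sentinel arrives."""
    while True:
        payload = payloads.get()
        try:
            if payload is None:
                return
            process_payload(payload)
        finally:
            # Mark the item as handled so payloads.join() can return.
            payloads.task_done()


def handle_webhook(payload):
    """Called by the web app: enqueue the payload and return immediately."""
    payloads.put(payload)
    return 202  # HTTP "Accepted": the actual work happens later


def start_worker():
    """Start one background thread that processes queued payloads."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The web app would call `start_worker()` once at startup and `handle_webhook(payload)` from the endpoint. Note that an in-memory queue loses pending payloads if the process dies, which is the trade-off for avoiding external servers.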
This comment is related: https://github.com/geosolutions-it/pyfulcrum/issues/8#issuecomment-430595474
A few words on the implementation I'm working on:
`ApiManager` is created with a db session (or db connection) and an API key. The latter may become optional later for some uses. `ApiManager` will have a property for each resource type (`.projects`, `.forms`, etc.) with a uniform API (at the moment two methods: `.get(id, cached)` and `.list(cached)`). This is similar to the fulcrum Python module, but the key difference is that the gateway classes for resources handle API and db-level operations internally and return db objects. The caller has to specify whether it wants to update the db with live results (the `cached` flag). Internally, if `cached` is set to `False`, the resource gateway class fetches data from the Fulcrum API, deserializes, transforms and saves it to the db, and returns the db object(s). If `cached` is set to `True` (the default), results are returned from the db only.
Usage example:
from sqlalchemy.engine import create_engine
from pyfulcrum.lib.api import ApiManager
from fulcrum import Fulcrum
fulcrum = Fulcrum(key='super-secret-key')  # Fulcrum API client
DB_URL = 'postgresql://user:pwd@host/db'
api = ApiManager(session=create_engine(DB_URL), client=fulcrum)
api.forms.list()  # all forms, read from the db (cached=True by default)
api.forms.get(form_id)  # a single form by id, read from the db
...
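The `cached` flag semantics described above can be illustrated with a small stand-alone sketch. This is not the pyfulcrum implementation: `FormsGateway`, `store`, `fetch_one` and `fetch_all` are hypothetical names, a plain dict stands in for the database, and two callables stand in for the Fulcrum API client.

```python
class FormsGateway:
    """Illustrative resource gateway; not the actual pyfulcrum class."""

    def __init__(self, store, fetch_one, fetch_all):
        self.store = store          # dict standing in for the db table
        self.fetch_one = fetch_one  # callable standing in for a single-resource API call
        self.fetch_all = fetch_all  # callable standing in for a listing API call

    def _save(self, payload):
        # Real code would deserialize and transform the payload before persisting it.
        self.store[payload["id"]] = payload
        return payload

    def get(self, resource_id, cached=True):
        if not cached:
            # Live path: fetch from the API, save, then return the stored object.
            return self._save(self.fetch_one(resource_id))
        # Cached path: read from the store only (None if it was never fetched).
        return self.store.get(resource_id)

    def list(self, cached=True):
        if not cached:
            # Live path: refresh every object from the API first.
            for payload in self.fetch_all():
                self._save(payload)
        return [row for row in self.store.values()]
```

With this shape, a caller that never passes `cached=False` never triggers a request to the remote API, which keeps reads cheap and makes the refresh an explicit decision.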
current tests and coverage:
====================================================================== test session starts ======================================================================
platform linux -- Python 3.6.6, pytest-3.9.2, py-1.7.0, pluggy-0.8.0 -- /mnt/work/cezio/geosolutions/repos/pyfulcrum/lib/venv/bin/python
cachedir: .pytest_cache
rootdir: /mnt/work/cezio/geosolutions/repos/pyfulcrum/lib, inifile: setup.cfg
plugins: cov-2.6.0
collected 7 items
src/pyfulcrum/lib/tests/test_models.py::ModelsTestCase::test_forms PASSED
src/pyfulcrum/lib/tests/test_models.py::ModelsTestCase::test_media PASSED
src/pyfulcrum/lib/tests/test_models.py::ModelsTestCase::test_projects PASSED
src/pyfulcrum/lib/tests/test_models.py::ModelsTestCase::test_records PASSED
src/pyfulcrum/lib/tests/test_storage.py::StorageTestCase::test_storage_local PASSED
src/pyfulcrum/lib/tests/test_storage.py::StorageTestCase::test_storage_save PASSED
src/pyfulcrum/lib/tests/test_storage.py::StorageTestCase::test_storage_url PASSED
----------- coverage: platform linux, python 3.6.6-final-0 -----------
Name                                       Stmts   Miss  Cover
--------------------------------------------------------------
src/pyfulcrum/lib/__init__.py                  2      0   100%
src/pyfulcrum/lib/api.py                     176     35    80%
src/pyfulcrum/lib/cli.py                      87     87     0%
src/pyfulcrum/lib/formats.py                  92     76    17%
src/pyfulcrum/lib/migrations/__init__.py       0      0   100%
src/pyfulcrum/lib/migrations/env.py           22     22     0%
src/pyfulcrum/lib/models.py                  256     38    85%
src/pyfulcrum/lib/storage.py                  26      1    96%
src/pyfulcrum/lib/tests/__init__.py           79     14    82%
src/pyfulcrum/lib/tests/test_models.py        37      0   100%
src/pyfulcrum/lib/tests/test_storage.py       26      0   100%
--------------------------------------------------------------
TOTAL                                        803    273    66%
The module API is fairly simple. It is described in the Readme: https://github.com/cezio/pyfulcrum/tree/master/lib#pyfulcrum-api
Develop a specific API to send Fulcrum's Records to the PyBackup module. An asynchronous mechanism needs to be included at this stage in order to grant high availability of the endpoint service and to reduce (throttle) downstream requests to the Fulcrum APIs, which are needed to access and validate webhook payloads.
Required functionality:
- [ ] Asynchronous mechanism to manage the incoming records and grant high availability (commented out by https://github.com/geosolutions-it/pyfulcrum/issues/3#issuecomment-431051535)
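The "reduce (throttle) downstream requests" part of the description can be sketched independently of any queueing. The helper below is a hypothetical, minimal illustration (the `Throttle` class and its parameters are assumptions, not part of pyfulcrum or the fulcrum client): it enforces a minimum interval between consecutive calls, and the clock and sleep functions are injectable so it can be tested without waiting.

```python
import time


class Throttle:
    """Allow at most one call every `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Block until at least `min_interval` has passed since the previous call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Calling `throttle.wait()` right before each request to the Fulcrum API would space the requests out. This version is not thread-safe; with several workers it would need a lock around `wait()`.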