thesuperzapper opened 7 years ago
You are free to place existing data anywhere the Kubernetes cluster can access. If you want this data to be usable by the existing Spark jobs, it should follow the JSON format for actions or events, and it should be placed in folders that mimic the layout the Spark jobs require: proj/year/month/day/data.
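To make the layout concrete, here is a minimal sketch of writing actions as JSON lines into date-partitioned folders of that shape. The exact field names (`userid`, `itemid`, `type`, `timestamp_utc`), the folder root, and the file name are illustrative assumptions, not taken from the Seldon docs:

```python
import json
import os
from datetime import datetime, timezone

def write_action(root, client, action):
    """Append one action as a JSON line under <root>/<client>/actions/<year>/<month>/<day>/.

    NOTE: field names and directory layout are assumptions for illustration;
    check the actual Spark job inputs for the exact format.
    """
    ts = datetime.fromtimestamp(action["timestamp_utc"], tz=timezone.utc)
    folder = os.path.join(root, client, "actions",
                          str(ts.year), str(ts.month), str(ts.day))
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "actions.json")
    with open(path, "a") as f:
        f.write(json.dumps(action) + "\n")
    return path

path = write_action("/tmp/seldon-data", "myclient",
                    {"userid": "u1", "itemid": "i42", "type": 1,
                     "timestamp_utc": 1483228800})  # 2017-01-01 00:00:00 UTC
print(path)
```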
Yeah, but this does not import them into the database. That is, you can't request a specific user's actions from the /users/{userId}/actions API endpoint.
The server does not store the raw actions in the relational DB (MySQL), for scalability reasons. By default, actions are stored in Memcached, so only recent activity is available. As an alternative, you can use Redis (http://docs.seldon.io/configuration.html#redis) to get permanent access to user actions.
At the same time, actions are sent via Fluentd to permanent storage for use in model building. So it depends on which use case you want for the actions: model building, or real-time access via the API / runtime scoring.
I have two follow-up questions:
Running seldon-cli import --action actions ... does not import actions into Redis/Memcached, just into static JSON files. If this is correct, is there a way to bulk import actions so that they can be returned by the REST API endpoint /users/{userId}/actions, implemented here?
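For context, a bulk importer serving that endpoint would essentially need to read the JSON-lines actions file and index actions per user. The sketch below uses a plain dict standing in for Redis/Memcached, and the field names are assumptions; Seldon does not ship this as a command, per this thread:

```python
import json
from collections import defaultdict
from io import StringIO

def bulk_load(lines):
    """Group JSON-lines actions by user, mimicking what a
    /users/{userId}/actions lookup would need.

    A defaultdict stands in for Redis here; a real importer would
    write to the store the API server reads from.
    """
    by_user = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        action = json.loads(line)  # one action object per line
        by_user[action["userid"]].append(action)
    return by_user

# Hypothetical sample data (field names are illustrative assumptions).
sample = StringIO(
    '{"userid": "u1", "itemid": "i1", "type": 1}\n'
    '{"userid": "u1", "itemid": "i2", "type": 1}\n'
    '{"userid": "u2", "itemid": "i1", "type": 1}\n'
)
store = bulk_load(sample)
print(len(store["u1"]))  # 2
```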
Is there a sensible way to import large amounts of historic actions?
Using
seldon-cli import --action actions --client-name CLIENT_NAME --file-path PATH_TO_FILE
imports them in some opaque way that only the Spark jobs can see.