SeldonIO / seldon-server

Machine Learning Platform and Recommendation Engine built on Kubernetes
https://www.seldon.io/
Apache License 2.0
1.47k stars 300 forks

Import historic actions, and add to database. #38

Open thesuperzapper opened 7 years ago

thesuperzapper commented 7 years ago

Is there a sensible way to import large amounts of historic actions?

Using seldon-cli import --action actions --client-name CLIENT_NAME --file-path PATH_TO_FILE imports them in some strange way that only Spark jobs can see.

ukclivecox commented 7 years ago

You are free to place existing data anywhere the Kubernetes cluster can access. If you want this data to be usable by the existing Spark jobs, it should follow the JSON format for actions or events. You should also place the data in a folder layout that mimics the one the Spark jobs require: proj/year/month/day/data.
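As a rough illustration of that folder layout, here is a minimal Python sketch that writes historic actions as one JSON object per line under proj/year/month/day/data. Several things here are assumptions and not taken from Seldon itself: the helper names partition_path and write_actions, the action field names (userid, itemid, type, timestamp), unpadded month and day folders, and the choice to treat "data" as a file. Check the action JSON format and the paths your Spark jobs actually read before relying on it.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def partition_path(root, project, ts):
    """Return root/project/year/month/day/data for a Unix timestamp (UTC).

    Assumption: month and day are not zero-padded and "data" is a file.
    """
    day = datetime.fromtimestamp(ts, tz=timezone.utc)
    return (Path(root) / project / str(day.year) / str(day.month)
            / str(day.day) / "data")


def write_actions(actions, root, project):
    """Append each action as one JSON line to the file for its day.

    Returns a dict mapping each file path to the number of lines written.
    """
    written = {}
    for action in actions:
        # "timestamp" is a hypothetical field name holding Unix seconds.
        path = partition_path(root, project, action["timestamp"])
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(action) + "\n")
        written[path] = written.get(path, 0) + 1
    return written


if __name__ == "__main__":
    # Hypothetical sample actions; the field names are illustrative only.
    sample = [
        {"userid": 1, "itemid": 10, "type": 1, "timestamp": 1483228800},
        {"userid": 2, "itemid": 11, "type": 1, "timestamp": 1483315200},
    ]
    for path, count in write_actions(sample, "historic", "proj").items():
        print(path, count)
```

Files written this way are only visible to the offline jobs that read them, so this covers the model-building use case and nothing else.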

thesuperzapper commented 7 years ago

Yeah, but this does not import them into the database. That is, you can't request a specific user's actions from the /users/{userId}/actions API endpoint.

ukclivecox commented 7 years ago

The server does not store the raw actions in the relational database (MySQL) for scalability reasons. By default, actions are stored in Memcached, so only recent activity is available. As an alternative, you can use Redis (http://docs.seldon.io/configuration.html#redis) to get permanent access to user actions.

At the same time, actions are sent via Fluentd to permanent storage for use in model building. So it depends on which use case you have for the actions: model building, real-time access via the API, or runtime scoring.

thesuperzapper commented 7 years ago

I have 2 follow-up questions:

  1. As far as I can tell, seldon-cli import --action actions ... does not import actions into Redis/Memcached, just into static JSON files. If this is correct, is there a way to bulk import actions so that they can be returned by the REST API endpoint, /users/{userId}/actions, implemented here?
  2. Are you saying that after enabling Redis, as described here, you must follow the steps described here for it to work?