actionml / universal-recommender

Highly configurable recommender based on PredictionIO and Mahout's Correlated Cross-Occurrence algorithm
http://actionml.com/universal-recommender
Apache License 2.0
667 stars 173 forks source link

Is URAlgorithm class thread safe ? #44

Open Normal opened 6 years ago

Normal commented 6 years ago

I always thought that algorithm object is shared between threads in processing predict queries. Does this predict method's code valid ?

queryEventNames = query.eventNames.getOrElse(modelEventNames) // eventNames in query take precedence
    val (queryStr, blacklist) = buildQuery(ap, query, rankingFieldNames)
    val searchHitsOpt = EsClient.search(queryStr, esIndex, queryEventNames)

Is queryEventNames published properly? Isn't it a data race ?

pferrel commented 6 years ago

Not sure of your question.

The code above, is only getting values derived from the config file usually called engine.json. These are stored in Elasticsearch when the UR Engine is built and so are immutable unless another build happens. I see to possibility of a race condition if you follow the typical workflow of build, train, deploy. Build and Train and Deploy happen in sequential processes and the config file is read once then the data is pulled from the metadata store in Elasticsearch.

BTW please leave questions in https://groups.google.com/forum/#!forum/actionml-user so others can have the benefit of the answer

On Jan 11, 2018, at 10:12 AM, George Yarish notifications@github.com wrote:

I always thought that algorithm object is shared between threads in processing predict queries. Does this predict method's code valid ?

queryEventNames = query.eventNames.getOrElse(modelEventNames) // eventNames in query take precedence val (queryStr, blacklist) = buildQuery(ap, query, rankingFieldNames) val searchHitsOpt = EsClient.search(queryStr, esIndex, queryEventNames) Is queryEventNames published properly? Isn't it a data race ?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/actionml/universal-recommender/issues/44, or mute the thread https://github.com/notifications/unsubscribe-auth/AAT8SyqNZMpzFF5Y-NbSyEAZW5XDnrjmks5tJk8lgaJpZM4RbPG-.

Normal commented 6 years ago

Let me clarify the question a little bit. I'm talking about the processing of predict queries, so I assume the engine already was built, trained and deployed. That means queryEventNames was already initialized (from engine.json). So there is no any problem yet.

But inside predict method as I see, queryEventNames reassigning happens with some query data (it is a var, not a val)

// this is inside predict method
queryEventNames = query.eventNames.getOrElse(modelEventNames)

So I guess it might be a problem because predict queries processed in parallel. Isn't it ?

BTW please leave questions in https://groups.google.com/forum/#!forum/actionml-user so others can have the benefit of the answer

I will, thank you!

pferrel commented 6 years ago

Oh, I see. Yes, you are correct but this is a test API used when you want to query only a subset of the model for cross-validation tests and I do not believe it has been documented outside of the code. It’s a little bit of a cheat and we use it in a hyperparameter search tool we have. We do this to avoid the fact that engine.json is immutable per train operation. Passing in the Event models to in the query is not a generally supported feature, normally a user would put the events to use in the engine.json and not pass them in with the query. When we do this with the hyperameter search tool they are the same for all queries of a test pass.

On Jan 11, 2018, at 2:00 PM, George Yarish notifications@github.com wrote:

Let me clarify the question a little bit. I'm talking about the processing of predict queries, so I assume the engine already was built, trained and deployed. That means queryEventNames was already initialized (from engine.json). So there is no any problem yet.

But inside predict method as I see, queryEventNames reassigning happens with some query data (it is a var, not a val)

// this is inside predict method queryEventNames = query.eventNames.getOrElse(modelEventNames) So I guess it might be a problem because predict queries processed in parallel. Isn't it ?

BTW please leave questions in https://groups.google.com/forum/#!forum/actionml-user https://groups.google.com/forum/#!forum/actionml-user so others can have the benefit of the answer

I will, thank you!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/actionml/universal-recommender/issues/44#issuecomment-357075208, or mute the thread https://github.com/notifications/unsubscribe-auth/AAT8SwkomNYwFneSmI-NTSs1DckWoO-zks5tJoSKgaJpZM4RbPG-.

Normal commented 6 years ago

Now I've got it, thank you! I was just wondering why did that was implemented that way since queryEventNames can be converted to local variable (does not used out of predict method) , I'm not going to use that feature anyway, just learning.