Closed a-torrano-m closed 3 years ago
Hi a-torrano-m, could you please share training job hyperparameters as well?
Hi yatasho, here they are: fm.set_hyperparameters(feature_dim=numFeatures, predictor_type='binary_classifier', mini_batch_size=1000, num_factors=64, epochs=100)
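For anyone trying to reproduce this, here is a minimal sketch of a sanity check on those hyperparameters before they are passed to the estimator. The helper names `build_fm_hyperparameters` and `check_against_data` are hypothetical, and the sketch assumes that `numFeatures` equals the 15000 columns mentioned in the issue description. It does not call SageMaker at all; it only checks that the values are consistent with the data shape.

```python
def build_fm_hyperparameters(num_features, num_factors=64,
                             mini_batch_size=1000, epochs=100):
    """Return the hyperparameters quoted in this thread as a plain dict.

    Hypothetical helper: the defaults mirror the values posted above.
    """
    if num_features <= 0:
        raise ValueError("num_features must be a positive integer")
    return {
        "feature_dim": num_features,
        "predictor_type": "binary_classifier",
        "mini_batch_size": mini_batch_size,
        "num_factors": num_factors,
        "epochs": epochs,
    }


def check_against_data(hyperparameters, n_rows, n_cols):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if hyperparameters["feature_dim"] != n_cols:
        problems.append(
            "feature_dim (%d) does not match the number of columns (%d)"
            % (hyperparameters["feature_dim"], n_cols)
        )
    if hyperparameters["mini_batch_size"] > n_rows:
        problems.append(
            "mini_batch_size (%d) is larger than the number of rows (%d)"
            % (hyperparameters["mini_batch_size"], n_rows)
        )
    return problems


# Assumption: numFeatures == 15000 and the matrix has 45000 rows,
# as stated in the issue description.
params = build_fm_hyperparameters(num_features=15000)
print(check_against_data(params, n_rows=45000, n_cols=15000))
```

If the check comes back empty, the same dict could be passed on as `fm.set_hyperparameters(**params)`, which should be equivalent to the call quoted above.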
Could these hyperparameters be tested? Is the error reproducible?
thanks
Could some reason be found for the issue?
thanks
Hi a-torrano-m, the error is reproducible. We will work on a fix. Thanks for reporting the issue.
Thanks yatasho! Have you created a Jira ticket or an issue ID that we could use to follow its progress? Otherwise, we will wait for news in this thread. Thanks very much!
Reference: SMAlgo-314
System Information
Describe the problem
We are trying to produce recommendations using SageMaker with Factorization Machines. We feed the model a sparse matrix of 45000 rows and 15000 columns. Training completes successfully. The batch transform stage crashes during wait(), and the exception only points to the logs. The message there is: “Unable to get response from algorithm.”
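To help narrow this down, here is a rough sketch of how a couple of sparse rows could be turned into a small JSON request, so the model can be tried on a tiny sample instead of the full matrix. The helper names `sparse_row_to_instance` and `build_payload` are hypothetical, and the sample rows are made up. The `keys` / `shape` / `values` layout is the sparse `application/json` request format as I understand it from the SageMaker common data formats documentation; treat that as an assumption to verify. Whether a smaller request avoids the error is not known.

```python
import json


def sparse_row_to_instance(keys, values, feature_dim):
    """Convert one sparse row (column indices plus values) to a JSON instance."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    if any(k < 0 or k >= feature_dim for k in keys):
        raise ValueError("every key must be in the range [0, feature_dim)")
    return {
        "data": {
            "features": {
                "keys": list(keys),
                "shape": [feature_dim],
                "values": list(values),
            }
        }
    }


def build_payload(rows, feature_dim):
    """Serialize a list of (keys, values) pairs as one JSON request body."""
    instances = [sparse_row_to_instance(k, v, feature_dim) for k, v in rows]
    return json.dumps({"instances": instances})


# Made-up sample rows; feature_dim=15000 matches the column count above.
sample_rows = [
    ([26, 182, 232], [1.0, 1.0, 1.0]),
    ([5, 14999], [1.0, 1.0]),
]
payload = build_payload(sample_rows, feature_dim=15000)
print(len(json.loads(payload)["instances"]))
```

The index check is there because a key outside the `feature_dim` used in training is one of the easier mistakes to rule out on the client side.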
Minimal repro / logs
EXCEPTION OUTPUT:
ValueError Traceback (most recent call last)