Logistic regression in Sagemaker

harrykrish commented 6 years ago

I am trying to implement logistic regression in AWS SageMaker. Based on my research, I found that the Linear Learner algorithm built into SageMaker can perform logistic regression when predictor_type is set to 'binary_classifier'. The SageMaker example implementation uses the MNIST dataset. In my application, I have a CSV file, and I am interested in getting the probability of an event occurring (e.g., 75% yes and 25% no).
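For context, logistic regression turns a weighted sum of the features into a probability with the logistic (sigmoid) function, which is what produces outputs like "75% yes, 25% no". A minimal plain-Python sketch; the weights, bias, and feature values are made-up placeholders, not values from any trained SageMaker model:

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: list[float], bias: float, features: list[float]) -> tuple[float, float]:
    """Return (P(yes), P(no)) for one record under a logistic regression model."""
    if len(weights) != len(features):
        raise ValueError("feature count must match the number of weights")
    z = bias + sum(w * x for w, x in zip(weights, features))
    p_yes = sigmoid(z)
    return p_yes, 1.0 - p_yes


# Hypothetical two-feature model: z = 0.2 + 0.8*1.5 - 0.4*0.5 = 1.2
p_yes, p_no = predict_proba([0.8, -0.4], 0.2, [1.5, 0.5])
print(f"{p_yes:.0%} yes, {p_no:.0%} no")  # prints "77% yes, 23% no"
```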

I am trying to use the example below as a reference: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/breast_cancer_prediction/Breast%20Cancer%20Prediction.ipynb

I have the following questions:

1. Can I use an implementation similar to the one in that example, changing only the predictor_type?
2. In my final prediction, I am interested in predicting %yes based on two features. E.g., I have 10 features in total (1, 2, 3, ..., 10), and I want to predict %yes given feature1 = 'some x value' and feature2 = 'some y value'. Can this be done with the SageMaker implementation?
3. I am also interested in exploring neural networks in SageMaker. Are there any built-in libraries that can be used to build neural networks for this kind of classification?

ChoiByungWook commented 6 years ago

Hello,

I am not a machine learning expert, so anyone reading this, please feel free to correct me or join the conversation.

Based on https://en.wikipedia.org/wiki/Logistic_regression, I believe that binary_classifier is the correct predictor type to use. I am not sure whether the SageMaker Linear Learner binary classifier returns the probability that a prediction belongs to a class or just a label; the response format section of the documentation below should clarify this.

https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html
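For reference, the Linear Learner documentation describes a JSON inference response in which each binary-classifier prediction carries a `score` alongside `predicted_label`. A sketch of reading that score, assuming the response body has the shape of the hard-coded sample below (the numbers are invented) and that the score is the sigmoid output, i.e. an estimate of P(label = 1):

```python
import json

# Invented sample body in the documented Linear Learner JSON response shape.
sample_body = (
    '{"predictions": ['
    '{"score": 0.75, "predicted_label": 1}, '
    '{"score": 0.1, "predicted_label": 0}]}'
)


def yes_probabilities(body: str) -> list[float]:
    """Extract each record's score, read here as P(label = 1), from a JSON response body."""
    payload = json.loads(body)
    return [float(p["score"]) for p in payload["predictions"]]


for p in yes_probabilities(sample_body):
    print(f"{p:.0%} yes, {1 - p:.0%} no")
```

With this sample, the loop prints "75% yes, 25% no" and then "10% yes, 90% no".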

1. Can I use an implementation similar to the one in that example, changing only the predictor_type?

I am not entirely sure, but I believe it should work to a certain extent, since the hyperparameters specified in the example appear to be common to both predictor types.
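If the notebook's training job is reused, the change is mostly confined to the hyperparameters. A sketch of a hyperparameter map with `predictor_type` switched to `binary_classifier`, written as a plain dictionary of strings, as boto3's low-level `create_training_job` expects for `HyperParameters`. `feature_dim` assumes the 10 features from the question, and the other values are placeholders, not tuned settings:

```python
# Hypothetical Linear Learner hyperparameters (values are placeholders).
# boto3's create_training_job takes HyperParameters as a str -> str map.
hyperparameters = {
    "feature_dim": "10",                    # number of feature columns, label excluded
    "predictor_type": "binary_classifier",  # logistic-regression-style output
    "mini_batch_size": "100",
    "epochs": "10",
}

VALID_PREDICTOR_TYPES = {"binary_classifier", "multiclass_classifier", "regressor"}


def check_hyperparameters(hp: dict) -> None:
    """Basic sanity checks before submitting the training job."""
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in hp.items()):
        raise TypeError("all hyperparameter names and values must be strings")
    if hp.get("predictor_type") not in VALID_PREDICTOR_TYPES:
        raise ValueError(f"unexpected predictor_type: {hp.get('predictor_type')!r}")


check_hyperparameters(hyperparameters)
```

One thing worth checking when copying settings from another example: Linear Learner's `loss` hyperparameter has to suit the predictor type. For `binary_classifier`, the documented choices are `auto`, `logistic`, and `hinge_loss`, so a regression loss carried over from elsewhere would need to change.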

Data formats probably don't need to change for the predictor type, so the way the data is structured within the notebook should be reusable.
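One detail that does matter for the binary classifier: the labels must be 0 or 1, and for CSV training data SageMaker's built-in algorithms expect the label in the first column with no header row. A sketch that converts hypothetical yes/no rows into that layout; the column names `outcome`, `f1`, and `f2` and the sample values are assumptions:

```python
import csv
import io


def to_linear_learner_csv(rows: list[dict], label_col: str, feature_cols: list[str]) -> str:
    """Write label-first, header-free CSV, mapping yes -> 1 and no -> 0."""
    label_map = {"yes": 1, "no": 0}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        label = label_map[row[label_col].strip().lower()]
        writer.writerow([label] + [float(row[c]) for c in feature_cols])
    return out.getvalue()


# Hypothetical rows, as csv.DictReader would yield them from the original CSV.
sample = [
    {"outcome": "yes", "f1": "3.2", "f2": "0.5"},
    {"outcome": "No", "f1": "1.0", "f2": "2.5"},
]
print(to_linear_learner_csv(sample, "outcome", ["f1", "f2"]), end="")
```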

2. In my final prediction, I am interested in predicting %yes based on two features. E.g., I have 10 features in total (1, 2, 3, ..., 10), and I want to predict %yes given feature1 = 'some x value' and feature2 = 'some y value'. Can this be done with the SageMaker implementation?

I am not sure about this part at all, but from my understanding, the feature dimension of your inference data must match the feature dimension of your training data. So you would need either to supply values for all of the features at inference time or to drop the other features from the training data, although the latter will most likely change your results.
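The second option, training on only the two features of interest, means selecting the same columns for both training and inference so that `feature_dim` lines up. A minimal sketch; the column indices and values are hypothetical, and the model itself is left out:

```python
# Hypothetical: keep only feature1 and feature2 (indices 0 and 1 of the 10 features).
SELECTED = [0, 1]


def select_features(row: list[float], selected: list[int] = SELECTED) -> list[float]:
    """Project a full feature vector onto the chosen columns."""
    return [row[i] for i in selected]


def check_dim(vector: list[float], feature_dim: int) -> list[float]:
    """Fail early if an inference vector does not match the training feature_dim."""
    if len(vector) != feature_dim:
        raise ValueError(f"expected {feature_dim} features, got {len(vector)}")
    return vector


full_row = [float(i) for i in range(1, 11)]  # feature1..feature10 = 1.0..10.0
train_vec = select_features(full_row)        # what the training data would contain
query = check_dim([4.0, 7.5], feature_dim=len(SELECTED))  # feature1 = x, feature2 = y
print(train_vec, query)  # [1.0, 2.0] [4.0, 7.5]
```

Training this way answers "P(yes | feature1, feature2)" directly, but the model no longer sees the other eight features.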

3. I am also interested in exploring neural networks in SageMaker. Are there any built-in libraries that can be used to build neural networks for this kind of classification?

As of now, we support TensorFlow, MXNet, PyTorch, and Chainer. These are deep learning frameworks, so they should all support building neural networks. If none of them suffice, you can also bring your own container with the framework or library of your choice. To build your own container, please follow the example below.

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb
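To connect questions 1 and 3: a neural network with no hidden layer and a single sigmoid output unit, trained with log loss, is exactly logistic regression; the frameworks above let you add hidden layers on top of that. A pure-Python sketch of the single-unit case trained with batch gradient descent on a tiny invented dataset (the learning rate and epoch count are arbitrary choices, not recommendations):

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For log loss with a sigmoid output, dLoss/dz = prediction - label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Invented toy data with one feature; labels overlap so the fit stays finite.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
w, b = train_logistic(X, y)
for x in (1.0, 4.0):
    print(x, round(sigmoid(b + w[0] * x), 3))
```

Larger feature vectors work the same way; a framework would replace the hand-written gradient loop with automatic differentiation and an optimizer.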

Feel free to open an issue with the same questions at https://github.com/awslabs/amazon-sagemaker-examples, as the maintainers there might be able to provide better answers.

I hope this helps!

laurenyu commented 5 years ago

Closing due to inactivity. Feel free to reopen if necessary.

aws/sagemaker-python-sdk, issue #343: Logistic regression in Sagemaker