csong27 / membership-inference

Code for Membership Inference Attack against Machine Learning Models (in Oakland 2017)

how to convert input_var to a string matrix, if train_feat_file contains string instead of floats? #5

Open SMJT01 opened 5 years ago

SMJT01 commented 5 years ago

Hi csong, I am new to tensors. I wanted to try your code, but my dataset contains string data instead of floating-point values. How should I modify the code in my case? Could you please help? Thanks

Imathatguy commented 5 years ago

Hi SMJT01,

Your string features can be treated as categorical data. There are quite a few ways of encoding these string values as numbers that the ML model can then interpret.

The following article gives quick descriptions of some of these methods: https://towardsdatascience.com/smarter-ways-to-encode-categorical-data-for-machine-learning-part-1-of-3-6dca2f71b159

Two of these encoders are also implemented in the official scikit-learn library:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html

Other implementations can be found here too:
http://contrib.scikit-learn.org/categorical-encoding/
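
As a rough illustration of the two scikit-learn encoders, here is a minimal sketch. The toy `raw_features` array and its values are made up for the example, and the cast to `float32` is an assumption about what the model's input expects, not something taken from this repository's code.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: each row is a record, each column a string feature.
raw_features = np.array([
    ["red", "small"],
    ["blue", "large"],
    ["red", "large"],
])

# OrdinalEncoder maps each category to an integer code, one column per feature.
# The codes imply an ordering, which may or may not make sense for your data.
ordinal = OrdinalEncoder()
ordinal_features = ordinal.fit_transform(raw_features).astype(np.float32)

# OneHotEncoder creates one binary column per category and implies no ordering.
# It returns a sparse matrix by default, so convert it to a dense array.
one_hot = OneHotEncoder(handle_unknown="ignore")
one_hot_features = one_hot.fit_transform(raw_features).toarray().astype(np.float32)

print(ordinal_features.shape)   # one column per original feature
print(one_hot_features.shape)   # one column per distinct category
```

Either resulting array is purely numeric, so it could be saved and loaded in place of the original string features. Fit the encoder once and reuse the same fitted object for every split of the data so the category-to-number mapping stays consistent.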

SMJT01 commented 5 years ago

Hi Imathatguy, thank you very much for your reply. For now I have solved this issue by encoding the strings into a numerical dataset. But of course, if I could use categorical attributes directly, that would save me a lot of time. As far as I can tell, Theano tensors do not have anything for a string datatype. I'll try the scikit-learn packages and I'll let you know whether they work well on the data or not. Thanks