Problem: I have a binary dataset (answers to Yes/No questions) that could benefit from dimensionality reduction, with the result used for feature selection and regression. The data is in ves_data.csv.zip.
```python
from pandas import read_csv
from sklearn.utils import shuffle

table = read_csv('ves_data.csv')

# MM01* columns that are left out of the feature set
excluded = [
    'MM01001', 'MM01BR', 'MM01003A', 'MM01003B', 'MM01003C',
    'MM010567', 'MM010568', 'MM010569', 'MM010570', 'MM010571',
    'MM010572', 'MM010573', 'MM010574', 'MM010575', 'MM010576',
    'MM010577', 'MM010578', 'MM010579', 'MM010580', 'MM010581',
]
target = 'AFQT'  # 'GIT' is good too

# keep the remaining MM01* items plus the target, and drop incomplete rows
features = [c for c in table.columns if 'MM01' in c and c not in excluded]
total = table[features + [target]].dropna()

X, y = shuffle(total.drop(columns=[target]), total[target], random_state=13)
X = X - 1  # shift the item coding from 1/2 to 0/1
X = X.to_numpy()  # needed for some code to function
```
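For anyone who wants something to compare against, here is a minimal sketch of two ways a 0/1 matrix like `X` could feed a regression: projecting onto a few components first, or keeping only the items most associated with the target. It runs on synthetic data so it is self-contained. The names `X_demo` and `y_demo`, the latent-trait generator, and the choice of `TruncatedSVD`, `SelectKBest` and `Ridge` are my assumptions for illustration, not something taken from the dataset or proposed as the right method for it.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the real data (an assumption): 500 respondents,
# 60 Yes/No items driven by 3 latent traits, and a continuous score.
rng = np.random.default_rng(13)
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 60))
X_demo = (latent @ loadings + rng.normal(size=(500, 60)) > 0).astype(float)
y_demo = latent[:, 0] + 0.1 * rng.normal(size=500)

# Option 1: project the 0/1 items onto 10 components, then regress.
reduce_then_fit = make_pipeline(
    TruncatedSVD(n_components=10, random_state=13), Ridge()
)
# Option 2: keep the 10 items most associated with the target, then regress.
select_then_fit = make_pipeline(SelectKBest(f_regression, k=10), Ridge())

# Both steps live inside the pipeline, so they are refit on each training fold.
svd_r2 = cross_val_score(reduce_then_fit, X_demo, y_demo, cv=5, scoring="r2").mean()
kbest_r2 = cross_val_score(select_then_fit, X_demo, y_demo, cv=5, scoring="r2").mean()
print(f"TruncatedSVD + Ridge: mean R^2 = {svd_r2:.2f}")
print(f"SelectKBest + Ridge:  mean R^2 = {kbest_r2:.2f}")
```

To try this on the real data, `X_demo` and `y_demo` would be swapped for the `X` and `y` built in the snippet above. The component count and `k` are arbitrary here and would need tuning.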
I'm closing this because I'm not sure what more there is to say. There might indeed be better methods for handling boolean data. Feel free to contribute one if you can show it's relevant.
Currently there are options for doing this:
Some information to get the data started