Q: What is a good option for factor analysis (FA) with Boolean data?

Problem: I have a binary dataset (answers to Yes/No questions) that could benefit from dimensionality reduction, with the reduced representation then used for feature selection and regression. The data is in ves_data.csv.zip.

These are the options I have found so far:

- Boolean Matrix Factorization (to be tested): https://github.com/LifanLiang/EM_BMF
- Binary Polychoric Correlation Matrix (to be tested): https://github.com/inuyasha2012/pypsy
- Correlation Explanation (to be tested): https://github.com/gregversteeg/CorEx
- Positive Pointwise Mutual Information (note: only good for implicit data): https://github.com/Bollegala/svdmi
- Logistic PCA (note: the native Python implementation is memory-consuming): https://github.com/brudfors/logistic-PCA-Tipping

Some code to load and prepare the data:

```python
from pandas import read_csv
from sklearn.utils import shuffle

table = read_csv('ves_data.csv')

# MM01* columns to leave out of the item set
excluded = [
    'MM01001', 'MM01BR', 'MM01003A', 'MM01003B', 'MM01003C',
    'MM010567', 'MM010568', 'MM010569', 'MM010570', 'MM010571',
    'MM010572', 'MM010573', 'MM010574', 'MM010575', 'MM010576',
    'MM010577', 'MM010578', 'MM010579', 'MM010580', 'MM010581',
]
# Additional non-MM01 columns kept in the first selection
extra_columns = ["PA", "GIT", "AFQT", "WAIS_BD", "WAIS_GI",
                 'VERAW', 'ARRAW', 'VESS', 'ARSS']


def is_item(col):
    """True for MM01* columns that are not in the exclusion list."""
    return 'MM01' in col and col not in excluded


total = table[[c for c in table if is_item(c) or c in extra_columns]]
# Keep the items plus AFQT only, then drop rows with missing values
total = total[[c for c in table if is_item(c) or c in ['AFQT']]].dropna()  # 'GIT' is good too

X, y = shuffle(total.drop(['AFQT'], axis=1), total['AFQT'], random_state=13)
X = X - 1  # calibrating the range from 1~2 to 0~1
X = X.to_numpy()  # needed for some code to function
```
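To make the Positive Pointwise Mutual Information option more concrete, here is a minimal NumPy sketch of the general PPMI + SVD idea on a 0/1 matrix. It is not the API of the svdmi repository. The names `ppmi_embed` and `X_demo` are hypothetical, the synthetic matrix only stands in for `X`, and scaling the singular vectors by the square root of the singular values is one common choice among several.

```python
import numpy as np


def ppmi_embed(X, n_components=2):
    """Project binary rows onto the top singular vectors of the item-item PPMI matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    p_joint = (X.T @ X) / n              # P(item i = 1 and item j = 1)
    p_item = X.mean(axis=0)              # P(item i = 1)
    expected = np.outer(p_item, p_item)  # joint probability under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / expected)
        # Keep positive associations only; -inf and nan (zero counts) become 0
        ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    U, s, _ = np.linalg.svd(ppmi)
    loadings = U[:, :n_components] * np.sqrt(s[:n_components])
    return X @ loadings, loadings


# Synthetic stand-in for the prepared 0/1 array X (assumption, not the real data)
rng = np.random.default_rng(13)
X_demo = (rng.random((200, 12)) < 0.4).astype(int)
scores, loadings = ppmi_embed(X_demo, n_components=3)
print(scores.shape, loadings.shape)
```

The `scores` array has one row per respondent and could be passed to a regression model in place of the raw items. Whether this helps on the real data is untested here.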
