[ ] Try out other samplers besides TPESampler, e.g. RandomSampler, and see how they affect the results. See the sketch below.
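A minimal sketch of what the sampler comparison could look like; the toy objective is a placeholder for the project's real objective function:

```python
import optuna

def objective(trial):
    # placeholder objective; swap in the real training/validation loop
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

for sampler in (optuna.samplers.TPESampler(seed=42),
                optuna.samplers.RandomSampler(seed=42)):
    study = optuna.create_study(direction="minimize", sampler=sampler)
    study.optimize(objective, n_trials=50)
    print(type(sampler).__name__, study.best_value)
```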
[ ] Look into implementing transformers using xformers
[ ] Let torch.Tensor and np.ndarray share memory. Could be useful for pre-training. See here.
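A sketch of the zero-copy round trip: `torch.from_numpy` creates a tensor that shares the array's memory, so writes through either view are visible in the other.

```python
import numpy as np
import torch

arr = np.zeros(4, dtype=np.float32)
t = torch.from_numpy(arr)   # shares memory with arr, no copy
t[0] = 1.0
assert arr[0] == 1.0        # change is visible on the numpy side

back = t.numpy()            # CPU tensor -> ndarray, also without a copy
```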
[ ] Add decay hyperparameter for gradient boosting / transformer
[ ] Run feature selection in CatBoost. See here. Running it on a sample could be fine.
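A hedged sketch of CatBoost's built-in `select_features`; `X` and `y` stand in for the training data, and the parameter values are illustrative and may need adjusting for the installed CatBoost version:

```python
from catboost import CatBoostClassifier, EFeaturesSelectionAlgorithm, Pool

train_pool = Pool(X, y)  # X, y: placeholders for the project's training data
model = CatBoostClassifier(iterations=200, verbose=False)
summary = model.select_features(
    train_pool,
    features_for_select=list(range(X.shape[1])),   # consider all features
    num_features_to_select=20,                     # illustrative target count
    steps=3,                                       # elimination rounds
    algorithm=EFeaturesSelectionAlgorithm.RecursiveByShapValues,
    train_final_model=False,
)
print(summary)  # dict with selected/eliminated features
```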
[ ] Try out target encoding, e.g. as sketched below
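A sketch of simple smoothed target encoding; `card1` and `isFraud` come from the IEEE fraud detection data referenced below, the helper name and smoothing value are made up. In practice this should be computed out-of-fold to avoid target leakage.

```python
import pandas as pd

def target_encode(df, col, target, smoothing=10):
    global_mean = df[target].mean()
    agg = df.groupby(col)[target].agg(["mean", "count"])
    # shrink category means toward the global mean for rare categories
    enc = (agg["mean"] * agg["count"] + global_mean * smoothing) / (agg["count"] + smoothing)
    return df[col].map(enc)

df["card1_te"] = target_encode(df, "card1", "isFraud")
```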
[ ] Add frequency encoding, e.g.:

```python
# Found at: https://www.kaggle.com/competitions/ieee-fraud-detection/discussion/108575
# Map each card1 value to how often it occurs in the data.
temp = df['card1'].value_counts().to_dict()
df['card1_counts'] = df['card1'].map(temp)
```
[ ] Study feature interactions:

```python
import pandas as pd

# `interactions`: DataFrame of (feature index, feature index, strength) rows;
# map the indices back to column names in X
feature_interaction = [[X.columns[int(r.iloc[0])], X.columns[int(r.iloc[1])], r.iloc[2]]
                       for _, r in interactions.iterrows()]
feature_interaction_df = pd.DataFrame(feature_interaction,
                                      columns=['feature1', 'feature2', 'interaction_strength'])
feature_interaction_df.head(10)
```
[ ] Compare different feature scaling approaches for the neural net, e.g. min-max normalization, z-score standardization, robust scaling, quantile transformation. See here and here, and the sketch below.
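A sketch of fitting the candidate sklearn scalers on the same data; `X` is a placeholder for the numeric feature matrix:

```python
from sklearn.preprocessing import (
    MinMaxScaler, StandardScaler, RobustScaler, QuantileTransformer,
)

scalers = {
    "minmax": MinMaxScaler(),        # normalization to [0, 1]
    "zscore": StandardScaler(),      # zero mean, unit variance
    "robust": RobustScaler(),        # median/IQR, resistant to outliers
    "quantile": QuantileTransformer(output_distribution="normal"),
}
scaled = {name: s.fit_transform(X) for name, s in scalers.items()}
```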
[ ] Study effects of quantization. How can one assist quantization with feature engineering? For technical background see here and here.
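One way to start experimenting: PyTorch's post-training dynamic quantization. The tiny MLP below is illustrative, not the project's model.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize Linear weights to int8
)
```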