[ ] Try out other samplers besides TPESampler, e.g. RandomSampler, and see how they affect the results. See the sketch below.
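A minimal sketch of what the sampler comparison could look like; the toy objective is a placeholder for the project's real objective function:

```python
import optuna

def objective(trial):
    # placeholder objective; swap in the real training/validation loop
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

for sampler in (optuna.samplers.TPESampler(seed=42),
                optuna.samplers.RandomSampler(seed=42)):
    study = optuna.create_study(direction="minimize", sampler=sampler)
    study.optimize(objective, n_trials=50)
    print(type(sampler).__name__, study.best_value)
```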
[ ] Look into implementing transformers using xformers
[ ] Let torch.Tensor and np.ndarray share memory. Could be useful for pre-training. See here.
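A sketch of the zero-copy round trip: `torch.from_numpy` creates a tensor that shares the array's memory, so writes through either view are visible in the other.

```python
import numpy as np
import torch

arr = np.zeros(4, dtype=np.float32)
t = torch.from_numpy(arr)   # shares memory with arr, no copy
t[0] = 1.0
assert arr[0] == 1.0        # change is visible on the numpy side

back = t.numpy()            # CPU tensor -> ndarray, also without a copy
```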
[ ] Add decay hyperparameter for gradient boosting / transformer
[ ] Run feature selection in CatBoost. See here. Running it on a sample could be fine.
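A hedged sketch of CatBoost's built-in `select_features`; `X` and `y` stand in for the training data, and the parameter values are illustrative and may need adjusting for the installed CatBoost version:

```python
from catboost import CatBoostClassifier, EFeaturesSelectionAlgorithm, Pool

train_pool = Pool(X, y)  # X, y: placeholders for the project's training data
model = CatBoostClassifier(iterations=200, verbose=False)
summary = model.select_features(
    train_pool,
    features_for_select=list(range(X.shape[1])),   # consider all features
    num_features_to_select=20,                     # illustrative target count
    steps=3,                                       # elimination rounds
    algorithm=EFeaturesSelectionAlgorithm.RecursiveByShapValues,
    train_final_model=False,
)
print(summary)  # dict with selected/eliminated features
```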
[ ] Try out target encoding, e.g. as sketched below
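A sketch of simple smoothed target encoding; `card1` and `isFraud` come from the IEEE fraud detection data referenced below, the helper name and smoothing value are made up. In practice this should be computed out-of-fold to avoid target leakage.

```python
import pandas as pd

def target_encode(df, col, target, smoothing=10):
    global_mean = df[target].mean()
    agg = df.groupby(col)[target].agg(["mean", "count"])
    # shrink category means toward the global mean for rare categories
    enc = (agg["mean"] * agg["count"] + global_mean * smoothing) / (agg["count"] + smoothing)
    return df[col].map(enc)

df["card1_te"] = target_encode(df, "card1", "isFraud")
```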
[ ] Add frequency encoding, e.g.:

```python
# Found at: https://www.kaggle.com/competitions/ieee-fraud-detection/discussion/108575
# Map each card1 value to how often it occurs in the data.
temp = df['card1'].value_counts().to_dict()
df['card1_counts'] = df['card1'].map(temp)
```
[ ] Study feature interactions:

```python
import pandas as pd

# `interactions`: DataFrame of (feature index, feature index, strength) rows;
# map the indices back to column names in X
feature_interaction = [[X.columns[int(r.iloc[0])], X.columns[int(r.iloc[1])], r.iloc[2]]
                       for _, r in interactions.iterrows()]
feature_interaction_df = pd.DataFrame(feature_interaction,
                                      columns=['feature1', 'feature2', 'interaction_strength'])
feature_interaction_df.head(10)
```
[ ] Compare different feature scaling approaches for the neural net, e.g. min-max normalization, z-score standardization, robust scaling, quantile transformation. See here and here, and the sketch below.
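A sketch of fitting the candidate sklearn scalers on the same data; `X` is a placeholder for the numeric feature matrix:

```python
from sklearn.preprocessing import (
    MinMaxScaler, StandardScaler, RobustScaler, QuantileTransformer,
)

scalers = {
    "minmax": MinMaxScaler(),        # normalization to [0, 1]
    "zscore": StandardScaler(),      # zero mean, unit variance
    "robust": RobustScaler(),        # median/IQR, resistant to outliers
    "quantile": QuantileTransformer(output_distribution="normal"),
}
scaled = {name: s.fit_transform(X) for name, s in scalers.items()}
```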
[ ] Study effects of quantization. How can one assist quantization with feature engineering? For technical background see here and here.
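One way to start experimenting: PyTorch's post-training dynamic quantization. The tiny MLP below is illustrative, not the project's model.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize Linear weights to int8
)
```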