alan-turing-institute / SemAIDA

Semantic Technologies for the AIDA project
Apache License 2.0
38 stars, 7 forks

LinkedReproducibility; CSVW #2

Open: westurner opened this issue 5 years ago

westurner commented 5 years ago

A few possibly useful resources to share. Interesting paper!

westurner commented 5 years ago

https://arxiv.org/pdf/1811.01304.pdf :

Synthetic Columns. In training, ColNet automatically extracts labeled samples from the KB. A training sample s := (e, c) is composed of a synthetic column e and a class c ∈ C, while a synthetic column is constructed by concatenating a specific number of entities.
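A minimal sketch of that sampling step, to make the (e, c) construction concrete. It assumes a hypothetical in-memory `kb` dict mapping each class to its entities (a stand-in for real KB queries) and a hypothetical `synthetic_columns` helper; ColNet's actual sampling strategy, column size and entity encoding may differ.

```python
import random

def synthetic_columns(kb, n_entities=4, samples_per_class=3, seed=0):
    """Build labeled samples s = (e, c): e is a synthetic column made of
    n_entities entities of class c, and c is the class label."""
    rng = random.Random(seed)  # seeded so the sampled columns are reproducible
    samples = []
    for cls, entities in kb.items():
        if len(entities) < n_entities:
            continue  # too few entities to fill a column of the requested size
        for _ in range(samples_per_class):
            column = rng.sample(entities, n_entities)  # distinct entities, random order
            samples.append((column, cls))
    return samples

# Hypothetical toy "KB": class -> entities (stands in for real KB lookups)
kb = {
    "dbo:City": ["London", "Paris", "Berlin", "Madrid", "Rome"],
    "dbo:Scientist": ["Alan Turing", "Ada Lovelace", "Marie Curie", "Niels Bohr"],
}

for column, cls in synthetic_columns(kb, n_entities=3, samples_per_class=2):
    print(cls, column)
```

Encoding each synthetic column into a model input and training the classifier are not shown here.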

https://westurner.github.io/hnlog/#comment-18957269 :

Featuretools https://github.com/Featuretools/featuretools

Featuretools is a python library for automated feature engineering. [using DFS: Deep Feature Synthesis]
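To illustrate the idea behind DFS (not Featuretools' API), here is a toy, depth-1 sketch in plain Python: every aggregation "primitive" is applied to child-table values grouped by the parent key, producing one feature per primitive. The tables, the `dfs_depth1` helper and the primitive set are hypothetical; the `SUM(transactions.amount)`-style names only mimic how Featuretools labels generated features. Real DFS (the library's `ft.dfs` entry point) also stacks transform and aggregation primitives across multiple relationships to greater depths.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a parent table (customers) and a child table (transactions)
customers = ["c1", "c2", "c3"]
transactions = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 30.0},
    {"customer_id": "c2", "amount": 5.0},
]

# Aggregation primitives: each collapses a list of child values into one number
AGG_PRIMITIVES = {"COUNT": len, "SUM": sum, "MEAN": mean, "MAX": max}

def dfs_depth1(parents, children, child_name, fk, value):
    """Apply every aggregation primitive to the child values grouped by foreign key."""
    groups = defaultdict(list)
    for row in children:
        groups[row[fk]].append(row[value])
    features = {}
    for p in parents:
        vals = groups.get(p, [])
        features[p] = {
            # parents with no children get COUNT = 0 and None for the other primitives
            f"{name}({child_name}.{value})": prim(vals) if vals else (0 if name == "COUNT" else None)
            for name, prim in AGG_PRIMITIVES.items()
        }
    return features

feats = dfs_depth1(customers, transactions, "transactions", "customer_id", "amount")
print(feats["c1"])
```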

auto-sklearn does feature selection and dimensionality reduction (e.g. PCA) in a "preprocessing" step, as well as "One-Hot encoding of categorical features, imputation of missing values and the normalization of features or samples" https://automl.github.io/auto-sklearn/master/manual.html#turning-off-preprocessing
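The three quoted steps can be sketched in plain Python. This is a hypothetical `preprocess` helper (mean imputation, one-hot encoding, z-score normalization) written only to show what each step does to a row; it is not auto-sklearn's implementation, which chooses and tunes its preprocessing components automatically.

```python
from statistics import mean, pstdev

def preprocess(rows, numeric, categorical):
    """Toy preprocessing: mean-impute missing numeric values, z-score
    normalize numeric columns, one-hot encode categorical columns."""
    stats = {}
    for col in numeric:
        observed = [r[col] for r in rows if r[col] is not None]
        sd = pstdev(observed) or 1.0  # avoid division by zero for constant columns
        stats[col] = (mean(observed), sd)
    categories = {col: sorted({r[col] for r in rows}) for col in categorical}
    out = []
    for r in rows:
        vec = []
        for col in numeric:
            mu, sd = stats[col]
            x = mu if r[col] is None else r[col]  # imputation with the observed mean
            vec.append((x - mu) / sd)             # normalization to zero mean, unit variance
        for col in categorical:
            vec.extend(1.0 if r[col] == cat else 0.0 for cat in categories[col])  # one-hot
        out.append(vec)
    return out

rows = [
    {"age": 20.0, "income": None, "city": "London"},
    {"age": 40.0, "income": 3.0, "city": "Paris"},
    {"age": None, "income": 5.0, "city": "London"},
]
X = preprocess(rows, numeric=["age", "income"], categorical=["city"])
print(X)
```

Note that an imputed value always maps to 0.0 after normalization, because it equals the column mean.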

auto_ml uses "Deep Learning [with Keras and TensorFlow] to learn features for us, and Gradient Boosting [with XGBoost] to turn those features into accurate predictions" https://auto-ml.readthedocs.io/en/latest/deep_learning.html#feature-learning
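A small, dependency-free sketch of that two-stage design, assuming a toy regression task. The quoted docs name Keras/TensorFlow and XGBoost; this sketch swaps in a hand-written one-hidden-layer tanh network (stage 1, whose hidden activations become the learned features) and squared-loss gradient boosting with decision stumps (stage 2). All function names and hyperparameters are hypothetical, and this is not auto_ml's API.

```python
import math
import random

def hidden_features(X, W, b):
    """Map each row to the hidden-layer tanh activations (the learned features)."""
    return [[math.tanh(sum(w * xi for w, xi in zip(Wj, x)) + bj) for Wj, bj in zip(W, b)] for x in X]

def train_mlp(X, y, hidden=4, epochs=200, lr=0.05, seed=0):
    """Stage 1: fit a one-hidden-layer network with SGD on squared error; keep W, b."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in X[0]] for _ in range(hidden)]
    b = [0.0] * hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    c = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = hidden_features([x], W, b)[0]
            err = sum(vj * hj for vj, hj in zip(v, h)) + c - t
            for j in range(hidden):
                g = err * v[j] * (1 - h[j] ** 2)  # backprop through tanh, using the old v[j]
                v[j] -= lr * err * h[j]
                b[j] -= lr * g
                W[j] = [w - lr * g * xi for w, xi in zip(W[j], x)]
            c -= lr * err
    return W, b

def fit_stump(F, r):
    """Least-squares decision stump on residuals r: (feature, threshold, left, right)."""
    m = sum(r) / len(r)
    best = (sum((ri - m) ** 2 for ri in r), 0, math.inf, m, m)  # fallback: constant fit
    for j in range(len(F[0])):
        for thr in sorted({row[j] for row in F}):
            left = [ri for row, ri in zip(F, r) if row[j] <= thr]
            right = [ri for row, ri in zip(F, r) if row[j] > thr]
            if not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
            if sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def boost(F, y, rounds=20, lr=0.3):
    """Stage 2: each stump fits the current residuals and is added with shrinkage lr."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        r = [t - p for t, p in zip(y, pred)]
        j, thr, lm, rm = fit_stump(F, r)
        stumps.append((j, thr, lm, rm))
        pred = [p + lr * (lm if row[j] <= thr else rm) for p, row in zip(pred, F)]
    return base, stumps, pred

rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(60)]
y = [x1 * x2 for x1, x2 in X]   # hypothetical toy target with an interaction term
W, b = train_mlp(X, y)
F = hidden_features(X, W, b)    # learned features handed to the booster
base, stumps, pred = boost(F, y)
mse = sum((t - p) ** 2 for t, p in zip(y, pred)) / len(y)
baseline = sum((t - base) ** 2 for t in y) / len(y)
print(f"baseline MSE {baseline:.4f} -> boosted MSE {mse:.4f}")
```

Splitting the work this way lets the network handle representation learning while the booster handles the final fit; with squared loss and shrinkage in (0, 1], each boosting round cannot increase training error.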

... "Ask HN: Data analysis workflow?" https://westurner.github.io/hnlog/#comment-18798244 :