DaToKi closed this issue 4 years ago
Hi Dan!
Thank you for your interest! For the cancer genomics dataset, the input is basically a matrix with rows corresponding to samples (or patients) and columns corresponding to features. Suppose this matrix has 100 rows (corresponding to 100 patients) and 2000 columns (corresponding to 2000 genomic features). Each patient has a disease subtype (i.e., a class), but we only know the disease subtypes of 10 patients. The input would still be the entire 100×2000 matrix, but you only backpropagate the error for the 10 patients who have true labels. In other words, we use both labeled and unlabeled data during training. Additionally, if you already know similarities among the 100 patients, you can include that similarity graph (a 100×100 matrix) as input as well.
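To make the "backpropagate only on labeled patients" part concrete, here is a minimal PyTorch sketch. The model, dimensions, and `labeled_idx` mask are hypothetical stand-ins, not the actual AffinityNet code; the point is only that the forward pass sees all samples while the loss uses the labeled subset.

```python
import torch
import torch.nn as nn

# Toy dimensions matching the example above: 100 patients, 2000 genomic features.
num_samples, num_features, num_classes = 100, 2000, 4

x = torch.randn(num_samples, num_features)        # full feature matrix (labeled + unlabeled)
labels = torch.randint(0, num_classes, (num_samples,))
labeled_idx = torch.arange(10)                    # only 10 patients have known subtypes

model = nn.Sequential(                            # stand-in for the actual network
    nn.Linear(num_features, 64),
    nn.ReLU(),
    nn.Linear(64, num_classes),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

logits = model(x)                                 # forward pass uses all 100 samples
loss = nn.functional.cross_entropy(               # loss computed only on the labeled rows
    logits[labeled_idx], labels[labeled_idx]
)
optimizer.zero_grad()
loss.backward()                                   # gradients come only from the labeled patients' loss
optimizer.step()
```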
For images, the features are oriented in 2D. Because I used fully connected layers for feature transformations in this repository, it is probably not well suited to image inputs. For images, you can replace the fully connected layers with convolutional layers; the rest is similar. Alternatively, you can first use a CNN to extract a latent one-dimensional feature vector for each image, and then apply kNN attention pooling to the resulting 2D (samples × features) matrix, as in the sketch below. Of course, you can still train it end-to-end.
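A rough sketch of the second option, under stated assumptions: `ImageEncoder` below is a hypothetical CNN, not a module from this repository, and the kNN attention pooling stage itself is only referenced in a comment rather than implemented.

```python
import torch
import torch.nn as nn

# Hypothetical CNN encoder: maps each image to a 1-D latent feature vector.
class ImageEncoder(nn.Module):
    def __init__(self, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, images):              # images: (N, 3, H, W)
        h = self.conv(images).flatten(1)    # (N, 32)
        return self.fc(h)                   # (N, out_dim)

images = torch.randn(100, 3, 64, 64)        # e.g. 100 images
features = ImageEncoder()(images)           # (100, 128) samples-by-features matrix
# `features` now has the same 2D shape as the genomics matrix, so it can be fed
# to the kNN attention pooling layers and the whole pipeline trained end-to-end.
```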
Hi BeautyOfWeb,
first of all, great work and great paper.
I think your work is really interesting.
I have a question regarding the data you used.
You describe the data as cancer genomics datasets.
I would like to know more about the dataset / its structure.
Does it contain images that are fed into your AffinityNetwork? That's what most few-shot learning models do.
Is it possible to see an example of the input data?
Maybe how a class is defined.
Thanks a lot!
Best regards,
Dan T. Lion