chrsmrrs / tudataset


Do you know how to write my own dataset or use TUDataset to process it into a trainable data format? #1

Closed: Shengyuan-Cai closed this issue 3 years ago.

Shengyuan-Cai commented 3 years ago

Hello, I have preprocessed a dataset myself and want to train on it with TG. Do you know how to write my own dataset class, or how to use TUDataset to process the data into a trainable format? (72 graphs, 63 nodes each, with 2300 features per node that are not one-hot encoded)
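One possible route, sketched here under stated assumptions rather than as the maintainers' recommendation, is to export the preprocessed graphs to the plain-text TUDataset layout: an `_A.txt` edge list with global 1-based node IDs, a `_graph_indicator.txt` mapping each node to its graph, a `_graph_labels.txt` with one label per graph, and a `_node_attributes.txt` with one comma-separated feature row per node. The function `write_tu_dataset` and its input dict keys (`x`, `edges`, `y`) are hypothetical, chosen only for illustration.

```python
import os


def write_tu_dataset(graphs, root, name):
    """Write graphs to TUDataset-style text files in `root` (hypothetical helper).

    Each graph is a dict with the hypothetical keys:
      'x'     : list of per-node feature lists (same length for every node),
      'edges' : list of (u, v) pairs using local 0-based node indices,
      'y'     : integer class label of the graph.
    Node IDs and graph IDs written to the files are global and 1-based.
    """
    os.makedirs(root, exist_ok=True)

    def path(suffix):
        return os.path.join(root, f"{name}_{suffix}.txt")

    offset = 0  # number of nodes written so far, used to globalise node IDs
    with open(path("A"), "w") as f_a, \
            open(path("graph_indicator"), "w") as f_gi, \
            open(path("graph_labels"), "w") as f_gl, \
            open(path("node_attributes"), "w") as f_na:
        for graph_id, g in enumerate(graphs, start=1):
            for feats in g["x"]:
                # One comma-separated attribute row and one graph ID per node.
                f_na.write(", ".join(str(v) for v in feats) + "\n")
                f_gi.write(f"{graph_id}\n")
            for u, v in g["edges"]:
                # Shift local 0-based indices to global 1-based node IDs.
                f_a.write(f"{u + offset + 1}, {v + offset + 1}\n")
            f_gl.write(f"{g['y']}\n")
            offset += len(g["x"])
```

For 72 graphs of 63 nodes each, this would produce 72 × 63 = 4536 lines in both `XX_graph_indicator.txt` and `XX_node_attributes.txt` (each attribute line holding 2300 values) and 72 lines in `XX_graph_labels.txt`. TUDataset `_A.txt` files typically list undirected edges in both directions, so `edges` should contain both (u, v) and (v, u).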

chrsmrrs commented 3 years ago

What is TG?

Shengyuan-Cai commented 3 years ago


torch_geometric

chrsmrrs commented 3 years ago

Have a look at the TG documentation.

Shengyuan-Cai commented 3 years ago


This is the example from the torch_geometric documentation, but I don't understand the details of how it works. Could you give me some tips?

import torch
from torch_geometric.data import InMemoryDataset, download_url


class MyOwnDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)
        # Load the collated data written by process().
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        # Placeholder names of files expected in `self.raw_dir`;
        # download() is skipped if they already exist.
        return ['some_file_1', 'some_file_2', ...]

    @property
    def processed_file_names(self):
        # Files expected in `self.processed_dir`; process() is skipped if they exist.
        return ['data.pt']

    def download(self):
        # Download to `self.raw_dir`. `url` is a placeholder to be defined for your data.
        download_url(url, self.raw_dir)
        ...

    def process(self):
        # Read data into huge `Data` list (placeholder).
        data_list = [...]

        if self.pre_filter is not None:
            data_list = [data for data in data_list if self.pre_filter(data)]

        if self.pre_transform is not None:
            data_list = [self.pre_transform(data) for data in data_list]

        # Merge all graphs into one big Data object plus per-attribute slice offsets.
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])

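Conceptually, `self.collate(data_list)` concatenates the per-graph `Data` objects into one large object and returns `slices`, which (as far as I understand PyG's implementation) maps each attribute name to cumulative offsets, so graph `i` occupies rows `slices['x'][i]` to `slices['x'][i + 1]` of `x`. The plain-Python sketch below illustrates only that concatenate-and-slice idea; `collate_rows` and `get_graph` are hypothetical helpers, not PyG functions.

```python
def collate_rows(per_graph_rows):
    """Hypothetical helper: concatenate per-graph node rows and record offsets.

    Illustrates the idea behind InMemoryDataset's (data, slices) pair;
    this is not PyG's implementation.
    """
    flat = []
    slices = [0]  # slices[i]..slices[i + 1] bounds graph i in `flat`
    for rows in per_graph_rows:
        flat.extend(rows)
        slices.append(len(flat))
    return flat, slices


def get_graph(flat, slices, idx):
    """Hypothetical helper: recover graph `idx` by slicing the flat store."""
    return flat[slices[idx]:slices[idx + 1]]


# Three toy graphs with 2, 1 and 3 nodes (two features per node).
flat, slices = collate_rows([[[1, 2], [3, 4]], [[5, 6]], [[7, 8], [9, 10], [11, 12]]])
print(slices)                      # [0, 2, 3, 6]
print(get_graph(flat, slices, 1))  # [[5, 6]]
```

Storing one big block plus offsets keeps the whole dataset in a few large tensors instead of thousands of small objects, which is why `InMemoryDataset` saves `(data, slices)` rather than the list itself.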
chrsmrrs commented 3 years ago

Have a look at https://github.com/chrsmrrs/sparsewl/blob/master/neural_higher_order/ZINC/gnn_1_10K.py.

This is not a support forum for TG.

Shengyuan-Cai commented 3 years ago


Thank you for your answer, but I sincerely hope you can help me once more. Thank you very much!

I have now understood the TUDataset format and converted my raw data to match it. How should I write the dataset class that processes these files so that pytorch_geometric can use them? I have prepared the raw data as follows:

(1) XX_A.txt (m lines) sparse (block diagonal) adjacency matrix for all graphs, each line corresponds to (row, col) resp. (node_id, node_id)

(2) XX_graph_indicator.txt (n lines) column vector of graph identifiers for all nodes of all graphs, the value in the i-th line is the graph_id of the node with node_id i

(3) XX_graph_labels.txt (N lines) class labels for all graphs in the dataset, the value in the i-th line is the class label of the graph with graph_id i

(4) XX_node_attributes.txt (n lines) matrix of node attributes, the comma-separated values in the i-th line are the attribute vector of the node with node_id i
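A minimal sketch, assuming the four files above sit in one directory, graph IDs run from 1 to N, and each graph's nodes are contiguous: the hypothetical `read_tu_dataset` below parses the files with the standard library and regroups nodes, edges and labels per graph, converting node IDs to 0-based indices local to each graph. Inside a `process()` method like the one in the documentation template, each resulting dict could then become `Data(x=torch.tensor(g['x']), edge_index=torch.tensor(g['edges']).t().contiguous(), y=torch.tensor([g['y']]))` before collation. It may also be worth checking whether PyG's built-in `TUDataset` class can read the files directly when they are placed under `root/XX/raw/` and loaded with `use_node_attr=True`.

```python
import os


def read_tu_dataset(root, name):
    """Group TUDataset-style text files into one dict per graph (hypothetical helper).

    Returns a list of dicts with keys 'x' (node attribute rows),
    'edges' (local 0-based (u, v) pairs) and 'y' (graph label).
    Assumes 1-based node/graph IDs and that each graph's nodes are
    contiguous, as in the TUDataset format.
    """
    def read_lines(suffix):
        with open(os.path.join(root, f"{name}_{suffix}.txt")) as f:
            return [line.strip() for line in f if line.strip()]

    # graph_of_node[i] is the graph ID of the node with node_id i + 1.
    graph_of_node = [int(v) for v in read_lines("graph_indicator")]
    labels = [int(v) for v in read_lines("graph_labels")]
    attrs = [[float(v) for v in line.split(",")]
             for line in read_lines("node_attributes")]

    # 0-based global index of the first node of each graph.
    first_node = {}
    for i, g in enumerate(graph_of_node):
        first_node.setdefault(g, i)

    graphs = [{"x": [], "edges": [], "y": y} for y in labels]
    for g, row in zip(graph_of_node, attrs):
        graphs[g - 1]["x"].append(row)
    for line in read_lines("A"):
        u, v = (int(t) - 1 for t in line.split(","))  # 0-based global IDs
        g = graph_of_node[u]
        # Re-index both endpoints relative to the graph's first node.
        graphs[g - 1]["edges"].append((u - first_node[g], v - first_node[g]))
    return graphs
```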