Open ParasKoundal opened 2 years ago
Hi @ParasKoundal, could you provide the code snippet on using dataloaders with graph data, so that we can take a closer look.
@xuyxu It is simple.
.....
from torch_geometric.loader import DataLoader
.......
train_loader = DataLoader(train_dataset, batch_size= batch_s, shuffle=True,drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=2,drop_last=True)
test_loader = DataLoader(test_dataset, batch_size=2,drop_last=True)
.......
I have created a custom class to preprocess dataset before loading into dataloader.
After that I was trying as given in https://ensemble-pytorch.readthedocs.io/en/latest/quick_start.html. For regression I tried initially with VotingRegressor, doesn't work (error given in the initial issue raised). Similar with others too.
Could you further provide the full exception traceback, thanks!
@xuyxu
Here's that
test_loader=test_loader
File "/cr/data02/koundal/applications/gpu-project/lib/python3.7/site-packages/torchensemble/bagging.py", line 329, in fit
self.n_outputs = self._decide_n_outputs(train_loader)
File "/cr/data02/koundal/applications/gpu-project/lib/python3.7/site-packages/torchensemble/_base.py", line 267, in _decide_n_outputs
_, target = split_data_target(elem, self.device)
File "/cr/data02/koundal/applications/gpu-project/lib/python3.7/site-packages/torchensemble/utils/io.py", line 84, in split_data_target
raise ValueError(msg)
ValueError: Invalid dataloader, please check if the input dataloder is valid.
This could possibly be the side-effect of the commit from issue #75. Will see if this could be fixed in a few days, thanks for reporting @ParasKoundal !
@xuyxu Any update on this?
Hi @ParasKoundal, sorry, I am kind of busy these days, and will take a look during the next weekend.
In torchensemble, at each iteration the input loader is expected to return a list in the following forms:
[data_tensor, target_tensor]
[data_tensor_1, data_tensor_2, ..., target_tensor]
The first kind of form is the most widely-used form of the dataloader (i.e., for batch_idx, (data, target) in enumerate(loader)
), while the second one comes from the feature request from #75 to support multiple input tensors.
However, the dataloder in pytorch geometric conforms to neither of them:
(positive_batch, negative_batch)
which does not contain a target tensor since the label is simply the index of the batch in the tuple returned.
Here is a simple solution, please let me know if it solves your problem on using torchensmeble models in pytorch geometric. The general idea is to override the _sample
method. Taking metapath2vec
as an example, we could declare a new class like:
from torch_geometric.nn import MetaPath2Vec
class CustomMetaPath2Vec(MetaPath2Vec):
def _sample(self, batch: List[int]) -> Tuple[Tensor, Tensor]:
if not isinstance(batch, Tensor):
batch = torch.tensor(batch, dtype=torch.long)
pos_sample = self._pos_sample(batch)
neg_sample = self._neg_sample(batch)
data = torch.cat((pos_sample, neg_sample), dim=0)
target = torch.cat(
(torch.ones(pos_sample.size(0)), torch.zeros(neg_sample.size(0))),
dim=0
)
return [data, target]
Using this new class, the positive_batch and negative_batch will be concatenated as one tensor data
, and you can identify them via the target
tensor.
In addition, some extra steps are required in the forward
function of downstream base estimators.
Looking forward to your kind reply @ParasKoundal
Hi, I want to use Ensemble-PyTorch with PyTorch-Geometric. However, it doesn't recognize the dataloaders.
Is this under development or a bug.