deepset-ai / FARM

🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
Apache License 2.0

Error while using Farm 0.7.1, index is out of bounds #806

Closed aloizel closed 3 years ago

aloizel commented 3 years ago

Hi guys, I'm using FARM 0.7.1 on a Databricks cluster with a CPU instance (I know a GPU would be better for this, but we don't have one at the moment). I'm running into trouble, and I don't know whether the problem comes from FARM or from torch.

My problem looks like this issue from torch, although that one should already be fixed.

I'm on torch 1.7.1 because FARM doesn't allow me to use a more recent version.

When I launch a training run, I get this error message:

    232         infer_model = Inferencer(
    233             processor=processor,

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0dc05d70-b778-450c-862e-8696dd8c83c5/lib/python3.8/site-packages/farm/ in train(self)
    299                 # Forward & backward pass through model
    300                 logits = self.model.forward(**batch)
--> 301                 per_sample_loss = self.model.logits_to_loss(logits=logits, global_step=self.global_step, **batch)
    302                 loss = self.backward_propagate(per_sample_loss, step)

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0dc05d70-b778-450c-862e-8696dd8c83c5/lib/python3.8/site-packages/farm/modeling/ in logits_to_loss(self, logits, global_step, **kwargs)
    379         :return loss: torch.tensor that is the per sample loss (len: batch_size)
    380         """
--> 381         all_losses = self.logits_to_loss_per_head(logits, **kwargs)
    382         # This aggregates the loss per sample across multiple prediction heads
    383         # Default is sum(), but you can configure any fn that takes [Tensor, Tensor ...] and returns [Tensor]

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0dc05d70-b778-450c-862e-8696dd8c83c5/lib/python3.8/site-packages/farm/modeling/ in logits_to_loss_per_head(self, logits, **kwargs)
    363                 " with the processor through either 'model.connect_heads_with_processor(processor.tasks)'"
    364                 " or by passing the processor to the Adaptive Model?")
--> 365             all_losses.append(head.logits_to_loss(logits=logits_for_one_head, **kwargs))
    366         return all_losses

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0dc05d70-b778-450c-862e-8696dd8c83c5/lib/python3.8/site-packages/farm/modeling/ in logits_to_loss(self, logits, **kwargs)
    356         label_ids = kwargs.get(self.label_tensor_name)
    357         label_ids = label_ids
--> 358         return self.loss_fct(logits, label_ids.view(-1))
    360     def logits_to_probs(self, logits, return_class_probs, **kwargs):

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0dc05d70-b778-450c-862e-8696dd8c83c5/lib/python3.8/site-packages/torch/nn/modules/ in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0dc05d70-b778-450c-862e-8696dd8c83c5/lib/python3.8/site-packages/torch/nn/modules/ in forward(self, input, target)
    960     def forward(self, input: Tensor, target: Tensor) -> Tensor:
--> 961         return F.cross_entropy(input, target, weight=self.weight,
    962                                ignore_index=self.ignore_index, reduction=self.reduction)

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0dc05d70-b778-450c-862e-8696dd8c83c5/lib/python3.8/site-packages/torch/nn/ in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2466     if size_average is not None or reduce is not None:
   2467         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2468     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0dc05d70-b778-450c-862e-8696dd8c83c5/lib/python3.8/site-packages/torch/nn/ in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2262                          .format(input.size(0), target.size(0)))
   2263     if dim == 2:
-> 2264         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2265     elif dim == 4:
   2266         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

IndexError: Target 170 is out of bounds.

The target is not always 170; it depends on the run and on the model I choose to train.

Have you seen this problem before?

If you need anything else to answer my question, just ask.

Databricks runtime version: 8.2 (includes Apache Spark 3.1.1, Scala 2.12). Worker type: Standard_DS3_v2
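For readers hitting the same message: the thread never states the root cause. In general, torch's `nll_loss` raises `IndexError: Target N is out of bounds` when a target class id falls outside the range `[0, num_classes)` and is not equal to `ignore_index`. The snippet below is only a minimal sketch of a pre-check for that condition, not a diagnosis of this issue. The helper name `find_out_of_range_labels` and the example values are hypothetical and not part of FARM or torch. The default `ignore_index=-100` mirrors the default of `torch.nn.CrossEntropyLoss`.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_labels(
    label_ids: Iterable[int],
    num_classes: int,
    ignore_index: int = -100,
) -> List[Tuple[int, int]]:
    """Return (position, label_id) pairs that lie outside [0, num_classes).

    Hypothetical helper for illustration only. Labels equal to
    ignore_index are skipped, because the loss skips them as well.
    """
    offenders = []
    for position, label_id in enumerate(label_ids):
        if label_id == ignore_index:
            continue  # ignored by the loss, so never an out-of-bounds target
        if not 0 <= label_id < num_classes:
            offenders.append((position, label_id))
    return offenders


# Made-up example: a head with 3 classes and a batch that contains label id 170.
labels = [0, 2, 170, -100, 1]
offenders = find_out_of_range_labels(labels, num_classes=3)
print(offenders)  # [(2, 170)]
```

With tensors like the ones in the traceback, one could pass `label_ids.view(-1).tolist()` as the labels and, assuming 2-D logits of shape `(batch_size, num_classes)`, `logits.shape[1]` as `num_classes`. A non-empty result would point at a mismatch between the label ids in the data and the number of classes the prediction head was built with.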


Timoeller commented 3 years ago

Hey @aloizel, what are you using FARM for inside Databricks (a notebook)? Sounds amazing.

We have recently tested torch 1.8.1 in and also very recently released a new FARM version

At first glance it looks more like a torch issue, but let's figure this out together. Could you try updating and report back?

aloizel commented 3 years ago

Hi! Thanks for the answer.

I found the problem, and it wasn't caused by FARM. Everything is fine now, so I'm closing the issue.

And yes, we are using Databricks, but we use a library to launch our code as a job rather than as a notebook. It allows us to schedule our training runs and follow our data life cycle.