aalok-sathe / surprisal

A unified interface for computing surprisal (log probabilities) from language models! Supports neural, symbolic, and black-box API models.
https://aalok-sathe.github.io/surprisal/
MIT License

Error when using Python-based tokenizers #1

Open aalok-sathe opened 2 years ago

aalok-sathe commented 2 years ago
Traceback (most recent call last):
  File "/net/vast-storage/scratch/vast/evlab/asathe/code/composlang/lmsurprisal/notebooks/extract_surprisals.py", line 73, in <module>
    main()
  File "/net/vast-storage/scratch/vast/evlab/asathe/code/composlang/lmsurprisal/notebooks/extract_surprisals.py", line 57, in main
    surprisals = [
  File "/net/vast-storage/scratch/vast/evlab/asathe/code/composlang/lmsurprisal/surprisal/model.py", line 133, in extract_surprisal
    surprisals = self.surprise([*textbatch])
  File "/net/vast-storage/scratch/vast/evlab/asathe/code/composlang/lmsurprisal/surprisal/model.py", line 184, in surprise
    tokens=tokenized[b], surprisals=-logprobs[b, :].numpy()
  File "/home/asathe/om2-home/anaconda3/envs/surprisal/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 240, in __getitem__
    raise KeyError(
KeyError: 'Indexing with integers (to access backend Encoding for a given batch index) is not available when using Python based tokenizers'

aalok-sathe commented 1 year ago

Needs replication. What model was used? What tokenizer was used?
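
One possible angle on replication: the message in the traceback is raised by `BatchEncoding.__getitem__` in `transformers` when the encoding is indexed with an integer but has no backend (Rust) encodings attached, which is the case for slow, Python-based tokenizers. The sketch below is only an illustration of how the token strings for one batch item could be obtained in either case. `tokens_for_batch_item` is a hypothetical helper, not part of `surprisal`, and it assumes the caller only needs token strings rather than the full backend `Encoding` object (the traceback does not show what the downstream code requires).

```python
from typing import Any, List


def tokens_for_batch_item(tokenizer: Any, encoding: Any, b: int) -> List[str]:
    """Return the token strings for batch item `b` (hypothetical helper).

    `tokenizer` is assumed to be a Hugging Face tokenizer and `encoding`
    the BatchEncoding it returned for a batch of texts.
    """
    if getattr(tokenizer, "is_fast", False):
        # Fast (Rust-backed) tokenizers: integer indexing returns a backend
        # Encoding object that exposes its token strings as `.tokens`.
        return list(encoding[b].tokens)

    # Slow (Python-based) tokenizers: integer indexing raises the KeyError
    # from the traceback, so look the ids up by string key and convert them.
    ids = encoding["input_ids"][b]
    # Tensors and arrays provide .tolist(); plain lists are copied as-is.
    ids = ids.tolist() if hasattr(ids, "tolist") else list(ids)
    return tokenizer.convert_ids_to_tokens(ids)
```

Checking `tokenizer.is_fast` on the tokenizer that was loaded would also answer the question above about which kind of tokenizer triggered the error. Where a fast version exists for the model, loading with `AutoTokenizer.from_pretrained(name, use_fast=True)` may avoid the slow path entirely, though not every model ships a fast tokenizer.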