gmihaila / ml_things

This is where I put things I find useful that speed up my work with Machine Learning. Ever looked in your old projects to reuse those cool functions you created before? Well, this repo is designed to be a Python Library of functions I created in my previous project that can be reused. I also share some Notebooks Tutorials and Python Code Snippets.
https://gmihaila.github.io
Apache License 2.0
245 stars, 61 forks

AttributeError: 'BucketIterator' object has no attribute #14

Closed. muhammadfhadli1453 closed this issue 2 years ago

muhammadfhadli1453 commented 2 years ago

I’m doing seq2seq machine translation on my own dataset. I have preprocessed my dataset using this code:

def tokenize_word(text):
  return nltk.word_tokenize(text)

id = Field(sequential=True, tokenize = tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")
ti = Field(sequential=True, tokenize = tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")

fields = {'id': ('i', id), 'ti': ('t', ti)}

train_data = TabularDataset.splits(
    path='/content/drive/MyDrive/Colab Notebooks/Tidore/',
    train = 'id_ti.tsv',
    format='tsv',
    fields=fields
)[0]

id.build_vocab(train_data)
ti.build_vocab(train_data)

print(f"Unique tokens in source (id) vocabulary: {len(id.vocab)}")
print(f"Unique tokens in target (ti) vocabulary: {len(ti.vocab)}")

train_iterator = BucketIterator.splits(
    train_data,
    batch_size = batch_size,
    sort_within_batch = True,
    sort_key = lambda x: len(x.id),
    device = device
)

The output of the code above is:

Unique tokens in source (id) vocabulary: 1425
Unique tokens in target (ti) vocabulary: 1297

The problem comes when I try to split train_data using BucketIterator.splits(). When I try to print the values from train_iterator, it says that it has no attribute 'i', even though I declared the fields. Here is my code to print it:

for data in train_iterator:
  print(data.i)

The output of the code above is:

AttributeError                            Traceback (most recent call last)

<ipython-input-9-322cc3aa78d6> in <module>()
      1 for data in train_iterator:
----> 2   print(data.i)

AttributeError: 'BucketIterator' object has no attribute 'i'

When I just print data, the result confuses me even more: [image]

I am very confused, because I don’t know which key I should use for the train iterator. Thank you for your help.

gmihaila commented 2 years ago

Can you try printing data.__dict__ to see what the object looks like?
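
For anyone landing here with the same error, here is a minimal sketch of what that inspection would probably reveal. It assumes the legacy torchtext behaviour where BucketIterator.splits() takes a tuple of datasets and returns a tuple of iterators, one per element. StandInIterator and the toy train_data list are hypothetical stand-ins written for illustration, not torchtext code, so treat this as a guess at the cause and not a confirmed diagnosis.

```python
class StandInIterator:
    """Hypothetical stand-in for a BucketIterator-like class (not torchtext code)."""

    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size

    @classmethod
    def splits(cls, datasets, batch_size):
        # Assumed behaviour: build one iterator per element of `datasets`
        # and return them as a tuple.
        return tuple(cls(d, batch_size) for d in datasets)

    def __iter__(self):
        # Yield consecutive slices of the dataset as batches.
        for start in range(0, len(self.dataset), self.batch_size):
            yield self.dataset[start:start + self.batch_size]


train_data = ["example 0", "example 1", "example 2"]  # toy dataset

# Passing the dataset itself: every example is treated as its own dataset,
# so the result is a tuple of iterators, not a single iterator.
wrong = StandInIterator.splits(train_data, batch_size=2)
first = wrong[0]
print(type(first).__name__)  # class name of what a for-loop over `wrong` yields
print(first.__dict__)        # the inspection suggested above

# Passing a one-element tuple and unpacking the one-element result.
(train_iterator,) = StandInIterator.splits((train_data,), batch_size=2)
batches = list(train_iterator)
print(batches)
```

If the same thing is happening in the original code, looping over train_iterator yields BucketIterator objects instead of batches, which would match the error message. The call would then become BucketIterator.splits((train_data,), ...) with the result unpacked as shown. Separately, because fields maps the 'id' column to the attribute name 'i', the sort_key would likely need to be lambda x: len(x.i) and not len(x.id).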