Example of Input Data - Githubissues

codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation

Apache License 2.0

6.11k stars 1.29k forks

Example of Input Data #28

Closed by nateraw 5 years ago

nateraw commented 5 years ago

Could you give a concrete example of the input data? You gave an example of the corpus data, but not the dataset.small file found in this line:

bert -c data/dataset.small -v data/vocab.small -o output/bert.model

If you could show perhaps a couple of examples, that would be very helpful! I am new to pytorch, so the dataloader function is a little confusing.

codertimo commented 5 years ago

@nateraw Sorry, that was my mistake. The README has just been updated to:

bert -c data/corpus.small -v data/vocab.small -o output/bert.model

which uses the same corpus as bert-vocab.

The corpus example is in the README, under "0. Prepare your own corpus".

Thanks

nateraw commented 5 years ago

Why might this be happening then? I ran these lines...

bert-vocab -c data/dummy_data.small -o data/vocab.small

bert -c data/dummy_data.small -v data/vocab.small -o output/bert.model

nateraw commented 5 years ago

Dummy data looks like this:

codertimo commented 5 years ago

@nateraw Can you update the bert-pytorch version to 0.0.1a4?

pip install -U bert-pytorch

nateraw commented 5 years ago

Interestingly, a different error:

codertimo commented 5 years ago

@nateraw I figured out what was wrong with both your example corpus and mine. The file must not end with a blank line! If you check line 18 of your corpus, there is a \n at the end of the last line, which means Python sees one extra, empty line. Please remove the \n at the end of the last line. That should help fix it.
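For anyone hitting the same problem, here is a minimal sketch of removing the trailing \n from a corpus file. The helper names `strip_trailing_newlines` and `clean_corpus_file` are made up for illustration and are not part of bert-pytorch, and the thread does not show how the loader itself reads lines, so the `split("\n")` call below only illustrates how a final \n can show up as one extra empty line.

```python
import os
import tempfile


def strip_trailing_newlines(text: str) -> str:
    """Remove every newline (LF or CRLF) at the very end of the text."""
    return text.rstrip("\r\n")


def clean_corpus_file(path: str) -> None:
    """Rewrite the file at `path` without trailing newlines (hypothetical helper)."""
    # newline="" keeps the existing line endings untouched on read and write.
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(strip_trailing_newlines(text))


if __name__ == "__main__":
    raw = "a\tb\nc\td\n"  # two corpus lines, the last one ends with \n
    print(len(raw.split("\n")))  # 3: the final \n yields one extra empty entry
    print(len(strip_trailing_newlines(raw).split("\n")))  # 2

    # Try the file helper on a throwaway file.
    fd, tmp_path = tempfile.mkstemp(suffix=".small")
    os.close(fd)
    with open(tmp_path, "w", encoding="utf-8", newline="") as f:
        f.write(raw)
    clean_corpus_file(tmp_path)
    with open(tmp_path, "r", encoding="utf-8", newline="") as f:
        print(repr(f.read()))
    os.remove(tmp_path)
```

Run it on a copy of the corpus first, since `clean_corpus_file` overwrites the file in place.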

nateraw commented 5 years ago

Wow so dumb! My fault, I will report back if that change works or not.

nateraw commented 5 years ago

Issue was with the tabs as well. Replaced with literal tabs, and it worked. Closing the issue, thank you!
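In case it helps others, below is a rough sketch of a checker for the tab problem. It assumes each corpus line should hold two sentences separated by one real tab character, and it guesses that a common mistake is typing the two characters \t instead of a tab. The thread does not show what the original separators were, so treat that as an assumption. `find_bad_lines` is a hypothetical helper, not part of bert-pytorch.

```python
def find_bad_lines(lines):
    """Return (line_number, reason) pairs for lines that are not
    two non-empty sentences separated by exactly one real tab."""
    problems = []
    for number, line in enumerate(lines, start=1):
        parts = line.split("\t")
        if len(parts) == 2 and all(part.strip() for part in parts):
            continue  # looks like a well-formed sentence pair
        if not line.strip():
            reason = "blank line"
        elif "\\t" in line:
            # Backslash followed by 't', typed as text rather than a tab.
            reason = "contains the two characters \\t instead of a real tab"
        else:
            reason = f"expected 2 tab-separated sentences, found {len(parts)}"
        problems.append((number, reason))
    return problems


if __name__ == "__main__":
    # Made-up sample lines, not taken from the issue.
    sample = [
        "first sentence\tsecond sentence",    # fine
        "first sentence \\t second sentence",  # escaped \t, no real tab
        "first sentence second sentence",      # no separator at all
        "",                                    # blank line
    ]
    for number, reason in find_bad_lines(sample):
        print(f"line {number}: {reason}")
```

To check a real file, pass it something like `open(path, encoding="utf-8").read().split("\n")`, which also reports an empty last entry if the file ends with \n.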