allenai / commonsense-kg-completion


some question about simple_lm_finetuning #4

Closed woocoder closed 3 years ago

woocoder commented 4 years ago

Hello, could you share the details of the fine-tuning code? I'm trying to add some nodes to the ATOMIC node data, and I need to re-train the BERT embeddings of the nodes.

But I ran into trouble with simple_lm_finetuning.py: RuntimeError: index out of range: Tried to access index -1 out of table with 511 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418. While debugging, I can see that the 'lm_label_ids' variable contains -1.

Here is my 'train_corpus'; its structure is the same as your 'atomic_node_names.txt' file:

[screenshot: train_corpus sample]

Even when I change the train_corpus to your 'atomic_node_names.txt' file, the error still appears.

My transformers library is version 2.2.0. Is something wrong?

chaitanyamalaviya commented 4 years ago

Hi! The error was due to an earlier incompatible version of the transformers repo and it should be fixed with the latest commit on this repo. FYI, I'm using transformers version 3.0.1.

woocoder commented 4 years ago

> Hi! The error was due to an earlier incompatible version of the transformers repo and it should be fixed with the latest commit on this repo. FYI, I'm using transformers version 3.0.1.

Hi, thank you for your reply, but it doesn't work for me.

I changed the transformers version to 3.0.1, but I still get the error message.

Here is my example print message:

[screenshot: example feature printout]

Here is my traceback:

[screenshot: RuntimeError traceback]

It looks like lm_label_ids cannot contain -1.
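
A minimal sketch of what seems to be going on (my reading, not confirmed: the -1 entries are meant as the loss function's ignore index, so they are only fatal when the labels tensor reaches an embedding lookup instead of the loss; the tensor shapes below are made up for illustration):

```python
import torch
import torch.nn as nn

# lm_label_ids with -1 marking positions the masked-LM loss should ignore
# (the convention used by the old fine-tuning script; newer transformers
# versions use -100 for the same purpose).
lm_label_ids = torch.tensor([-1, -1, 2054, -1, 2003, -1])
logits = torch.randn(6, 30522)  # fake vocabulary-sized logits

# Fine: the loss simply skips targets equal to ignore_index.
loss_fct = nn.CrossEntropyLoss(ignore_index=-1)
print(loss_fct(logits, lm_label_ids))

# Crash: the same tensor fed into an embedding lookup, which is what
# happens if the labels land in the wrong positional argument.
emb = nn.Embedding(30522, 768)
try:
    emb(lm_label_ids)
except (IndexError, RuntimeError) as err:
    print("embedding lookup fails on -1:", err)
```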

woocoder commented 4 years ago

> Hi! The error was due to an earlier incompatible version of the transformers repo and it should be fixed with the latest commit on this repo. FYI, I'm using transformers version 3.0.1.

Hi, I think I found the answer.

The function arguments were not being mapped to the right positions:

[screenshot: call site passing arguments positionally]

Here is the BertForMaskedLM 'forward' function, but I don't know what 'is_next' means. How should I handle it?

[screenshot: BertForMaskedLM forward signature]

Thank you :)

chaitanyamalaviya commented 4 years ago

If you were to pull from the repo again, this issue should be resolved already. The model class used here is BertForPreTraining and the keyword argument next_sentence_label should be equal to is_next.
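
For clarity, here is a minimal sketch of what the fixed call might look like with keyword arguments (a sketch assuming transformers 3.x, where the masked-LM targets are passed as labels, formerly masked_lm_labels, and the is_next tensor as next_sentence_label; the tensors are dummies, not the repo's exact code):

```python
import torch
from transformers import BertForPreTraining

model = BertForPreTraining.from_pretrained("bert-base-uncased")

input_ids = torch.randint(0, 30522, (1, 128))
attention_mask = torch.ones_like(input_ids)
token_type_ids = torch.zeros_like(input_ids)

# Masked-LM targets: -100 is the ignore index in transformers 3.x.
lm_label_ids = torch.full((1, 128), -100, dtype=torch.long)
lm_label_ids[0, 5] = 2054  # pretend position 5 was masked

is_next = torch.tensor([0])  # 0 = sentence B really follows sentence A

# Keyword arguments avoid the positional mismatch that fed the label
# tensor (with its -1/-100 entries) into an embedding lookup.
outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=lm_label_ids,
    next_sentence_label=is_next,
)
total_loss = outputs[0]
```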

woocoder commented 4 years ago

> If you were to pull from the repo again, this issue should be resolved already. The model class used here is BertForPreTraining and the keyword argument next_sentence_label should be equal to is_next.

Yes! The problem has been solved.

And I got a result like this:

[screenshot: fine-tuning output]

But compared with the 'bert_model_embeddings/nodes-lm-atomic' folder, there is no 'atomic_bert_embeddings.pt'.

Do I need to run BERT over each of the atomic_node_names entries to get the embeddings?

Just like this?

[screenshot: per-node embedding loop]

chaitanyamalaviya commented 4 years ago

Yes, that's right. The atomic_bert_embeddings.pt file is actually generated and saved to disk when you run the KB completion script (in this method).
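
In case it helps, here is a minimal sketch of that generate-and-cache pattern; the helper name and mean-pooling choice are illustrative assumptions, not the repo's actual code (the real logic lives in src/bert_feature_extractor.py):

```python
import os
import torch
from transformers import BertModel, BertTokenizer

def build_node_embeddings(node_names, cache_path="atomic_bert_embeddings.pt"):
    """Encode each node name with BERT and cache the tensor to disk.

    Illustrative only: the repo computes these inside its KB-completion
    script rather than through a helper like this one.
    """
    if os.path.exists(cache_path):
        return torch.load(cache_path)

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    vectors = []
    with torch.no_grad():
        for name in node_names:
            input_ids = tokenizer.encode(name, return_tensors="pt")
            last_hidden_state = model(input_ids)[0]
            # Mean-pool the token states into one vector per node.
            vectors.append(last_hidden_state.mean(dim=1).squeeze(0))

    embeddings = torch.stack(vectors)
    torch.save(embeddings, cache_path)
    return embeddings
```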

woocoder commented 4 years ago

> Yes, that's right. The atomic_bert_embeddings.pt file is actually generated and saved to disk when you run the KB completion script (in this method).

Thank you for your reply.

Well, I understand that the model needs to load the pretrained LM 'lm_pytorch_model' and then compute the BERT embeddings, but I have some new questions.

1. This code needs the network to build node_list ([link](https://github.com/allenai/commonsense-kg-completion/blob/d3ead4d9a9056bef007c65738febea77deb88757/src/bert_feature_extractor.py#L233)), but [here](https://github.com/allenai/commonsense-kg-completion/blob/d3ead4d9a9056bef007c65738febea77deb88757/src/model.py#L176) the network is not set, so I changed it like this:

   [screenshot: modified call passing the network]

2. The original 'lm_pytorch_model.bin' file needs to be loaded via 'torch.load('lm_pytorch_model.bin')' with 'pytorch-pretrained-bert', so I installed that dependency (version 0.6.2).

3. When I run the code, [this](https://github.com/allenai/commonsense-kg-completion/blob/d3ead4d9a9056bef007c65738febea77deb88757/src/bert_feature_extractor.py#L258) function throws an error (the error points into the apex package).

   [screenshot: apex traceback]

4. When I uninstall the apex package, I get an error like this:

   [screenshot: pytorch-pretrained-bert tracebacks]

It seems like a 'pytorch-pretrained-bert' package bug.

How should I fix this?

Thank you very much!

woocoder commented 4 years ago

Oh, I see.

With pytorch-pretrained-bert version 0.3.0, it works now.
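
For later readers, a sketch of why the package version matters here (my reading of the errors above, not a confirmed diagnosis): the checkpoint appears to have been saved by pickling the whole model object, so unpickling it needs the same pytorch-pretrained-bert (and apex, if it was used during training) class layout that existed when it was saved:

```python
import torch

# Importing the package makes its classes resolvable during unpickling;
# a mismatched version (or a missing apex install) changes the class
# layout and breaks torch.load.
import pytorch_pretrained_bert  # noqa: F401

model = torch.load("lm_pytorch_model.bin", map_location="cpu")
model.eval()
```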

P.S. I suggest pinning the package versions in requirements.txt.
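
For example, something like this (versions taken from this thread):

```
transformers==3.0.1
pytorch-pretrained-bert==0.3.0
```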