Can we use our own data set to train the models and predict our own test set?

Sshanu / Relation-Classification-using-Bidirectional-LSTM-Tree

TensorFlow implementation of the papers "End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures" and "Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths" for classifying relations

MIT License

185 stars, 41 forks

Can we use our own data set to train the models and predict our own test set? #14

Open xinxu1018 opened 6 years ago

xinxu1018 commented 6 years ago

Can we use our own data set to train the models and predict our own test set?

Sshanu commented 6 years ago

You can, just make sure the input format remains the same.

xinxu1018 commented 6 years ago

@Sshanu Thanks so much for your quick response! Please allow me to ask one more question. Since I am using word embeddings trained on my own corpus instead of the GloVe embeddings you provide, how can I get my embeddings into the same format as the GloVe embedding file in the data folder and use them in your LSTM model?

All the best!

Sshanu commented 6 years ago

I first extracted the words and stored them in a list named vocab, then extracted the word embeddings and stored them in a numpy array. The rows line up with the list: if the 2nd word in vocab is "the", the 2nd row of the numpy array holds the word embedding for "the". I then saved both the vocab list and the word embedding numpy array using pickle. So you can create a similar array and vocab, or you can change the code that loads the embeddings.
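
A minimal sketch of that layout, assuming the custom embeddings are available as GloVe-style text lines ("word v1 v2 ...") with no header line. The function names and the choice of two separate pickle files are assumptions for illustration; the thread does not say how the original files were written.

```python
import pickle

import numpy as np


def parse_embedding_lines(lines):
    """Split GloVe-style text lines into a vocab list and a 2-D array."""
    vocab, vectors = [], []
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vocab.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    # Row i of the array holds the vector for vocab[i].
    return vocab, np.asarray(vectors, dtype=np.float32)


def save_embeddings(vocab, embeddings, vocab_path, embed_path):
    """Pickle the vocab list and the array (hypothetical two-file layout)."""
    with open(vocab_path, "wb") as f:
        pickle.dump(vocab, f)
    with open(embed_path, "wb") as f:
        pickle.dump(embeddings, f)


# Tiny in-memory example standing in for a real embedding file.
sample = ["system 0.1 0.2 0.3", "the 0.4 0.5 0.6"]
vocab, embeddings = parse_embedding_lines(sample)
```

Here "the" is the 2nd entry of `vocab`, and its vector is the 2nd row of `embeddings`, which is the alignment described above.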

xinxu1018 commented 6 years ago

@Sshanu That is so informative! Thanks a lot!

xinxu1018 commented 6 years ago

@Sshanu It works! Thanks a lot! Can I ask a follow-up question? If I want to classify relations between multi-word terms (in your case the pairs are one-word terms), how should I preprocess the sentences before the dependency path extraction step? One way I am considering is to connect the words within a multi-word term using underscores (for example, "system configuration" becomes "system_configuration"), treat the result as a one-word term, and then follow your procedure. I am not sure whether it will work. Do you have any suggestions?
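
The underscore idea could be sketched roughly as follows. The function name is hypothetical, and plain substring replacement is an assumption made for brevity; a real pipeline would more likely match on token boundaries.

```python
def join_multiword_terms(sentence, terms):
    """Replace each multi-word term with a single underscore-joined token."""
    # Replace longer terms first so a short term cannot break up a longer one.
    for term in sorted(terms, key=len, reverse=True):
        sentence = sentence.replace(term, term.replace(" ", "_"))
    return sentence


joined = join_multiword_terms(
    "the system configuration controls the power supply unit",
    ["system configuration", "power supply unit"],
)
print(joined)
```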

Many thanks!

Sshanu commented 6 years ago

You can try this approach, but its shortcoming is that you won't have a word embedding for the multi-word term. Alternatively, create a dependency tree and, of the two words in the term, choose as the entity the one that sits below the other in the tree; the information about the other word will then be computed by the LSTM tree. Then, instead of using only the features of the LCA (lowest common ancestor), entity1, and entity2 from the LSTM tree for relation classification, also use the features of the other word from the LSTM tree.

I have not worked on relation classification or any related field since this project. It was my first project in NLP, which is why I have very little knowledge of relation classification or extraction.
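
One reading of this suggestion, as a rough sketch: represent the dependency tree as a list of head indices (an assumed representation, with -1 marking the root) and pick the word of the term that sits deepest in the tree. The parse below is a made-up example, not the output of any particular parser.

```python
def depth(heads, i):
    """Count the edges from token i up to the root (heads[i] == -1 at the root)."""
    d = 0
    while heads[i] != -1:
        i = heads[i]
        d += 1
    return d


def lower_word(heads, term_indices):
    """Return the index of the term word that sits deepest in the tree."""
    return max(term_indices, key=lambda i: depth(heads, i))


# Hypothetical parse of "the system configuration failed":
# "failed" is the root, "configuration" depends on it, and
# "the" and "system" both depend on "configuration".
tokens = ["the", "system", "configuration", "failed"]
heads = [2, 2, 3, -1]
chosen = lower_word(heads, [1, 2])  # the two words of "system configuration"
```

With this parse the chosen entity word is "system", and "configuration" is the other word whose LSTM-tree features would also be fed to the classifier.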

xinxu1018 commented 6 years ago

@Sshanu Thanks a lot! Hope everything goes very well for you!

xinxu1018 commented 6 years ago

@Sshanu Could you please provide your word_embd_wiki file? I cannot find the embedding file in the data folder you provided. Thanks for your help!

Best,

Sshanu commented 6 years ago

My Google Drive is full. Please share a folder with me, and I will upload the word_embed file there.

xinxu1018 commented 6 years ago

@Sshanu How can I share a folder with you? What's your address? Sorry, I am new here!

xinxu1018 commented 6 years ago

@Sshanu Hi Sshanu, I just shared a Google Drive folder with the email address listed in your GitHub profile. Not sure if I am doing it right! Many thanks!

Sshanu commented 6 years ago

Oh, please share it with sshanukr@gmail.com

xinxu1018 commented 6 years ago

@Sshanu I have shared it with your Gmail address. Please check, and many thanks!

xinxu1018 commented 6 years ago

@Sshanu Hi Sshanu, got your shared file! You helped me a lot! I am wondering whether you still have the original code that split the embedding file into the vocab and word_embedding arrays. With it I could convert my own trained embeddings into the format your method expects. Could you please share the code with me? Thanks again!

Sshanu commented 6 years ago

I don't have the file. If you are having a problem generating the exact file, simply store the embeddings in a numpy array and the vocabulary in a list; my code will work after that.
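
Before handing such a pair to the training code, a quick sanity check along these lines may help. This is only a sketch: `check_alignment` is a hypothetical helper, not part of the repository, and the toy data stands in for real embeddings.

```python
import numpy as np


def check_alignment(vocab, embeddings):
    """Verify that vocab and embeddings line up, then return a word-to-row lookup."""
    if embeddings.ndim != 2:
        raise ValueError("embeddings should be a 2-D array (words x dimensions)")
    if len(vocab) != embeddings.shape[0]:
        raise ValueError(
            "vocab has %d words but the array has %d rows"
            % (len(vocab), embeddings.shape[0])
        )
    return {word: row for row, word in enumerate(vocab)}


vocab = ["system", "the"]
embeddings = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dtype=np.float32)
word_to_row = check_alignment(vocab, embeddings)
the_vector = embeddings[word_to_row["the"]]
```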
