SoftWiser-group / iTag

Implementation of An Integral Tag Recommendation Model for Textual Content.

constructing shared.txt #7

Closed Fateme2020 closed 4 years ago

Fateme2020 commented 4 years ago

Hi. I have questions about preparing data to load into iTag. I followed all the previous issues and learned a lot, but some problems remain. How exactly should we create the shared.txt file? I have an array containing all the words, and I also have the brs array. You gave an example in the last issue:

brs = np.array([[3, 1, 2, 4],[3,2,1,6],[6,7,1,5]])
ms = np.array([[0,1,0,1],[0,1,1,1],[1,1,0,0]])
sfs = np.array([[1,2],[1,2],[1]])
shared.txt : {3:1,1:2}

Which word does the id 3 refer to, and in which array? And if word 6 were a tag in sentence 2, what should shared.txt be? Thanks a lot.

Tangworld commented 4 years ago

shared.txt contains a dictionary that maps each word in the texts to the identical word among the tags. In shared.txt, the key is the id of a word and the value is the id of the tag that is the same word. So in this example, {3:1, 1:2} means that the word with id 3 in the sentences and the tag with id 1 are the same word, and that the word with id 1 in the sentences and the tag with id 2 are the same word. As for your question, if word 6 were a tag (with tag id 3) in sentence 2, shared.txt would be: {3:1, 1:2, 6:3}.
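
A minimal sketch of how such a mapping could be built, assuming you have one vocabulary that maps text words to ids and another that maps tags to ids. The names `word_to_id`, `tag_to_id`, and `build_shared`, as well as the example words, are hypothetical and not part of iTag. How the dictionary must be serialized into shared.txt is not covered here.

```python
# Hypothetical vocabularies; the ids are chosen to match the example above.
word_to_id = {"python": 3, "keras": 1, "error": 2, "install": 4, "linux": 6}
tag_to_id = {"python": 1, "keras": 2, "linux": 3, "ml": 4}


def build_shared(word_to_id, tag_to_id):
    """Map the id of each text word to the id of the identical tag."""
    shared = {}
    for word, word_id in word_to_id.items():
        if word in tag_to_id:  # the word also occurs as a tag
            shared[word_id] = tag_to_id[word]
    return shared


shared = build_shared(word_to_id, tag_to_id)
print(shared)
```

With these vocabularies the result maps 3 to 1, 1 to 2, and 6 to 3, which is the dictionary given in the answer. Words that occur only in texts ("error", "install") and tags that occur only as tags ("ml") do not appear in it.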

Fateme2020 commented 4 years ago

Thanks. In the initial parameters, what is the difference between these two? Could you explain them clearly? Thanks.
ALL_WORDS : total amount of words and tags
WORD_VOCAB : amount of words in texts

Tangworld commented 4 years ago

WORD_VOCAB = amount of words that only occur in texts
LABEL_VOCAB = amount of words that only occur in tags
ALL_WORDS = WORD_VOCAB + amount of words that occur in both texts and tags (shared words) + LABEL_VOCAB
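
The three definitions can be checked with plain set arithmetic. This is only a sketch that follows the wording of this comment. The example words and the variable names are hypothetical, and any special tokens or index offsets that iTag itself adds are not accounted for.

```python
# Hypothetical example data: distinct words seen in texts and in tags.
text_words = {"python", "keras", "error", "install"}
tag_words = {"python", "keras", "ml"}

shared_words = text_words & tag_words      # occur in both texts and tags
word_vocab = len(text_words - tag_words)   # occur only in texts
label_vocab = len(tag_words - text_words)  # occur only in tags
all_words = word_vocab + len(shared_words) + label_vocab

print(word_vocab, label_vocab, all_words)
```

Under these definitions ALL_WORDS is simply the number of distinct words across texts and tags, so it exceeds WORD_VOCAB + LABEL_VOCAB whenever at least one word is shared.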

Fateme2020 commented 4 years ago

Thanks. I ran iTag on my dataset, and after 2 epochs it finishes with all measure scores at zero. This is the output (screenshot: output1). Do you have any idea where I went wrong? (I'm new to Keras and ML.)

Tangworld commented 4 years ago

Well, it's important to read the README carefully. Our model requires Python 2.7. Best wishes.

Fateme2020 commented 4 years ago

Thanks. I switched to Linux and Python 2.7, but now there are index-out-of-range errors because of wrong initial parameters. According to this comment:

WORD_VOCAB = amount of words that only occur in texts
LABEL_VOCAB = amount of words that only occur in tags
ALL_WORDS = WORD_VOCAB + amount of words that occur in both texts and tags (shared words) + LABEL_VOCAB

ALL_WORDS > WORD_VOCAB + LABEL_VOCAB, but in your original itag.py ALL_WORDS is less than WORD_VOCAB + LABEL_VOCAB. In my case I have only 3 tags, so DE_TOKENS = 3 - 2 = 1, and so there is an index-out-of-range error in the on_hot function in util.py. I really need help calculating these initial params. Thanks for your help.

Tangworld commented 4 years ago

In fact, the printed output of shared_dataset.py can help you. You could set the parameters like this:

ALL_WORDS = sfs start
WORD_VOCAB = amount of words that only occur in texts
LABEL_VOCAB = amount of words that only occur in tags
DE_TOKENS = LABEL_VOCAB - 2
MAX_WORDS = 100
MAX_LABELS = 6

INDEX_FROM = 3
END_TOKEN = sfs end
START_TOKEN = sfs start
LABEL_FROM = tag from

As for your question, you could solve it simply by increasing the value of LABEL_VOCAB, for example by adding 2 to it. Best wishes.
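
To see why the small DE_TOKENS value leads to an index error and why raising LABEL_VOCAB by 2 avoids it, here is a rough sketch. The `one_hot` function below is a generic stand-in written for illustration, not the function from util.py, and it assumes tag ids start at 0 and that the one-hot vector has DE_TOKENS entries.

```python
def one_hot(index, depth):
    """Generic one-hot encoder (hypothetical stand-in, not iTag's util.py)."""
    vec = [0] * depth
    vec[index] = 1  # raises IndexError when index >= depth
    return vec


num_tags = 3  # the dataset in the report has only 3 tags

# Setting LABEL_VOCAB to the raw tag count gives DE_TOKENS = 3 - 2 = 1.
label_vocab = num_tags
de_tokens = label_vocab - 2
try:
    one_hot(2, de_tokens)  # tag id 2 does not fit into a vector of length 1
    failed = False
except IndexError:
    failed = True

# Suggested fix: add 2 to LABEL_VOCAB so that DE_TOKENS covers all tags.
label_vocab = num_tags + 2
de_tokens = label_vocab - 2
vec = one_hot(2, de_tokens)
print(failed, de_tokens, vec)
```

With the adjusted value, DE_TOKENS equals the number of tags and the highest tag id fits into the vector.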