Open Cantoria opened 5 years ago
By the way, when I run step 9 (I didn't run the earlier steps, but I've downloaded all the files from polybox), I get this error:
`==> Loading entity wikiid - name map
---> t7 file NOT found. Loading from disk (slower). Out f = /home/xuhongbo/syh/syh/deep-ed/data/generated/ent_name_id_map.t7
==> Loading disambiguation index
Done loading disambiguation index
Still loading entity wikiid - name map ...
/home/xuhongbo/torch/install/bin/lua: ...me/xuhongbo/torch/install/share/lua/5.1/tds/hash.lua:108: bad argument #1 to 'pairs' (table expected, got userdata)
stack traceback:
C: in function 'pairs'
...me/xuhongbo/torch/install/share/lua/5.1/tds/hash.lua:108: in function 'write'
.../xuhongbo/torch/install/share/lua/5.1/torch/File.lua:210: in function 'writeObject'
.../xuhongbo/torch/install/share/lua/5.1/torch/File.lua:388: in function 'save'
entities/ent_name2id_freq/ent_name_id.lua:76: in main chunk
C: in function 'dofile'
entities/ent_name2id_freq/e_freq_gen.lua:16: in main chunk
C: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
`
It seems that something went wrong while generating ent_name_id_map.t7: I ended up with an ent_name_id_map.t7 file in the generated folder that is only 35 B. I don't really know Lua. Please tell me what's wrong, thanks!
Hi. The set of entities for which the current code trains entity embeddings is defined here: https://github.com/dalab/deep-ed/blob/master/entities/relatedness/relatedness.lua#L253-L328
You would have to modify this code to train with a different set of entities.
As for your error, I am not sure. Try deleting your ent_name_id_map.t7 and redoing that step. These t7 files are not rewritten when you change the code or data, so they have to be deleted manually and then regenerated.
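A minimal sketch of that cleanup in plain Lua. It assumes the cached file sits at `generated/ent_name_id_map.t7` under your data root, as in the `Out f = ...` line of the log above. The helper name `remove_stale_cache` and the `data/` root are hypothetical and not part of deep-ed.

```lua
-- Hypothetical helper (not part of deep-ed): delete a cached file so that the
-- next run regenerates it instead of loading stale or truncated contents.
function remove_stale_cache(path)
  local f = io.open(path, 'rb')
  if not f then
    return false  -- nothing is cached at this path
  end
  f:close()
  local ok, err = os.remove(path)
  if not ok then
    error('could not remove ' .. path .. ': ' .. tostring(err))
  end
  return true
end

-- Assumed location; adjust the data root to your own setup.
local root_data_dir = 'data/'
local cache_file = root_data_dir .. 'generated/ent_name_id_map.t7'
if remove_stale_cache(cache_file) then
  print('Removed ' .. cache_file .. '; rerun the step to regenerate it.')
else
  print('No cached file at ' .. cache_file)
end
```

After removing the file, rerunning the step should take the "t7 file NOT found" branch again and rebuild the map from the text data.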
Hi, I've found the cause of the error. I was using Lua 5.1, which didn't work with Torch here, so I installed Lua 5.3 and now it works. As for the first question, I've modified the code so that I can train the entity vectors on a specific entity set. But I got some t7 files in the ./data/generated/ent_vecs path. Here is the modified code (in the main code, formerly lines 253-328):
if not paths.filep(rewtr_t7filename) then
print(' ---> t7 file NOT found. Loading reltd_ents_wikiid_to_rltdid from txt file instead (slower).')
-- Gather the restricted set of entities for which we train entity embeddings:
local rltd_all_ent_wikiids = tds.Hash()
-- 1) From the relatedness dataset
for ent_wikiid,_ in pairs(reltd_ents_direct_validate) do
rltd_all_ent_wikiids[ent_wikiid] = 1
end
for ent_wikiid,_ in pairs(reltd_ents_direct_test) do
rltd_all_ent_wikiids[ent_wikiid] = 1
end
-- 1.1) From a small dataset (used for debugging / unit testing).
for _,line in pairs(ent_lines_4EX) do
local parts = split(line, '\t')
assert(table_len(parts) == 3)
ent_wikiid = tonumber(parts[1])
assert(ent_wikiid)
rltd_all_ent_wikiids[ent_wikiid] = 1
end
-- 2) From all ED datasets: (I've deleted this part)
-- 3) From a specific entity set (code I added here)
local specific_entity_files = 'specific_entity_file'
if not paths.filep(opt.root_data_dir .. 'basic_data/' .. specific_entity_files) then
print("No specific entity file!")
else
dofile 'entities/ent_name2id_freq/ent_name_id.lua'
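local it = assert(io.open(opt.root_data_dir .. 'basic_data/' .. specific_entity_files))
local line = it:read()
while line do
local ent_wikiid = e_id_name.ent_name2wikiid[line]
-- Skip names that are missing from the name -> wikiid map.
if ent_wikiid then
rltd_all_ent_wikiids[ent_wikiid] = 1
end
line = it:read() -- read the next line; without this the loop never terminates
end
it:close()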
end
-- The code below is unchanged
-- Insert unk_ent_wikiid
local unk_ent_wikiid = 1
rltd_all_ent_wikiids[unk_ent_wikiid] = 1
-- Sort all wikiids
local sorted_rltd_all_ent_wikiids = tds.Vec()
for ent_wikiid,_ in pairs(rltd_all_ent_wikiids) do
sorted_rltd_all_ent_wikiids:insert(ent_wikiid)
end
sorted_rltd_all_ent_wikiids:sort(function(a,b) return a < b end)
local reltd_ents_wikiid_to_rltdid = tds.Hash()
for rltd_id,wikiid in pairs(sorted_rltd_all_ent_wikiids) do
reltd_ents_wikiid_to_rltdid[wikiid] = rltd_id
end
rewtr = tds.Hash()
rewtr.reltd_ents_wikiid_to_rltdid = reltd_ents_wikiid_to_rltdid
rewtr.reltd_ents_rltdid_to_wikiid = sorted_rltd_all_ent_wikiids
rewtr.num_rltd_ents = #sorted_rltd_all_ent_wikiids
print('Writing reltd_ents_wikiid_to_rltdid to t7 File for future usage.')
torch.save(rewtr_t7filename, rewtr)
print(' Done saving.')
Is that correct? (The specific entity file lists one entity name per line.) I also noticed that you add a small dataset in steps 1 and 1.1. Can I remove this step? If not, does the small dataset influence the final entity vectors?
Thanks for your input.
Yes, the small dataset in 1.1 can be removed. It was just for debugging (it contains fewer than 10 entities, if I recall correctly).
To access the specific entity vectors, you first have to load the t7 file via https://github.com/dalab/deep-ed/blob/master/entities/relatedness/relatedness.lua#L331 and then access the vectors using the dictionaries in the rewtr hashtable object: https://github.com/dalab/deep-ed/blob/master/entities/relatedness/relatedness.lua#L321-L324 . Given the wiki ID of an entity, you first find its rltdid using rewtr.reltd_ents_wikiid_to_rltdid[your_wiki_id], and then you access its embedding as row rltdid of the entity embedding tensor (from the t7 file). See an example here: https://github.com/dalab/deep-ed/blob/master/entities/pretrained_e2v/e2v.lua#L3-L28 . Sorry, this code could have been made easier ...
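The two-step lookup described here (wiki ID to rltdid, then rltdid to embedding row) can be sketched in plain Lua. This is an illustration only: ordinary tables stand in for the `tds.Hash` dictionaries and for the embedding tensor, the wiki IDs and vector values are made up, and `get_ent_vec` is a hypothetical helper. With the real data you would load `rewtr` and the embedding tensor from their t7 files and index the tensor by row instead.

```lua
-- Stand-ins: plain Lua tables instead of tds.Hash / torch tensors.
-- The wiki IDs and vector values below are invented for illustration.
local wikiids = {1, 12, 305}  -- sorted wiki IDs; 1 plays the role of unk_ent_wikiid

rewtr = {
  reltd_ents_rltdid_to_wikiid = wikiids,
  reltd_ents_wikiid_to_rltdid = {},
  num_rltd_ents = #wikiids,
}
-- rltdid is simply the position of a wiki ID in the sorted list.
for rltd_id, wikiid in ipairs(wikiids) do
  rewtr.reltd_ents_wikiid_to_rltdid[wikiid] = rltd_id
end

-- One row per rltdid, in the same order as the sorted wiki IDs.
ent_vecs = {
  {0.0, 0.0},
  {0.1, 0.2},
  {0.3, 0.4},
}

-- Hypothetical helper: returns the embedding row for a wiki ID,
-- or nil if that entity is not in the restricted set.
function get_ent_vec(wikiid)
  local rltd_id = rewtr.reltd_ents_wikiid_to_rltdid[wikiid]
  if not rltd_id then
    return nil
  end
  return ent_vecs[rltd_id]
end

local vec = get_ent_vec(305)
print(vec[1], vec[2])
```

A wiki ID outside the restricted set has no rltdid, so the sketch returns nil for it. How the real code handles such entities is not covered here.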
Hi, I read your code and I see that I can get all entity vectors by changing the -entities flag of learn_a.lua. I don't need such a big set of vectors. How can I train entity vectors for a specified entity set? Thanks.