Closed: zisding closed this issue 6 years ago
I created crosswikis_p_e_m.txt from the original Crosswikis a very long time ago, and unfortunately I no longer have the code for it. But afaik I only removed mentions that contain the substring "wikipedia" and converted the remaining dictionary into the format of crosswikis_p_e_m.txt.
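For anyone trying to reproduce this, here is a minimal sketch of the filtering step only. It is not the original code, which is lost. It assumes each dictionary line is tab-separated with the mention string in the first field, and it assumes the "wikipedia" check is case-insensitive; neither detail is confirmed in this thread. The function name `drop_wikipedia_mentions` and the sample lines are made up for illustration, and the conversion to the crosswikis_p_e_m.txt format is not covered.

```python
def drop_wikipedia_mentions(lines):
    """Yield lines whose mention does not contain the substring 'wikipedia'.

    Assumption: the mention is the first tab-separated field of each line.
    Assumption: the substring match ignores case.
    """
    for line in lines:
        mention = line.split("\t", 1)[0]
        if "wikipedia" in mention.lower():
            continue  # drop this mention entirely
        yield line


# Made-up sample lines, for illustration only (not real Crosswikis data).
sample = [
    "wikipedia the free encyclopedia\t0.9 Wikipedia",
    "big apple\t0.8 New_York_City",
    "Wikipedia article\t0.5 Wikipedia",
]
kept = list(drop_wikipedia_mentions(sample))
```

Because the function is a generator, it can be fed an open file handle directly, so a multi-gigabyte dictionary does not need to be loaded into memory at once.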
Thank you.
Hi,
In the paper, it is mentioned that a dictionary built from a large Web corpus (crosswikis) is used. Crosswikis actually provides 8 dictionaries. Could you please tell me which one is used, and whether any pre-processing has been applied to the original dictionary? I noticed that the original dictionary.bz2 is 2.7G, which is much larger than the dictionary (crosswikis_p_e_m.txt: 789M) extracted from basic_data.zip. Thank you.