ZJU-DAILY / LargeEA

Source code for LargeEA: Aligning Entities for Large-scale Knowledge Graphs, VLDB 2022

Question about the running time #4

Closed nxchenbnu closed 1 year ago

nxchenbnu commented 1 year ago

Thanks for your work! I have a question about Table 3 (Overall EA results on DBP1M). In this table, you report the running time of your model. I would like to know what this time includes and what unit it is in. Does it cover the whole training process or just one epoch? Is it measured in hours or seconds? Does the time reported in the other tables have the same meaning? Thank you :)

xz-liu commented 1 year ago

Thank you very much for your interest in our work. The overall running time is reported in hours. It consists of 4 parts:

  1. Computing the semantic embeddings of entity names (BERT) and their similarities (FAISS-GPU).
  2. Computing the text similarity of entity names by edit distance. Computing edit distances for all entity pairs would be infeasible, so we use MinHash (https://github.com/ekzhu/datasketch) to filter out some candidates before computing them.
  3. The CPS partitioning. This takes only a small amount of time.
  4. The mini-batch training on partitioned data. That includes all epochs of training on all mini-batches.
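
To illustrate the filtering idea in step 2, here is a minimal sketch of MinHash candidate filtering followed by edit distance. It is not the LargeEA code and does not use datasketch: the MinHash and banding logic is a small standard-library stand-in, and every function name and parameter (`shingles`, `candidate_pairs`, `num_perm`, `bands`, the 3-character shingle size) is an assumption made for this example.

```python
import hashlib
from collections import defaultdict


def shingles(name, k=3):
    """Character k-grams of a lowercased entity name (k=3 is an assumption)."""
    s = name.lower()
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def minhash_signature(items, num_perm=64):
    """Keep the minimum hash value per seeded hash function.
    Sets with high Jaccard similarity agree on many positions."""
    sig = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(x.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for x in items
        ))
    return sig


def candidate_pairs(names_a, names_b, num_perm=64, bands=16):
    """LSH banding: two names become a candidate pair if any band of
    their signatures is identical. All other pairs are skipped."""
    rows = num_perm // bands
    buckets = defaultdict(list)
    for j, name in enumerate(names_b):
        sig = minhash_signature(shingles(name), num_perm)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(j)
    pairs = set()
    for i, name in enumerate(names_a):
        sig = minhash_signature(shingles(name), num_perm)
        for b in range(bands):
            for j in buckets.get((b, tuple(sig[b * rows:(b + 1) * rows])), ()):
                pairs.add((i, j))
    return pairs


def edit_distance(a, b):
    """Levenshtein distance with a rolling row (O(len(a) * len(b)) time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def filtered_similarities(names_a, names_b):
    """Edit-distance similarity in [0, 1], computed only for candidate pairs."""
    sims = {}
    for i, j in candidate_pairs(names_a, names_b):
        a, b = names_a[i], names_b[j]
        longest = max(len(a), len(b)) or 1
        sims[(i, j)] = 1.0 - edit_distance(a, b) / longest
    return sims


if __name__ == "__main__":
    kg1 = ["Zhejiang University", "Hangzhou"]
    kg2 = ["Zhejiang University", "Paris"]
    for pair, sim in sorted(filtered_similarities(kg1, kg2).items()):
        print(pair, round(sim, 3))
```

The point of the filter is that the quadratic edit-distance loop only runs on pairs that share at least one signature band, so most dissimilar pairs are never compared. For real workloads, the `MinHash` and `MinHashLSH` classes in datasketch would replace the hand-written parts above.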

You can take a look at Figure 4 for the exact running time of each part.
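
For anyone who wants to reproduce a per-stage breakdown like this on their own runs, a rough sketch of a stage timer that accumulates wall-clock time per part and reports the total in hours could look like the following. The class name `StageTimer` and the stage labels are hypothetical and are not taken from the LargeEA repository.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage (hypothetical helper)."""

    def __init__(self):
        self.seconds = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add to any earlier time recorded for the same stage, so a
            # stage that runs once per mini-batch or epoch is summed up.
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

    def total_hours(self):
        """Overall running time: the sum of all stages, in hours."""
        return sum(self.seconds.values()) / 3600.0


if __name__ == "__main__":
    timer = StageTimer()
    # Placeholder stage labels mirroring the four parts listed above.
    for label in ("semantic_similarity", "string_similarity", "partition", "training"):
        with timer.stage(label):
            time.sleep(0.01)  # stand-in for the real work
    for label, secs in timer.seconds.items():
        print(f"{label}: {secs:.3f} s")
    print(f"total: {timer.total_hours():.6f} h")
```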