facebookresearch / ssl-relation-prediction

Simple yet SoTA Knowledge Graph Embeddings.

BUG in ComplEx and CP: achieving >99% MRR with only one linear layer #21

Closed quqxui closed 11 months ago

quqxui commented 12 months ago

Hi, Yihong

I found a bug when I changed the network of ComplEx and CP. I only added one linear layer on the entity or relation embedding, and then achieved unbelievable results:

Epoch: 0
TRAIN: {'MRR': 1.0, 'hits@[1,3,10]': [1.0, 1.0, 1.0]}
VALID: {'MRR': 0.9929838180541992, 'hits@[1,3,10]': [0.9928429126739502, 0.993099570274353, 0.9931280612945557]}
TEST: {'MRR': 0.9911285042762756, 'hits@[1,3,10]': [0.9909361600875854, 0.9912049174308777, 0.9914003610610962]}

You can reproduce the results by adding rel = self.fc(rel):

from typing import Tuple

import torch.nn as nn

class ComplEx(KBCModel):
    def __init__(
            self, sizes: Tuple[int, int, int], rank: int,
            init_size: float = 1e-3
    ):
        super(ComplEx, self).__init__()
        # ... original initialization (self.embeddings, etc.) unchanged ...
        self.fc = nn.Linear(2 * rank, 2 * rank)  # the added linear layer

    def score(self, x):
        # ... original lhs/rhs lookups unchanged ...
        rel = self.embeddings[1](x[:, 1])
        rel = self.fc(rel)  # added
        # ... original scoring unchanged ...

    def forward(self, x, score_rhs=True, score_rel=False, score_lhs=False):
        # ... original embedding lookups unchanged ...
        rel = self.embeddings[1](x[:, 1])
        rel = self.fc(rel)  # added
        # ... rest of the original forward unchanged ...

I suspect there is a data leak, but I couldn't find the cause. Do you have any ideas?

Looking forward to your reply.

yihong-chen commented 11 months ago

Hi @quqxui, thanks for reaching out. I tried to run your code, but it works fine on my side; I can't reproduce the bug. Here is the log:

Git commit ID: b'bccfac39ac88935f1853d201acde8a48f5771f96\n'
Creating a sampler of size 544230
546000it [00:32, 16924.16it/s]
Evaluate the split train
Evaluating the rhs
Evaluating the lhs
Num neg per head/tail 0
Evaluate the split valid
Evaluating the rhs
Evaluating the lhs
Num neg per head/tail 0
Evaluate the split test
Evaluating the rhs
Evaluating the lhs
Num neg per head/tail 0
Epoch: 0
TRAIN: {'MRR': 0.16256607696413994, 'hits@[1,3,10]': [0.1615000069141388, 0.16249999403953552, 0.16449999809265137]}
VALID: {'MRR': 0.17417104169726372, 'hits@[1,3,10]': [0.17254063487052917, 0.17496435344219208, 0.17670373618602753]}
TEST: {'MRR': 0.17088011279702187, 'hits@[1,3,10]': [0.169256329536438, 0.17140623927116394, 0.17387375235557556]}
0%| | 0/544230 [00:00<?, ?it/s]
545000it [00:21, 25204.97it/s]
0%| | 0/544230 [00:00<?, ?it/s]
545000it [00:21, 25046.87it/s]
Evaluate the split train
Evaluating the rhs
Evaluating the lhs
Num neg per head/tail 0
Evaluate the split valid
Evaluating the rhs
Evaluating the lhs
Num neg per head/tail 0
Evaluate the split test
Evaluating the rhs
Evaluating the lhs
Num neg per head/tail 0
Epoch: 3
TRAIN: {'MRR': 0.20998940244317055, 'hits@[1,3,10]': [0.2084999978542328, 0.2110000103712082, 0.21199999749660492]}
VALID: {'MRR': 0.2195245362818241, 'hits@[1,3,10]': [0.2187054455280304, 0.21984601020812988, 0.22084403038024902]}
TEST: {'MRR': 0.2179299332201481, 'hits@[1,3,10]': [0.21714061498641968, 0.21824000775814056, 0.2192172408103943]}
0%| | 0/544230 [00:00<?, ?it/s]
545000it [00:33, 16485.99it/s]

The validation MRR also looks fine. Did you change anything else?

[image: validation MRR curve]
quqxui commented 11 months ago

I found that different environments change the experimental results, but I'm not sure which package causes the effect.

With python=3.9.1 and torch=1.13.1+cu116, I get normal results with the original code, but 99% MRR with the additional nn.Linear().

With python=3.6.13 and torch=1.10.0+cu113, I get 0.0001425 MRR with the original code, but normal results with the additional nn.Linear().

Could you provide all the package versions in your environment by running 'pip freeze > requirements.txt'?

Thank you very much!

yihong-chen commented 11 months ago

Hi @quqxui, this is what I get when I run 'pip freeze > requirements.txt'. Let me know if you have other questions. I am using Python 3.7.

antlr4-python3-runtime==4.8
argon2-cffi @ file:///tmp/build/80754af9/argon2-cffi_1596828452693/work
asttokens==2.0.5
attrs @ file:///tmp/build/80754af9/attrs_1600298409949/work
backcall==0.2.0
bitarray==2.5.1
bleach @ file:///tmp/build/80754af9/bleach_1600439572647/work
certifi==2021.10.8
cffi @ file:///tmp/build/80754af9/cffi_1600699180754/work
chardet==3.0.4
click==7.1.2
cloudpickle==2.0.0
colorama==0.4.4
configparser==5.0.1
Cython==0.29.30
dataclasses==0.6
decorator==4.4.2
defusedxml==0.6.0
docker-pycreds==0.4.0
entrypoints==0.3
executing==0.8.2
faiss==1.7.1
future==0.18.2
gitdb==4.0.5
GitPython==3.1.11
higher==0.2.1
hydra-core==1.0.7
icecream==2.1.1
idna==2.10
importlib-metadata @ file:///tmp/build/80754af9/importlib-metadata_1602276842396/work
importlib-resources==5.8.0
ipykernel @ file:///tmp/build/80754af9/ipykernel_1596206598566/work/dist/ipykernel-5.3.4-py3-none-any.whl
ipython @ file:///tmp/build/80754af9/ipython_1598883837425/work
ipython-genutils==0.2.0
ipywidgets @ file:///tmp/build/80754af9/ipywidgets_1601490159889/work
jedi @ file:///tmp/build/80754af9/jedi_1596490743326/work
Jinja2==2.11.2
joblib==1.1.0
jsonschema @ file:///tmp/build/80754af9/jsonschema_1602607155483/work
jupyter==1.0.0
jupyter-client @ file:///tmp/build/80754af9/jupyter_client_1601311786391/work
jupyter-console @ file:///tmp/build/80754af9/jupyter_console_1598884538475/work
jupyter-core==4.6.3
littleutils==0.2.2
MarkupSafe @ file:///tmp/build/80754af9/markupsafe_1594371495811/work
mistune @ file:///tmp/build/80754af9/mistune_1594373098390/work
mkl-fft==1.3.1
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626179032232/work
mkl-service==2.4.0
nbconvert @ file:///tmp/build/80754af9/nbconvert_1594376811065/work
nbformat @ file:///tmp/build/80754af9/nbformat_1602783287752/work
networkx==2.6.3
notebook @ file:///tmp/build/80754af9/notebook_1601501580008/work
numpy @ file:///tmp/build/80754af9/numpy_and_numpy_base_1634106693478/work
ogb==1.3.6
omegaconf==2.0.6
outdated==0.2.1
packaging==20.4
pandas==1.3.4
pandocfilters==1.4.2
parso==0.7.0
pathtools==0.1.2
pexpect @ file:///tmp/build/80754af9/pexpect_1594383317248/work
pickleshare @ file:///tmp/build/80754af9/pickleshare_1594384075987/work
Pillow==8.0.1
portalocker==2.4.0
prometheus-client==0.8.0
promise==2.3
prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1602688806899/work
protobuf==3.13.0
psutil==5.7.3
ptyprocess==0.6.0
pyarrow==8.0.0
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
Pygments @ file:///tmp/build/80754af9/pygments_1600458456400/work
pyparsing==2.4.7
pyrsistent @ file:///tmp/build/80754af9/pyrsistent_1600141707582/work
python-dateutil==2.8.1
pytz==2021.3
PyYAML==5.3.1
pyzmq==19.0.2
qtconsole @ file:///tmp/build/80754af9/qtconsole_1600870028330/work
QtPy==1.9.0
regex==2022.6.2
requests==2.24.0
sacrebleu==2.1.0
scikit-learn==1.0.1
scipy==1.7.2
Send2Trash==1.5.0
sentry-sdk==1.4.3
setproctitle==1.2.3
shortuuid==1.0.1
six==1.15.0
smmap==3.0.4
submitit==1.4.0
subprocess32==3.5.4
tabulate==0.8.10
tensorboardX==2.5.1
termcolor==1.1.0
terminado==0.9.1
testpath==0.4.4
threadpoolctl==3.0.0
torch==1.10.0
torchaudio==0.10.0
torchvision==0.11.1
tornado==6.0.4
tqdm==4.62.3
traitlets @ file:///tmp/build/80754af9/traitlets_1602787416690/work
typing-extensions==3.7.4.3
urllib3==1.25.11
wandb==0.12.19
watchdog==0.10.3
wcwidth @ file:///tmp/build/80754af9/wcwidth_1593447189090/work
webencodings==0.5.1
widgetsnbextension==3.5.1
yaspin==2.1.0
zipp @ file:///tmp/build/80754af9/zipp_1604001098328/work
quqxui commented 11 months ago

I found where the problem is: I did not add the linear layer to the functions def get_queries(self, queries, target='rhs') and def get_factor(self, x), as sketched below.
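
For reference, a minimal sketch of the consistent fix (the elided parts stand for the original repo code, which I am not reproducing here): the added layer has to be applied on every path that reads the relation embedding, including the query path used at evaluation time.

def get_queries(self, queries, target='rhs'):
    # ... original embedding lookups unchanged ...
    rel = self.embeddings[1](queries[:, 1])
    rel = self.fc(rel)  # apply the added layer here too, so query scores
                        # live on the same scale as scores from forward()/score()
    # ... original query construction unchanged ...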

I think the large fluctuations in prediction results (sometimes 99%, sometimes 0.01% MRR) are due to the scores and targets being on different scales when calculating ranks: if the transformed targets sit above almost every candidate score, every rank collapses to 1 (MRR near 1), and if they sit below, every rank approaches the number of candidates (MRR near 0):

ranks[b_begin:b_begin + batch_size] += torch.sum(
    (scores >= targets).float(), dim=1
).cpu()
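
To see why a scale mismatch pushes the MRR to either extreme, here is a tiny self-contained demonstration with made-up tensors (not the repo's evaluation code; it only reuses the comparison from the snippet above, assuming ranks start at 1):

import torch

num_candidates = 1000
scores = torch.randn(1, num_candidates)  # candidate scores from the untransformed path

# A target score computed on a different scale can end up above or
# below every candidate score:
target_high = scores.max() + 10.0  # inflated scale
target_low = scores.min() - 10.0   # shrunken scale

# rank = 1 + number of candidates scoring at least as high as the target
rank_high = 1 + torch.sum((scores >= target_high).float(), dim=1)  # rank 1    -> MRR 1.0
rank_low = 1 + torch.sum((scores >= target_low).float(), dim=1)    # rank 1001 -> MRR ~0.001

print(1.0 / rank_high.item(), 1.0 / rank_low.item())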

Thank you very much for your kind assistance, and my apologies for any inconvenience caused by my question.

yihong-chen commented 11 months ago

Glad you have solved the issue. Feel free to ping me again if you have other questions.