Team: LoveFishO
Rank: 2
LoveFishO: Algorithm Engineer from NingBo
Report: https://openreview.net/pdf?id=oxOEqVH4tI
IND-WhoIsWho:
Stores the raw data
out_data:
Stores intermediate data generated by the program
output:
Stores all result data
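The three directories above can be created up front. This is a minimal sketch, assuming they live at the repository root and that the scripts do not create them on their own (the source does not confirm either point); `ensure_data_dirs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Directory names come from the layout above; placing them at the repo root is an assumption.
DATA_DIRS = ("IND-WhoIsWho", "out_data", "output")


def ensure_data_dirs(root="."):
    """Create the raw, intermediate, and result directories if they are missing."""
    created = []
    for name in DATA_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling `ensure_data_dirs()` from the repository root before downloading the data leaves existing directories untouched.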
Download the data from HERE. The access code is kvcq
git clone https://github.com/LoveFishoO/2024-KDD-WhoIsWho.git
cd 2024-KDD-WhoIsWho/
pip install -r requirements.txt
Includes three embedding models:
multilingual-e5-large-instruct (title, abstract, venue)
voyage-large-2-instruct (title, abstract, venue)
bge-m3 (orgs)
python3 encode.py --api_key "The api key of voyageai"
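Instruction-tuned embedding models such as multilingual-e5-large-instruct expect each input to carry a task instruction. The sketch below only shows how per-field inputs could be prepared in the `Instruct: ...\nQuery: ...` format documented on that model's card. Whether `encode.py` formats its inputs this way is not stated here; the task wording, the `paper` dictionary keys, and both helper names are assumptions for illustration.

```python
# Hypothetical task description; the wording used by encode.py is not given in the source.
TASK = "Represent this paper field for author name disambiguation"

# Fields listed above for the E5 and Voyage models.
FIELDS = ("title", "abstract", "venue")


def detailed_instruct(task, text):
    """Format one input in the instruction style documented for E5-Instruct models."""
    return f"Instruct: {task}\nQuery: {text}"


def build_inputs(paper, task=TASK):
    """Return one instruction-formatted string per field; missing fields become empty text."""
    return {field: detailed_instruct(task, paper.get(field) or "") for field in FIELDS}
```

Each resulting string would then be passed to the embedding model, producing one vector per field.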
To reproduce the results, I recommend downloading the embedding data from HERE. The access code is jd9u
cd LGB
For E5-Instruct
python3 ./e5_instruct_lgb.py
For Voyage-Instruct
python3 ./voyage_lgb.py
The generated feature data is saved in ./out_data
cd ..
cd GCN
For E5-Instruct embedding + E5-Instruct LGB features
python3 ./build_graph.py \
--title_embeddings_dir ../out_data/e5_instruct_title_data.pkl \
--abstract_embeddings_dir ../out_data/e5_instruct_abstract_data.pkl \
--venue_embeddings_dir ../out_data/e5_instruct_venue_data.pkl \
--train_feats_dir ../out_data/e5_instruct_train.csv \
--test_feats_dir ../out_data/e5_instruct_test.csv \
--save_train_dir ../out_data/e5_instruct_graph_train.pkl \
--save_test_dir ../out_data/e5_instruct_graph_test.pkl
For E5-Instruct embedding + Voyage LGB features
python3 ./build_graph.py \
--title_embeddings_dir ../out_data/e5_instruct_title_data.pkl \
--abstract_embeddings_dir ../out_data/e5_instruct_abstract_data.pkl \
--venue_embeddings_dir ../out_data/e5_instruct_venue_data.pkl \
--train_feats_dir ../out_data/voyage_train.csv \
--test_feats_dir ../out_data/voyage_test.csv \
--save_train_dir ../out_data/e5_instruct_embed_voyage_feats_graph_train.pkl \
--save_test_dir ../out_data/e5_instruct_embed_voyage_feats_graph_test.pkl
For Voyage embedding + Voyage LGB features
python3 ./build_graph.py \
--title_embeddings_dir ../out_data/voyage_title_data.pkl \
--abstract_embeddings_dir ../out_data/voyage_abstract_data.pkl \
--venue_embeddings_dir ../out_data/voyage_venue_data.pkl \
--train_feats_dir ../out_data/voyage_train.csv \
--test_feats_dir ../out_data/voyage_test.csv \
--save_train_dir ../out_data/voyage_graph_train.pkl \
--save_test_dir ../out_data/voyage_graph_test.pkl
For Voyage embedding + E5-Instruct LGB features
python3 ./build_graph.py \
--title_embeddings_dir ../out_data/voyage_title_data.pkl \
--abstract_embeddings_dir ../out_data/voyage_abstract_data.pkl \
--venue_embeddings_dir ../out_data/voyage_venue_data.pkl \
--train_feats_dir ../out_data/e5_instruct_train.csv \
--test_feats_dir ../out_data/e5_instruct_test.csv \
--save_train_dir ../out_data/voyage_embed_e5_instruct_feats_graph_train.pkl \
--save_test_dir ../out_data/voyage_embed_e5_instruct_feats_graph_test.pkl
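The four `build_graph.py` invocations above differ only in which embedding files and which LGB feature files they combine. A sketch that generates the same argument lists from the two prefixes is shown below; the helper names are hypothetical, and the file-name pattern is inferred from the commands above rather than from the script itself.

```python
# Prefixes as they appear in the file names above.
EMBEDDINGS = ("e5_instruct", "voyage")
FEATURES = ("e5_instruct", "voyage")


def graph_stem(embed, feats):
    """File-name stem for one combination, following the naming in the commands above."""
    if embed == feats:
        return f"{embed}_graph"
    return f"{embed}_embed_{feats}_feats_graph"


def build_graph_command(embed, feats, data_dir="../out_data"):
    """Argument list for one build_graph.py run (meant to be executed from the GCN directory)."""
    stem = graph_stem(embed, feats)
    return [
        "python3", "./build_graph.py",
        "--title_embeddings_dir", f"{data_dir}/{embed}_title_data.pkl",
        "--abstract_embeddings_dir", f"{data_dir}/{embed}_abstract_data.pkl",
        "--venue_embeddings_dir", f"{data_dir}/{embed}_venue_data.pkl",
        "--train_feats_dir", f"{data_dir}/{feats}_train.csv",
        "--test_feats_dir", f"{data_dir}/{feats}_test.csv",
        "--save_train_dir", f"{data_dir}/{stem}_train.pkl",
        "--save_test_dir", f"{data_dir}/{stem}_test.pkl",
    ]


# All four embedding/feature combinations.
commands = [build_graph_command(e, f) for e in EMBEDDINGS for f in FEATURES]
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute the runs one after another.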
For E5-Instruct embedding + E5-Instruct LGB features
python3 ./train.py \
--train_dir ../out_data/e5_instruct_graph_train.pkl \
--test_dir ../out_data/e5_instruct_graph_test.pkl \
--save_result_dir ../output/e5_instruct_gcn.json
For E5-Instruct embedding + Voyage LGB features
python3 ./train.py \
--train_dir ../out_data/e5_instruct_embed_voyage_feats_graph_train.pkl \
--test_dir ../out_data/e5_instruct_embed_voyage_feats_graph_test.pkl \
--save_result_dir ../output/e5_instruct_embed_voyage_feats_gcn.json
For Voyage embedding + Voyage LGB features
python3 ./train.py \
--train_dir ../out_data/voyage_graph_train.pkl \
--test_dir ../out_data/voyage_graph_test.pkl \
--save_result_dir ../output/voyage_gcn.json
For Voyage embedding + E5-Instruct LGB features
python3 ./train.py \
--train_dir ../out_data/voyage_embed_e5_instruct_feats_graph_train.pkl \
--test_dir ../out_data/voyage_embed_e5_instruct_feats_graph_test.pkl \
--save_result_dir ../output/voyage_embed_e5_instruct_feats_gcn.json
Note: please train the model on CPU.
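One way to make sure training stays on CPU is to hide all GPUs from the process. The sketch below assumes `train.py` uses a CUDA-backed framework (the `.pt` model files suggest PyTorch, but this is not confirmed) and does not itself force a device; `cpu_only_env` is a hypothetical helper.

```python
import os


def cpu_only_env(base_env=None):
    """Copy an environment and hide all CUDA devices from child processes."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""  # an empty value exposes no GPUs to CUDA
    return env


# The first training command above, expressed as an argument list.
train_cmd = [
    "python3", "./train.py",
    "--train_dir", "../out_data/e5_instruct_graph_train.pkl",
    "--test_dir", "../out_data/e5_instruct_graph_test.pkl",
    "--save_result_dir", "../output/e5_instruct_gcn.json",
]
# To run it from the GCN directory:
#   import subprocess
#   subprocess.run(train_cmd, env=cpu_only_env(), check=True)
```

The same effect is available directly in the shell by prefixing a command with `CUDA_VISIBLE_DEVICES=""`.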
cd LGB
For E5-Instruct
python3 ./inference.py \
--model e5_instruct \
--test_path ../out_data/e5_instruct_lgb_test.csv \
--test_author_path ../IND-WhoIsWho/ind_test_author_submit.json \
--result_path ../output/e5_instruct_lgb.json \
--model_dir ./lgb_model/
For Voyage-Instruct
python3 ./inference.py \
--model voyage \
--test_path ../out_data/voyage_lgb_test.csv \
--test_author_path ../IND-WhoIsWho/ind_test_author_submit.json \
--result_path ../output/voyage_lgb.json \
--model_dir ./lgb_model/
cd ../GCN
For E5-Instruct embedding + E5-Instruct LGB features
python3 ./inference.py \
--test_dir ../out_data/e5_instruct_graph_test.pkl \
--model_path ./graph_model/e5_instruct_gcn_model.pt \
--save_result_dir ../output/e5_instruct_gcn.json
For E5-Instruct embedding + Voyage LGB features
python3 ./inference.py \
--test_dir ../out_data/e5_instruct_embed_voyage_feats_graph_test.pkl \
--model_path ./graph_model/e5_instruct_embed_voyage_feats_gcn_model.pt \
--save_result_dir ../output/e5_instruct_embed_voyage_feats_gcn.json
For Voyage embedding + Voyage LGB features
python3 ./inference.py \
--test_dir ../out_data/voyage_graph_test.pkl \
--model_path ../out_data/voyage_gcn_model.pt \
--save_result_dir ../output/voyage_gcn.json
For Voyage embedding + E5-Instruct LGB features
python3 ./inference.py \
--test_dir ../out_data/voyage_embed_e5_instruct_feats_graph_test.pkl \
--model_path ../out_data/voyage_embed_e5_instruct_feats_gcn_model.pt \
--save_result_dir ../output/voyage_embed_e5_instruct_feats_gcn.json
cd ..
python3 ensemble.py
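As a rough illustration of what an ensemble over the LGB and GCN result files can look like, the sketch below takes a weighted mean of per-paper scores. It assumes each result JSON maps an author ID to a `{paper_id: score}` dictionary and that all files cover the same authors and papers; the actual logic and weights in `ensemble.py` may differ (for example, rank averaging), and `average_results` is a hypothetical name.

```python
def average_results(results, weights=None):
    """Weighted mean of several {author_id: {paper_id: score}} result dicts."""
    if not results:
        raise ValueError("at least one result dict is required")
    if weights is None:
        weights = [1.0] * len(results)  # plain mean by default
    total = float(sum(weights))
    merged = {}
    for author_id, papers in results[0].items():
        merged[author_id] = {}
        for paper_id in papers:
            score = sum(w * r[author_id][paper_id] for r, w in zip(results, weights))
            merged[author_id][paper_id] = score / total
    return merged
```

The result dicts would be obtained with `json.load` from the files under `../output/`, and the merged dict written back with `json.dump`.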
Method | AUC |
---|---|
LGB-Voyage | 0.81433 |
LGB-E5-Instruct | 0.81827 |
GCN-E5-Instruct | 0.78082 |
LGB(E5-Instruct/Voyage) x 2 + GCN(E5-Instruct/Voyage) x 4 | 0.82486 |
Note:
E5-Instruct: multilingual-e5-large-instruct
Voyage: voyage-large-2-instruct
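For readers who want to sanity-check scores locally, AUC can be computed directly from its pairwise definition. The sketch below is plain AUC over a single list of labels and scores; the leaderboard figures in the table may be aggregated differently (for example, per author), which the source does not specify.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This quadratic version is fine for small checks; a rank-based implementation is preferable for large inputs.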