This repository contains the code and pre-trained models for our paper XPR: Cross-lingual Phrase Retriever.
We propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences.
We also create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs.
In the following sections, we describe how to use XPR.
torch==1.8.1+cu111
Install the PyTorch version corresponding to your platform and CUDA version. PyTorch versions higher than 1.8.1
should also work.
git clone git@github.com:cwszz/XPR.git
cd XPR
pip install -r requirements.txt
mkdir data
mkdir model
mkdir result
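The PyTorch version requirement above can be checked after installation with a short script. This is a minimal sketch that uses only the standard library; the helper names `parse_version` and `torch_is_recent_enough` are illustrative and not part of this repository.

```python
import re
from importlib import metadata

# Minimum PyTorch version named in this README.
MIN_TORCH = (1, 8, 1)


def parse_version(text):
    """Return the leading numeric release of a version string as a tuple of ints."""
    # Drop a local build tag such as "+cu111" before parsing.
    release = text.split("+", 1)[0]
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component (e.g. "1.10") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def torch_is_recent_enough(version_text, minimum=MIN_TORCH):
    """Return True if the given version string is at least the minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed")
    else:
        status = "ok" if torch_is_recent_enough(installed) else "older than 1.8.1"
        print(f"torch {installed}: {status}")
```

The script reads the installed package metadata rather than importing `torch`, so it also runs in an environment where the installation failed.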
Before using XPR, please process the dataset by following the steps below.
Download Our Dataset Here: link
Unzip the dataset and move it into the data folder. (Make sure the dataset path in the bash scripts points to this location.)
Before using XPR, please process the checkpoint by following the steps below.
Download Our Checkpoint Here: link
Download the checkpoint files and move them into the model folder.
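Before training or evaluating, it can help to confirm that the folders and the checkpoint are where the scripts expect them. The sketch below is one way to do this. The folder names come from the `mkdir` commands in this README and the checkpoint file name from the `--load_model_path` value in the evaluation command; the function name `missing_paths` is hypothetical, and the layout inside the data folder is not checked because this README does not describe it.

```python
from pathlib import Path

# Folder names taken from the mkdir commands in this README.
EXPECTED_DIRS = ("data", "model", "result")
# Checkpoint file name taken from the --load_model_path value in this README.
CHECKPOINT = Path("model") / "pytorch_model.bin"


def missing_paths(repo_root):
    """Return the expected folders and checkpoint file that do not exist yet."""
    root = Path(repo_root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    if not (root / CHECKPOINT).is_file():
        missing.append(CHECKPOINT.as_posix())
    return missing


if __name__ == "__main__":
    problems = missing_paths(".")
    print("layout looks complete" if not problems else f"missing: {problems}")
```

Run it from the repository root; an empty result means the three folders and the checkpoint file are in place.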
Train our method:
bash train.sh
Test our method:
Here is an example of evaluating XPR:
bash test.sh
or
export CUDA_VISIBLE_DEVICES='0'
python3 predict.py \
--lg $lg \
--test_lg $test_lg \
--dataset_path ./dataset/ \
--load_model_path ./model/pytorch_model.bin \
--queue_length 0 \
--unsupervised 0 \
--wo_projection 0 \
--layer_id 12 \
> log/test-${lg}-${test_lg}-32.log 2>&1
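To evaluate several language settings in turn, the command can be assembled programmatically. The following is a rough sketch, not part of this repository: `build_predict_command` is a hypothetical helper, the language code `fr` and the dataset path are placeholders, and it assumes `predict.py` uses a standard argparse-style parser, so the layer id is passed as `--layer_id 12`. Shell redirection does not create the `log` folder, so the helper returns the log path and the caller creates the folder first.

```python
import shlex
from pathlib import Path


def build_predict_command(lg, test_lg, dataset_path,
                          model_path="./model/pytorch_model.bin"):
    """Return the predict.py argument list and the log file path for one run."""
    args = [
        "python3", "predict.py",
        "--lg", lg,
        "--test_lg", test_lg,
        "--dataset_path", dataset_path,
        "--load_model_path", model_path,
        "--queue_length", "0",
        "--unsupervised", "0",
        "--wo_projection", "0",
        "--layer_id", "12",
    ]
    # Same log file naming pattern as the shell command in this README.
    log_file = Path("log") / f"test-{lg}-{test_lg}-32.log"
    return args, log_file


if __name__ == "__main__":
    # "fr" and "path/to/dataset" are placeholders; substitute real values.
    args, log_file = build_predict_command("fr", "fr", "path/to/dataset")
    # Create the log folder, since the shell redirect will not.
    log_file.parent.mkdir(exist_ok=True)
    print(shlex.join(args), ">", log_file.as_posix(), "2>&1")
```

The script only prints the command. To execute it, the argument list could be passed to `subprocess.run` with the log file opened as `stdout` and `stderr=subprocess.STDOUT`.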
Please cite this paper if you find the resources in this repository useful.