cwszz / XPR

Cross-lingual Phrase Retriever
MIT License

This repository contains the code and pre-trained models for our paper XPR: Cross-lingual Phrase Retriever.

**** Updates ****

Overview

We propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences.

Dataset

We also create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs.

Getting Started

The following sections describe how to use XPR.

Requirements

Before using XPR, please process the dataset by following the steps below.

Checkpoint

Before using XPR, please process the checkpoint by following the steps below.

Train XPR

bash train.sh

Evaluation

Test our method:

Here is an example of how to evaluate XPR:

bash test.sh

or

export CUDA_VISIBLE_DEVICES='0'
python3 predict.py \
--lg $lg \
--test_lg $test_lg \
--dataset_path ./dataset/ \
--load_model_path ./model/pytorch_model.bin \
--queue_length 0 \
--unsupervised 0 \
--wo_projection 0 \
--layer_id 12 \
> log/test-${lg}-${test_lg}-32.log 2>&1
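The command above expects `$lg` and `$test_lg` to be set and a `log/` directory to exist, since the redirection fails otherwise. The wrapper below is a minimal sketch of one way to handle this when repeating the evaluation for several settings. The names `xpr_eval`, `xpr_log_path`, `DRY_RUN`, `DATASET_PATH`, and `MODEL_PATH` are assumptions introduced for illustration and are not part of the repository; only the `predict.py` flags are taken from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the predict.py call shown above.
# All helper names and environment variables here are hypothetical.

# ASSUMPTION: adjust these to where your data and checkpoint actually live.
DATASET_PATH="${DATASET_PATH:-./dataset/}"
MODEL_PATH="${MODEL_PATH:-./model/pytorch_model.bin}"

# Log file name, following the pattern used in the command above.
xpr_log_path() {
  printf 'log/test-%s-%s-32.log\n' "$1" "$2"
}

# Evaluate one (lg, test_lg) setting.
# With DRY_RUN=1 the command is printed instead of executed.
xpr_eval() {
  local lg="$1" test_lg="$2"
  # Build the argument list in the positional parameters.
  set -- python3 predict.py \
    --lg "$lg" --test_lg "$test_lg" \
    --dataset_path "$DATASET_PATH" \
    --load_model_path "$MODEL_PATH" \
    --queue_length 0 --unsupervised 0 --wo_projection 0 \
    --layer_id 12
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
    return 0
  fi
  # Fail early with a clear message rather than inside predict.py.
  if [ ! -f "$MODEL_PATH" ]; then
    echo "missing checkpoint: $MODEL_PATH" >&2
    return 1
  fi
  mkdir -p log   # the redirection below fails if log/ does not exist
  "$@" > "$(xpr_log_path "$lg" "$test_lg")" 2>&1
}
```

After sourcing this file, `DRY_RUN=1 xpr_eval "$lg" "$test_lg"` prints the command that would run, and `xpr_eval "$lg" "$test_lg"` runs it and writes the log. Printing first is a cheap way to check the paths before occupying a GPU.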

References

Please cite this paper if you find the resources in this repository useful.