This PR:

- Fixes an issue with the `_add_embeddings_by_chunks()` method, which was not including all embeddings because it assumed that the slicing range interval was `[start, end]`, while it is actually `[start, end)`, i.e. the last value is excluded (see the slicing sketch below).
- Adds a missing config for the `GpuIndexFlatIP` index and its corresponding type (see the faiss sketch below).
- Allows running `finetune_qa.py` in a user-friendly way. On a machine with 8 GPUs, we can run:

```bash
torchrun --standalone --nnodes 1 --nproc_per_node 8 finetune_qa.py \
    --train_data $DATA_DIR/nq_data/train.100-shot.jsonl \
    --eval_data $DATA_DIR/nq_data/test.jsonl \
    --name "my_finetuning_experiment" \
    --checkpoint_dir $DATA_DIR/experiments/ \
    --total_steps 31 \
    --index_mode faiss \
    --faiss_index_type pq \
    --faiss_code_size 16 \
    --model_path $DATA_DIR/models/atlas/base \
    --load_index_path $DATA_DIR/indices/atlas/wiki/base \
    --reader_model_type google/t5-base-lm-adapt
```
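For the slicing fix in the first bullet, here is a minimal, hypothetical sketch of chunked iteration (not the actual `_add_embeddings_by_chunks()` code) showing why the half-open `[start, end)` interval matters; the array and chunk size are made-up values:

```python
import numpy as np

embeddings = np.arange(10).reshape(10, 1)  # 10 toy embedding rows
chunk_size = 4

# Python slices are half-open [start, end), so iterate start offsets up to
# len(embeddings); the final (possibly shorter) chunk is still included.
chunks = [embeddings[start:start + chunk_size]
          for start in range(0, len(embeddings), chunk_size)]

# Assuming a closed [start, end] interval and stopping one element early
# would silently drop the tail of the array.
assert sum(len(c) for c in chunks) == len(embeddings)
```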
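For the `GpuIndexFlatIP` bullet, the underlying faiss construction that the new config and type map to looks roughly like this (a sketch of the plain faiss API, not the Atlas index code; the dimension, device, and data are placeholder values):

```python
import numpy as np
import faiss  # requires the faiss-gpu build

d = 768                      # embedding dimension (placeholder)
xb = np.random.rand(1000, d).astype("float32")

res = faiss.StandardGpuResources()
cfg = faiss.GpuIndexFlatConfig()
cfg.device = 0               # GPU that holds the index

# Exact (non-quantized) inner-product search on GPU.
index = faiss.GpuIndexFlatIP(res, d, cfg)
index.add(xb)
scores, ids = index.search(xb[:5], 10)  # top-10 neighbours for 5 queries
```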