KarelDO / xmc.dspy

In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples.
MIT License

Segmentation Fault When Running Example in README #4

Open sidjha1 opened 8 months ago

sidjha1 commented 8 months ago

Hello, I'm interested in reproducing some of the results and eventually testing the pipeline with my own models. Following the README, I was able to run the following command without any issues.

python run_irera.py \
    --dataset_name esco_tech \
    --state_path ./results_precompiled/esco_tech_infer-retrieve-rank_00/program_state.json \
    --lm_config_path ./lm_config.json \
    --do_validation \
    --do_test 

However, when I run the command below (copied from the README), I get a segmentation fault.

(xmc) sidjha@guestrin-hgx-1:~/xmc.dspy$ python compile_irera.py \
>     --dataset_name esco_tech \
>     --ontology_name esco \
>     --prior_path ./data/esco/esco_priors.json \
>     --ontology_path ./data/esco/skills_en_label.txt \
>     --infer_signature_name infer_esco \
>     --rank_signature_name rank_esco \
>     --retriever_model_name sentence-transformers/all-mpnet-base-v2 \
>     --infer_student_model_name llama-2-13b-chat \
>     --infer_teacher_model_name gpt-3.5-turbo-instruct \
>     --rank_student_model_name gpt-4-1106-preview \
>     --rank_teacher_model_name gpt-4-1106-preview \
>     --infer_compile_metric_name rp10 \
>     --rank_compile_metric_name rp10 \
>     --prior_A 0 \
>     --rank_topk 50 \
>     --do_validation \
>     --do_test \
>     --optimizer_name left-to-right \
>     --lm_config_path ./lm_config.json 
./local_cache/compiler
dataset_name:  esco_tech
retriever_model_name:  sentence-transformers/all-mpnet-base-v2
infer_signature_name:  infer_esco
infer_student_model_name:  llama-2-13b-chat
infer_teacher_model_name:  gpt-3.5-turbo-instruct
rank_signature_name:  rank_esco
rank_student_model_name:  gpt-4-1106-preview
rank_teacher_model_name:  gpt-4-1106-preview
infer_compile:  True
infer_compile_metric_name:  rp10
rank_skip:  False
rank_compile:  True
rank_compile_metric_name:  rp10
prior_A:  0
rank_topk:  50
do_validation:  True
do_test:  True
prior_path:  ./data/esco/esco_priors.json
ontology_path:  ./data/esco/skills_en_label.txt
ontology_name:  esco
optimizer_name:  left-to-right
Dataset: esco_tech
# esco_tech: Total Validation size: 75
# esco_tech: Total Test size: 338
esco_tech: avg # ontology items per input (for validation set): 1.75
esco_tech: Q25, Q50, Q75, Q95 # ontology items per input (for validation set): 0.25    1.0
0.50    1.0
0.75    2.0
0.95    3.0
Name: label, dtype: float64
esco_tech: # Used Train size: 10
esco_tech: # Used Validation size: 65
esco_tech: # Used Test size: 338
/lfs/guestrin-hgx-1/0/sidjha/miniconda3/envs/xmc/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Going to sample between 1 and 2 traces per predictor.
Will attempt to train 10 candidate sets.
-3 range(0, 20)
-2 range(0, 20)
-1 range(0, 20)
 30%|█████████████████████████████▋                                                                     | 3/10 [00:01<00:02,  2.40it/s]
Bootstrapped 2 full traces after 4 examples in round 0.
Segmentation fault (core dumped)

I also get segmentation faults when running bash scripts/compile_left_to_right.sh.

KarelDO commented 8 months ago

I've not encountered this error. Do you know which part of the code throws this?
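One general way to narrow down where a segfault originates (a debugging sketch, not specific to this repo) is Python's stdlib `faulthandler` module: when enabled, it dumps the Python-level traceback of every thread to stderr as the interpreter crashes, which usually points at the faulting native call.

```python
import faulthandler

# Enable near the top of the entry script (or run `python -X faulthandler ...`).
# On a segmentation fault, the interpreter prints each thread's Python
# traceback to stderr before dying, identifying the frame that crashed.
faulthandler.enable()

print(faulthandler.is_enabled())  # -> True
```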

tma15 commented 5 months ago

I encountered the segmentation fault inside `Evaluate._execute_multi_thread` in dspy. I don't know the root cause, but it can be avoided by setting `num_threads` in `LeftToRightOptimizer` to 1 by default.

import os


class LeftToRightOptimizer:
    def __init__(
        self,
        modules_to_lms: dict[str, tuple],
        infer_compile: bool,
        infer_compile_metric_name: str,
        rank_compile: bool,
        rank_compile_metric_name: str,
    ):
        # TODO: add an optimization config
        self.modules_to_lms = modules_to_lms

        self.infer_compile = infer_compile
        self.infer_compile_metric = supported_metrics[infer_compile_metric_name]

        self.rank_compile = rank_compile
        self.rank_compile_metric = supported_metrics[rank_compile_metric_name]

        # compilation hyperparameters
        self.max_bootstrapped_demos = 2
        self.max_labeled_demos = 0
        self.max_rounds = 1
        self.num_candidate_programs = 10

        # Default to 1 thread to avoid the segfault; override via DSP_NUM_THREADS.
        # environ.get returns a string when the variable is set, so cast to int.
        self.num_threads = int(os.environ.get("DSP_NUM_THREADS", 1))
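One subtlety with reading the thread count from the environment: `os.environ.get` returns a string whenever the variable is actually set, so an `int(...)` cast is needed before dspy receives it as a thread count. An illustrative sketch:

```python
import os

# When the variable is set, environ.get returns a *string*, not an int.
os.environ["DSP_NUM_THREADS"] = "4"
raw = os.environ.get("DSP_NUM_THREADS", 1)
print(type(raw).__name__)  # -> str

# Casting gives a usable thread count; the integer default 1 also
# survives int() unchanged when the variable is unset.
num_threads = int(os.environ.get("DSP_NUM_THREADS", 1))
print(num_threads)  # -> 4
```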