Open MaskXman opened 1 month ago
Can you let me know which models you are using?
Of course, I used
Thanks, I haven't tried this model, I will get back to you later
Thank you!
Hi, I have tried the mistral7b-v0.3 model, the OIE stage seems to be working just fine on my side. Can you please share the detailed parameters and data you are using? From the error messages in your screenshots, it seems to be a parsing issue - the function that extracts a list of triplets from the LLM's raw output.
I will also add better logging to this repo for better debugging soon.
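For context, the failure mode above (a parsing error on the LLM's raw output) typically lives in a helper roughly like the sketch below. This is a hypothetical minimal version, not this repo's actual function; the function name and the assumed output format (a Python-style list of 3-element lists) are my assumptions:

```python
import ast
import re

def parse_raw_triplets(raw_output: str) -> list:
    """Extract a list of [subject, relation, object] triplets from raw LLM text.

    Returns [] when nothing parseable is found, which is why a model whose
    output format drifts (extra prose, markdown fences, etc.) can silently
    yield empty results instead of a hard crash.
    """
    # Look for the first bracketed list-of-lists in the output, e.g.
    # [["Alan Turing", "born in", "London"], ...]
    match = re.search(r"\[\s*\[.*?\]\s*\]", raw_output, re.DOTALL)
    if match is None:
        return []
    try:
        parsed = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return []
    # Keep only well-formed 3-element triplets
    return [list(t) for t in parsed if isinstance(t, (list, tuple)) and len(t) == 3]
```

With a parser in this style, a model that wraps its answer in prose still parses (e.g. `parse_raw_triplets('Sure! [["Paris", "capital of", "France"]]')` returns the one triplet), while completely malformed output degrades to `[]` rather than raising.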
Thanks, I tried the v0.2 model and it works just fine, but when I used the llama3 and vicuna-7b-v1.5 models, they did not work and I cannot find the reason. These models come straight from Hugging Face; I did not fine-tune them, I am just testing this project. Finally, the dataset is REBEL, which this project provides.
Ok, I will look into it
Thanks! If you need me to provide anything, I will; it is my pleasure.
That is really odd, I have tried the models you mentioned and they all work fine, at least in the OIE stage. Can you please share the exact command you run? For example, something like:
python run.py \
--oie_llm lmsys/vicuna-7b-v1.5 \
--oie_few_shot_example_file_path ./few_shot_examples/webnlg/oie_few_shot_examples.txt \
--sd_llm lmsys/vicuna-7b-v1.5 \
--sd_few_shot_example_file_path ./few_shot_examples/webnlg/sd_few_shot_examples.txt \
--sc_llm lmsys/vicuna-7b-v1.5 \
--input_text_file_path ./datasets/example.txt \
--output_dir ./output/example \
--oie_refine_few_shot_example_file_path ./few_shot_examples/webnlg/oie_few_shot_refine_examples.txt \
--ee_llm lmsys/vicuna-7b-v1.5 \
--enrich_schema \
--ee_few_shot_example_file_path ./few_shot_examples/webnlg/ee_few_shot_examples.txt
Of course. But I run it directly as a script rather than from the terminal, like this:
from argparse import ArgumentParser
from edc.edc_framework import EDC
import os

os.environ["CUDA_VISIBLE_DEVICES"] = '1,2,3,5,6,7'
os.environ["TOKENIZERS_PARALLELISM"] = "false"

if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "--oie_llm", default="/home/ubuntu/llm_models/Mistral-7B-Instruct-v0.2", help="LLM used for open information extraction."
    )
    parser.add_argument(
        "--oie_prompt_template_file_path",
        default="/home/ubuntu/project/edc/prompt_templates/oie_template.txt",
        help="Prompt template used for open information extraction.",
    )
    parser.add_argument(
        "--oie_few_shot_example_file_path",
        default="/home/ubuntu/project/edc/few_shot_examples/default/ee_few_shot_examples.txt",
        help="Few shot examples used for open information extraction.",
    )
    # Schema Definition setting
    parser.add_argument(
        "--sd_llm", default="/home/ubuntu/llm_models/Mistral-7B-Instruct-v0.2", help="LLM used for schema definition."
    )
    parser.add_argument(
        "--sd_prompt_template_file_path",
        default="/home/ubuntu/project/edc/prompt_templates/sd_template.txt",
        help="Prompt template used for schema definition.",
    )
    parser.add_argument(
        "--sd_few_shot_example_file_path",
        default="/home/ubuntu/project/edc/few_shot_examples/default/sd_few_shot_examples.txt",
        help="Few shot examples used for schema definition.",
    )
    # Schema Canonicalization setting
    parser.add_argument(
        "--sc_llm",
        default="/home/ubuntu/llm_models/Mistral-7B-Instruct-v0.2",
        help="LLM used for schema canonicalization verification.",
    )
    parser.add_argument(
        "--sc_prompt_template_file_path",
        default="/home/ubuntu/project/edc/prompt_templates/sc_template.txt",
        help="Prompt template used for schema canonicalization verification.",
    )
    # Refinement setting
    parser.add_argument("--sr_adapter_path", default=None, help="Path to adapter of schema retriever.")
    parser.add_argument(
        "--oie_refine_prompt_template_file_path",
        default="/home/ubuntu/project/edc/prompt_templates/oie_r_template.txt",
        help="Prompt template used for refined open information extraction.",
    )
    parser.add_argument(
        "--oie_refine_few_shot_example_file_path",
        default="/home/ubuntu/project/edc/few_shot_examples/default/oie_few_shot_refine_examples.txt",
        help="Few shot examples used for refined open information extraction.",
    )
    parser.add_argument(
        "--ee_llm", default="/home/ubuntu/llm_models/Mistral-7B-Instruct-v0.2", help="LLM used for entity extraction."
    )
    parser.add_argument(
        "--ee_prompt_template_file_path",
        default="/home/ubuntu/project/edc/prompt_templates/ee_template.txt",
        help="Prompt template used for entity extraction.",
    )
    parser.add_argument(
        "--ee_few_shot_example_file_path",
        default="/home/ubuntu/project/edc/few_shot_examples/default/ee_few_shot_examples.txt",
        help="Few shot examples used for entity extraction.",
    )
    parser.add_argument(
        "--em_prompt_template_file_path",
        default="/home/ubuntu/project/edc/prompt_templates/em_template.txt",
        help="Prompt template used for entity merging.",
    )
    # Input setting
    parser.add_argument(
        "--input_text_file_path",
        default="/home/ubuntu/project/edc/datasets/rebel.txt",
        help="File containing input texts to extract KG from, each line contains one piece of text.",
    )
    parser.add_argument(
        "--target_schema_path",
        default="/home/ubuntu/project/edc/schemas/rebel_schema.csv",
        help="File containing the target schema to align to.",
    )
    parser.add_argument("--refinement_iterations", default=0, type=int, help="Number of iterations to run.")
    parser.add_argument(
        "--enrich_schema",
        action="store_true",
        help="Whether un-canonicalizable relations should be added to the schema.",
    )
    # Output setting
    parser.add_argument("--output_dir", default="/home/ubuntu/project/edc/output/Mistral-7B-Instruct-v0.2/", help="Directory to output to.")
    args = parser.parse_args()
    args = vars(args)
    print(args)
    edc = EDC(**args)
    input_text_list = open(args["input_text_file_path"], "r").readlines()
    output_kg = edc.extract_kg(input_text_list, args["output_dir"], refinement_iterations=args["refinement_iterations"])
@MaskXman May I confirm with you which version of the transformers package you are using? I tried to update it to the latest version today and it breaks my code, and I suspect that's the reason. I tested my code to be working fine on transformers==4.39.3.
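If you want to verify what you have installed, a small sketch like the one below works; the 4.39.3 pin comes from my testing above, and the helper name is my own, not part of this repo:

```python
from importlib.metadata import version, PackageNotFoundError

def check_transformers_version(tested: str = "4.39.3"):
    """Return (installed_version, matches_tested); installed is None if transformers is absent."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        installed = None
    return installed, installed == tested

installed, ok = check_transformers_version()
if not ok:
    print(f"Installed transformers {installed!r} differs from tested 4.39.3; "
          "consider: pip install transformers==4.39.3")
```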
Of course.
Maybe my schema retrieval model is not working well. Could you please provide --relation_definition_csv_path /output/path/to/tekgen/relation/definitions? Thank you!
Thanks a lot for your reply and the information provided, I will carefully look into this issue and I will let you know once it's fixed
Possibly related issue with transformers
Precision: 0.0052173913043478265
Recall: 0.0052173913043478265
F1: 0.0052173913043478265
These are my results, which are very bad. Could you please provide the training relation definitions of the TeKGen dataset? Thank you!
@MaskXman I have checked your running parameters again, as written in the README, please use the corresponding few-shot examples when you run experiments on the specific dataset, i.e. when running on REBEL, please use the few-shot examples in the REBEL directory. Please try running the following command:
python run.py \
--oie_llm mistralai/Mistral-7B-Instruct-v0.2 \
--oie_few_shot_example_file_path ./few_shot_examples/rebel/oie_few_shot_examples.txt \
--sd_llm mistralai/Mistral-7B-Instruct-v0.2 \
--sd_few_shot_example_file_path ./few_shot_examples/rebel/sd_few_shot_examples.txt \
--sc_llm mistralai/Mistral-7B-Instruct-v0.2 \
--input_text_file_path ./datasets/rebel.txt \
--output_dir ./output/rebel \
--oie_refine_few_shot_example_file_path ./few_shot_examples/rebel/oie_few_shot_refine_examples.txt \
--ee_llm mistralai/Mistral-7B-Instruct-v0.2 \
--target_schema ./schemas/rebel_schema.csv \
--ee_few_shot_example_file_path ./few_shot_examples/rebel/ee_few_shot_examples.txt
I have reproduced the experiments on REBEL with Mistral-7b in all modules, and the results obtained are as follows:
-----------------------------------------------------------------
Total scores
-----------------------------------------------------------------
Ent_type
Correct: 6979 Incorrect: 610 Partial: 0 Missed: 3940
Spurious: 5508 Possible: 11529 Actual: 13097
Precision: 0.4869226853359896 Recall: 0.5031697564521992
F1: 0.4929443971755858
-----------------------------------------------------------------
Partial
Correct: 6593 Incorrect: 0 Partial: 996 Missed: 3940
Spurious: 5508 Possible: 11529 Actual: 13097
Precision: 0.49953614663538326 Recall: 0.5126154125772446
F1: 0.5043149289823227
-----------------------------------------------------------------
Strict
Correct: 6117 Incorrect: 1472 Partial: 0 Missed: 3940
Spurious: 5508 Possible: 11529 Actual: 13097
Precision: 0.43441043083900227 Recall: 0.4422282806252272
F1: 0.4373515010374334
-----------------------------------------------------------------
Exact
Correct: 6593 Incorrect: 996 Partial: 0 Missed: 3940
Spurious: 5508 Possible: 11529 Actual: 13097
Precision: 0.46888984092364683 Recall: 0.47737186477644494
F1: 0.47199396871152594
-----------------------------------------------------------------
Scores per tag
-----------------------------------------------------------------
Subjects
-----------------------------------------------------------------
Ent_type
Correct: 2481 Incorrect: 219 Partial: 0 Missed: 1125
Spurious: 1439 Possible: 3825 Actual: 4139
Precision: 0.5316726385210573 Recall: 0.5389312977099237
F1: 0.5336130584549341
-----------------------------------------------------------------
Partial
Correct: 2352 Incorrect: 0 Partial: 348 Missed: 1125
Spurious: 1439 Possible: 3825 Actual: 4139
Precision: 0.5382619307264891 Recall: 0.5489640130861505
F1: 0.5416035727268006
-----------------------------------------------------------------
Strict
Correct: 2181 Incorrect: 519 Partial: 0 Missed: 1125
Spurious: 1439 Possible: 3825 Actual: 4139
Precision: 0.4741860102819754 Recall: 0.47459105779716465
F1: 0.47433418150975404
-----------------------------------------------------------------
Exact
Correct: 2352 Incorrect: 348 Partial: 0 Missed: 1125
Spurious: 1439 Possible: 3825 Actual: 4139
Precision: 0.5053019681154904 Recall: 0.5116684841875682
F1: 0.5074421422513026
-----------------------------------------------------------------
Predicates
-----------------------------------------------------------------
Ent_type
Correct: 2323 Incorrect: 9 Partial: 0 Missed: 1535
Spurious: 2227 Possible: 3867 Actual: 4559
Precision: 0.4706821761783594 Recall: 0.49574700109051256
F1: 0.4775915251596822
-----------------------------------------------------------------
Partial
Correct: 2063 Incorrect: 0 Partial: 269 Missed: 1535
Spurious: 2227 Possible: 3867 Actual: 4559
Precision: 0.45696413425421056 Recall: 0.47110141766630315
F1: 0.46097522978657113
-----------------------------------------------------------------
Strict
Correct: 2063 Incorrect: 269 Partial: 0 Missed: 1535
Spurious: 2227 Possible: 3867 Actual: 4559
Precision: 0.442264631043257 Recall: 0.4444929116684842
F1: 0.443050319364387
-----------------------------------------------------------------
Exact
Correct: 2063 Incorrect: 269 Partial: 0 Missed: 1535
Spurious: 2227 Possible: 3867 Actual: 4559
Precision: 0.442264631043257 Recall: 0.4444929116684842
F1: 0.443050319364387
-----------------------------------------------------------------
Objects
-----------------------------------------------------------------
Ent_type
Correct: 2175 Incorrect: 382 Partial: 0 Missed: 1280
Spurious: 1842 Possible: 3837 Actual: 4399
Precision: 0.46661577608142496 Recall: 0.4706652126499455
F1: 0.46783403437710963
-----------------------------------------------------------------
Partial
Correct: 2178 Incorrect: 0 Partial: 379 Missed: 1280
Spurious: 1842 Possible: 3837 Actual: 4399
Precision: 0.48948745910577973 Recall: 0.5130861504907307
F1: 0.497275795814509
-----------------------------------------------------------------
Strict
Correct: 1873 Incorrect: 684 Partial: 0 Missed: 1280
Spurious: 1842 Possible: 3837 Actual: 4399
Precision: 0.406034169392948 Recall: 0.40665212649945476
F1: 0.40628135223555073
-----------------------------------------------------------------
Exact
Correct: 2178 Incorrect: 379 Partial: 0 Missed: 1280
Spurious: 1842 Possible: 3837 Actual: 4399
Precision: 0.45287168302435477 Recall: 0.4728462377317339
F1: 0.4595419847328244
-----------------------------------------------------------------
Full triple scores
-----------------------------------------------------------------
Precision: 0.18142068119054924 Recall: 0.18124168967986087
F1: 0.18128260202516108
Please let me know if you still observe a huge discrepancy in the results after re-running the commands I gave.
Can you clarify what you mean by there being no relation definition? Also, can you double-check that you have set target_schema_path to the REBEL schema? Please make sure all dataset-specific parameters are set to the ones for REBEL.
Sorry, I found the relation definitions in the target schema. I promise I used the REBEL files for everything; I will re-check and re-run. Thank you!
No problem, please let me know if you have further questions. It will be more helpful if you can attach the parameters you use when reporting problems, thanks!
Sorry, I ran the example dataset and found that I cannot get any oie_triplets, so output.txt is all [ ].
Could you please tell me why, and how I can deal with it? Thank you very much!