chtmp223 / topicGPT

TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)
https://chtmp223.github.io/topicGPT

Meet some problems when refining. #5

Closed zhangduolaAAA closed 7 months ago

zhangduolaAAA commented 10 months ago

Hi, I am applying your code to my research project, but I ran into some problems during refinement. I hope you can help!

  1. This is the refinement command:

         python3 script/refinement.py --deployment_name gpt-4 \
           --max_tokens 500 --temperature 0.0 --top_p 0.0 \
           --prompt_file prompt/refinement.txt \
           --generation_file data/output/generation_1.jsonl \
           --topic_file data/output/generation_1.md \
           --out_file data/output/refinement.md \
           --verbose True \
           --updated_file data/output/refinement.jsonl \
           --mapping_file data/output/refinement_mapping.txt \
           --refined_again False \
           --remove False

     There is a parameter called out_file, but in refinement.py it is only defined and never used. So I guess line 290 wrongly uses topic_file where it should use out_file.

  2. After correcting that, I ran the refinement command (using just the sample data generation_1.jsonl), but sadly there was no result, and I don't know why. I checked your script/example.ipynb, and it does not seem to include a refinement run. Also, because you didn't provide the result files in the data/output folder, I can't check where my problem is.

No offence intended. I would appreciate a response!

chtmp223 commented 10 months ago

Hi there, sorry for the late response!

1) You are right, line 290 should be args.out_file, not args.topic_file. Thank you for catching that! I updated the code to fix this.
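For illustration, here is a minimal sketch of the kind of fix described in point 1. It is not the actual code from script/refinement.py: `build_parser` and `save_refined_topics` are hypothetical names, and only the two flags `--topic_file` and `--out_file` are taken from the command above.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Only the two flags relevant to the bug; the real script takes more.
    parser = argparse.ArgumentParser()
    parser.add_argument("--topic_file", type=str, required=True,
                        help="input: topic list produced by generation")
    parser.add_argument("--out_file", type=str, required=True,
                        help="output: refined topic list")
    return parser


def save_refined_topics(refined: str, args: argparse.Namespace) -> None:
    # Hypothetical helper. The bug pattern was writing to args.topic_file,
    # which overwrites the input and leaves --out_file unused.
    Path(args.out_file).write_text(refined, encoding="utf-8")
```

Writing to `args.out_file` keeps the original topic list intact, so the refined list can be compared against it afterwards.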

2) You're seeing no output because there are no topics that need merging. As you can see in the generation_1.md file, the topics are quite distinct from one another. I also updated the code to notify you if there are no topics to be merged during the refinement process. In practice, I would suggest that you take a look at the topic list first (the same one stored in args.topic_file) and determine whether your list needs refinement or not. You should only run the code if there are multiple duplicate and minor topics.
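As a rough way to do that check before running refinement, something like the sketch below could help. It is not part of TopicGPT: the function names and thresholds are hypothetical, the line format `[1] Topic Name (Count: N): description` is an assumption about the topic file, and plain string similarity is only a crude stand-in for whatever merging criterion the refinement step actually applies.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Assumed topic line format: "[1] Topic Name (Count: 3): description"
TOPIC_LINE = re.compile(r"^\s*\[(\d+)\]\s*(.+?)\s*\(Count:\s*(\d+)\)")


def parse_topics(text):
    """Return (name, count) pairs for every line matching the assumed format."""
    topics = []
    for line in text.splitlines():
        match = TOPIC_LINE.match(line)
        if match:
            topics.append((match.group(2), int(match.group(3))))
    return topics


def refinement_candidates(topics, sim_threshold=0.8, min_count=2):
    """Flag near-duplicate topic names and topics with very few documents."""
    similar = [
        (a, b)
        for (a, _), (b, _) in combinations(topics, 2)
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= sim_threshold
    ]
    minor = [name for name, count in topics if count < min_count]
    return similar, minor
```

If both returned lists are empty, the topic list is probably distinct enough that refinement would have nothing to merge, which matches the "no output" behaviour described above.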

Let me know if this helps!

zhangduolaAAA commented 10 months ago

Thanks for your response. You're so nice. Got it, and I will try it. I'll give you feedback after trying!

chtmp223 commented 7 months ago

Closing this issue due to inactivity. Let me know if you have any other questions!