AnjieCheng / SpatialRGPT

[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
https://www.anjiecheng.me/SpatialRGPT
Apache License 2.0

Code for LLM-based Complex Reasoning Question-Answer generation. #3

ikodoh opened this issue 1 month ago

ikodoh commented 1 month ago

Hi,

Thank you for sharing your great work. Is there any plan to release the code for generating the LLM-based Complex Reasoning question-answer pairs? It seems there is no code for it in the repository.

I really appreciate any help you can provide.

AnjieCheng commented 1 month ago

Hi, we use SGLang to generate complex QAs more efficiently. First, launch the server:

python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-70B-Instruct --port 30002 --tp 8
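One way to confirm the server is reachable before sending work (this assumes SGLang's standard get_model_info endpoint; adjust the port if you changed it):

import requests

# Prints the served model path if the server launched above is running.
print(requests.get("http://localhost:30002/get_model_info").json())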

Below is a sample pseudocode snippet that demonstrates how to use it after setting up the SGLang server.

from sglang import function, gen, set_default_backend, RuntimeEndpoint

# Point the SGLang frontend at the server launched above.
set_default_backend(RuntimeEndpoint("http://localhost:30002"))

# Constrain decoding to a JSON object with exactly a "Question" and an
# "Answer" field (regex-guided generation in SGLang).
response_regex = (
    r'\{\n'
    + r'    "Question": "[\w\d\s<>?,.!]{1,256}",\n'
    + r'    "Answer": "[\w\d\s<>?,.!]{1,256}"\n'
    + r'\}'
)
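# Optional sanity check (the sample string is a hypothetical example):
# confirm the pattern accepts a well-formed output before spending GPU time.
import re
assert re.fullmatch(
    response_regex,
    '{\n    "Question": "Is <region1> left of <region2>?",\n'
    '    "Answer": "Yes, <region1> is on the left."\n}',
)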

# An SGLang program: a fixed prompt template plus one constrained generation call.
@function
def llama_question(s, instructions):
    s += """
    You are a helpful assistant tasked with generating spatial reasoning-based questions and answers from provided descriptions of scenes.

    Guidelines:
    - Craft questions without directly revealing specific details from the description
    - Generate questions related to the description using <regionX>
    - Use the description to form the answer; do not leak its details into the question
    - Refer to objects and regions as <regionX> rather than by name
    - Speak from the observer's perspective
    - Ensure every object or region from the description is referenced as <regionX> in the question

    Examples:
    1. [Objects]: <region4> sofa, <region1> chair
       [Description]: The path between the <region4> and <region1> is 1.5 meters.
       "Question": You are a cleaning robot that is 1 meter wide. Now you are standing in a living room and see the image; you want to move from here to the door that leads to the backyard. Do you think I can go through the path between the <region4> and <region1>?
       "Answer": The path between <region4> and <region1> is 1.5 meters, so yes, the robot can go through the path between <region4> and <region1> since it is wider than the robot's width.

    2. [Objects]: <region2> apple, <region3> orange
       [Description]: <region2> is positioned on the left side of <region3>.
       "Question": You see two fruits, an apple in <region2> and an orange in <region3>. Which one is more on the left side?
       "Answer": The apple in <region2> is more on the left.

    3. [Objects]: <region3> desk, <region6> bed
       [Description]: <region3> is further to the viewer than <region6>.
       "Question": You are exploring a bedroom and walking towards <region3> and <region6>. Which one will you reach first?
       "Answer": You will reach the bed first because it is closer to you than the desk, which is further away.

    4. [Objects]: <region0> book
       [Description]: <region0> is 50 cm in width.
       "Question": You are a librarian currently standing in front of a 40 cm width bookshelf, and you see <region0> that you want to place on the shelf. Can you determine if <region0> will fit on the shelf?
       "Answer": <region0> is 50 cm in width, so the shelf is not wide enough to hold a book of that size. Please find a larger shelf.

    Now it's your turn!
    """

    # Append the per-sample instruction (objects + description), then generate
    # the constrained JSON under the name "json_output".
    s += instructions
    s += gen("json_output", max_tokens=512, regex=response_regex)
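# Example single-sample invocation (instruction_text is a hypothetical
# instruction string); run_batch below does the same over a whole list.
state = llama_question.run(instructions=instruction_text, temperature=0.2)
print(state["json_output"])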

# LLM generation using SGLang; `instructions` is a list of template-based
# instruction strings, wrapped into the keyword arguments of llama_question.
llama_response = llama_question.run_batch(
    [{"instructions": inst} for inst in instructions],
    progress_bar=True,
    temperature=0.2,
)

Note that you may need to filter out some bad samples, as LLMs can sometimes produce unexpected results.
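For example, a minimal filtering pass could look like the sketch below. It assumes each returned state exposes the generated text as state["json_output"] (the name passed to gen) and that instructions holds the raw instruction strings; filter_responses is just an illustrative helper.

import json
import re

def filter_responses(states, instructions):
    """Keep only well-formed QA pairs whose region tags appear in the source."""
    kept = []
    for state, inst in zip(states, instructions):
        try:
            qa = json.loads(state["json_output"])
        except (KeyError, json.JSONDecodeError):
            continue  # generation failed or the output is not valid JSON
        # Drop samples that mention regions absent from the instruction.
        tags = re.findall(r"<region\d+>", qa["Question"] + qa["Answer"])
        if all(tag in inst for tag in tags):
            kept.append(qa)
    return kept

qa_pairs = filter_responses(llama_response, instructions)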