`gpt-4o` performance / improvement evaluation

fujitatomoya commented 5 months ago

gpt-4o is now available via the API. Let's compare its performance, latency, and cost against gpt-4; if it is better, we can make it the default model.

root@tomoyafujita:~/docker_ws/ros2_colcon# ros2 ai status -lv
----- api_model: gpt-4
----- api_endpoint: https://api.openai.com/v1
----- api_token: None
----- api_temperature: 0.5
As an artificial intelligence, I am always available to assist you 24/7.
[SUCCESS] Valid OpenAI API key.
Available Models:
dall-e-3
whisper-1
davinci-002
dall-e-2
gpt-3.5-turbo-16k
tts-1-hd-1106
gpt-4o-2024-05-13
tts-1-hd
gpt-4o
gpt-4
gpt-4-0613
gpt-3.5-turbo-1106
gpt-3.5-turbo-0125
gpt-3.5-turbo-instruct-0914
gpt-3.5-turbo
gpt-3.5-turbo-instruct
tts-1
gpt-3.5-turbo-0301
babbage-002
gpt-4-1106-preview
gpt-4-turbo-2024-04-09
tts-1-1106
text-embedding-3-large
gpt-4-turbo
text-embedding-3-small
gpt-3.5-turbo-0613
text-embedding-ada-002
gpt-4-1106-vision-preview
gpt-4-0125-preview
gpt-4-vision-preview
gpt-4-turbo-preview
gpt-3.5-turbo-16k-0613

fujitatomoya commented 5 months ago

root@tomoyafujita:~/docker_ws/ros2_colcon# ros2 ai exec "give me all topics in detail" -d
ChatCompletion(id='chatcmpl-9QH7pNdSCprrzSQOhFczCRXKQgwTj', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='```bash\nros2 topic list -v\n```', role='assistant', function_call=None, tool_calls=None))], created=1716049385, model='gpt-4o-2024-05-13', object='chat.completion', system_fingerprint='fp_729ea513f7', usage=CompletionUsage(completion_tokens=11, prompt_tokens=46, total_tokens=57))

gpt-4o returns the command wrapped in a markdown code fence with a bash language tag, as shown in the content field above. This spawns another bash process in the subprocess routine, so we cannot see the output. In addition, the spawned process stays alive until a signal is delivered to that bash process, which is not the expected behavior here. I am not sure why this difference occurs with exactly the same parameters, but I think we need to make sure that the command starts with ros2.
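One possible way to guard against this is sketched below. It is not the ros2ai implementation; the helper name `extract_ros2_command` and the regular expression are assumptions made for illustration. The idea is to strip an optional markdown fence from the reply and refuse to run anything whose first token is not `ros2`.

```python
import re
import shlex
from typing import List

# Matches a reply wrapped in a markdown code fence, with an optional
# language tag such as "bash" right after the opening backticks.
_FENCE_RE = re.compile(r"^```[A-Za-z0-9_+-]*[ \t]*\n(.*?)\n?```\s*$", re.DOTALL)


def extract_ros2_command(reply: str) -> List[str]:
    """Hypothetical helper: return argv for a ros2 command found in `reply`.

    Raises ValueError if the reply does not start with `ros2` once any
    surrounding code fence has been removed.
    """
    text = reply.strip()
    match = _FENCE_RE.match(text)
    if match:
        # Keep only the fenced body, dropping the fence and language tag.
        text = match.group(1).strip()
    argv = shlex.split(text)
    if not argv or argv[0] != "ros2":
        raise ValueError(f"refusing to run non-ros2 command: {text!r}")
    return argv


if __name__ == "__main__":
    # Reply content taken from the ChatCompletion output above.
    print(extract_ros2_command("```bash\nros2 topic list -v\n```"))
```

Returning an argv list rather than a string means the caller can pass it to `subprocess.run` without `shell=True`, so no intermediate shell is started.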

fujitatomoya commented 5 months ago

Execution time comparison

In general, gpt-4o is slightly faster than gpt-4, as the following runs show (it was faster for three of the four prompts).

gpt-4

root@tomoyafujita:~/docker_ws/ros2_colcon# time ros2 ai exec "give me all topics in detail"
Published topics:
 * /parameter_events [rcl_interfaces/msg/ParameterEvent] 1 publisher
 * /rosout [rcl_interfaces/msg/Log] 1 publisher

Subscribed topics:

real    0m1.474s
user    0m0.792s
sys 0m0.172s
root@tomoyafujita:~/docker_ws/ros2_colcon# time ros2 ai exec "give me all parameter"

real    0m2.078s
user    0m0.880s
sys 0m0.163s
root@tomoyafujita:~/docker_ws/ros2_colcon# time ros2 ai exec "パラメータリストを取得してください"

real    0m2.488s
user    0m0.845s
sys 0m0.188s
root@tomoyafujita:~/docker_ws/ros2_colcon# time ros2 ai exec "どんなトピックがありますか"
/parameter_events
/rosout

real    0m1.439s
user    0m0.837s
sys 0m0.140s

gpt-4o

root@tomoyafujita:~/docker_ws/ros2_colcon# time ros2 ai exec "give me all topics in detail"
Published topics:
 * /parameter_events [rcl_interfaces/msg/ParameterEvent] 1 publisher
 * /rosout [rcl_interfaces/msg/Log] 1 publisher

Subscribed topics:

real    0m1.385s
user    0m0.827s
sys 0m0.158s
root@tomoyafujita:~/docker_ws/ros2_colcon# time ros2 ai exec "give me all parameter"

real    0m1.986s
user    0m0.877s
sys 0m0.176s
root@tomoyafujita:~/docker_ws/ros2_colcon# time ros2 ai exec "パラメータリストを取得してください"

real    0m1.962s
user    0m0.891s
sys 0m0.176s
root@tomoyafujita:~/docker_ws/ros2_colcon# time ros2 ai exec "どんなトピックがありますか"
/parameter_events
/rosout

real    0m1.521s
user    0m0.826s
sys 0m0.131s
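The `real` times above can be summarized with a short script. This is only a rough sketch: each prompt was run once per model, so the numbers include network variance and should not be read as a rigorous benchmark. The function name `summarize` is an assumption for illustration.

```python
from statistics import mean

# "real" times in seconds, copied from the runs above, in the same prompt order.
gpt4_real = [1.474, 2.078, 2.488, 1.439]
gpt4o_real = [1.385, 1.986, 1.962, 1.521]


def summarize(baseline, candidate):
    """Compare two lists of wall-clock times taken for the same prompts."""
    deltas = [c - b for b, c in zip(baseline, candidate)]
    return {
        "baseline_mean": mean(baseline),
        "candidate_mean": mean(candidate),
        # Number of prompts where the candidate finished sooner.
        "faster_runs": sum(1 for d in deltas if d < 0),
        "total_runs": len(deltas),
    }


summary = summarize(gpt4_real, gpt4o_real)
print(f"gpt-4 mean:  {summary['baseline_mean']:.3f}s")
print(f"gpt-4o mean: {summary['candidate_mean']:.3f}s")
print(f"gpt-4o faster in {summary['faster_runs']} of {summary['total_runs']} runs")
```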

Text generation time comparison

gpt-4o is significantly better than gpt-4 here, and its generated explanation is more detailed.

https://github.com/fujitatomoya/ros2ai/assets/43395114/5ee14701-1a35-4ec3-94d4-80cdeb3b4258

Cost

Using exactly the same tests and commands, gpt-4o consumes half as many tokens as gpt-4, or fewer.
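Token consumption can be tallied from the `usage` field that each response carries (visible in the `-d` output earlier in this thread). The sketch below is an assumption-laden illustration: `Usage` is a hypothetical stand-in dataclass that mirrors the three field names from that output, and the only record included is the single gpt-4o response shown above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Usage:
    """Hypothetical stand-in mirroring the usage fields in the debug output."""
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def sum_usage(records: Iterable[Usage]) -> Usage:
    """Add up token counts across several responses."""
    records = list(records)
    return Usage(
        prompt_tokens=sum(r.prompt_tokens for r in records),
        completion_tokens=sum(r.completion_tokens for r in records),
        total_tokens=sum(r.total_tokens for r in records),
    )


# The one gpt-4o response shown in this thread; append more records per
# model to compare totals across the same set of prompts.
gpt4o_runs = [Usage(prompt_tokens=46, completion_tokens=11, total_tokens=57)]
print(sum_usage(gpt4o_runs))
```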

fujitatomoya commented 5 months ago

https://github.com/fujitatomoya/ros2ai/assets/43395114/8aace743-ae03-4f31-99e9-d715f5af3536

https://github.com/fujitatomoya/ros2ai/assets/43395114/87bedb4f-0c8f-46e8-b445-be376f03a626
