OpenBMB / ChatDev

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
https://arxiv.org/abs/2307.07924
Apache License 2.0
25.47k stars 3.2k forks

Cannot be generated automatically? There are discrepancies between the generated code and logs #279

Closed. Hutong-flowers-in-March closed this issue 10 months ago.

Hutong-flowers-in-March commented 11 months ago

This document uses the one-click development of the "Red Packet Rain" game as an example, which I find very interesting:

"In as little as 3 minutes, ChatDev can generate a ready-to-run application: from popular marketing tools such as "Red Packet Rain", to practical business tools such as "Business Card Generator", to leisure games such as "Backgammon", "Snake", and "Space Wars"..."

I looked at the PR for Red Packet Rain and found that it does not seem to have been generated with one click; it only runs after manual modification.

I also ran the same prompt several times but never got a working result. Can it really be generated automatically?


Alphamasterliu commented 11 months ago

Hello! Thank you very much for your interest in and support of ChatDev. The "Red Packet Rain" software that I uploaded was indeed manually optimized by me. Our Warehouse is meant to collect works that are particularly interesting: they can be completely auto-generated, or they can be enhancements that users build on top of what ChatDev produces. For example, a while ago a user created a very impressive Texas Hold'em poker game. Even though ChatDev's code may account for only about half of the final product, playing more of a supporting role, we consider such works very valuable and worth highlighting. Given the inherent randomness of GPT's output, collaboration between humans and AI can be an excellent choice. After all, our goal is to make AI a better tool, and if our work has given you more insight into the field of AI, we would be exceedingly honored.

Hutong-flowers-in-March commented 11 months ago

I was hoping to get good results automatically, since I was impressed by the results shown at https://mp.weixin.qq.com/s/ftivGuvbsOPTHTXNdfs4Ng. However, making changes and corrections myself would be time-consuming and costly. So I would like to know whether ChatDev can produce something that is directly usable. Could you suggest what settings I should use so that the output is usable without any further modification?

Alphamasterliu commented 11 months ago

Through my experiments over this period, I've realized that the most significant factor in improving the outcome is refining the initial prompt. I revisited the same task about five times, and as my understanding of how to write the prompt deepened, the final software improved so much that it felt like a jump of a whole version. I believe this is a skill that has to be developed through repeated experimentation. There are also many small tricks for optimizing prompts that you can look up online. You might also want to experiment with ChatGPT 3.5 first.

Furthermore, the intrinsic capabilities of the underlying GPT model largely determine the quality of the generated code. For instance, I recently experimented with GPT-4 Turbo and the 32k model, and the improvements over GPT-3.5 were very evident. In other words, as GPT and other models improve, including but not limited to their processing capacity and context-length limits, ChatDev's performance should improve along with them. Our team will also keep updating and optimizing the software's mechanisms, so stay tuned for our open-source iterations.
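The two suggestions above (iterate on the prompt, then compare stronger models) can be organized as a small batch of runs. The sketch below is a minimal illustration, assuming ChatDev's `run.py` entry point with `--task`, `--name`, and `--model` arguments; the project name, prompt texts, and model list are hypothetical, and the accepted `--model` values differ between ChatDev versions, so check `python3 run.py --help` before relying on them.

```python
import shlex
import subprocess
from itertools import product

# Hypothetical model list. Accepted --model values depend on your ChatDev version.
MODELS = ["GPT_3_5_TURBO", "GPT_4_TURBO"]


def build_runs(prompt_versions, models, project="RedPacketRain"):
    """Return one run.py argv list per (prompt version, model) pair.

    Assumes run.py accepts --task, --name and --model (not verified for every version).
    """
    runs = []
    for (version, prompt), model in product(enumerate(prompt_versions, start=1), models):
        runs.append([
            "python3", "run.py",
            "--task", prompt,
            # A distinct name per run keeps each result separate for comparison.
            "--name", f"{project}_v{version}_{model}",
            "--model", model,
        ])
    return runs


def run_all(runs, chatdev_dir):
    """Execute each command from a local ChatDev checkout; stops at the first failure."""
    for argv in runs:
        subprocess.run(argv, cwd=chatdev_dir, check=True)


if __name__ == "__main__":
    # Hypothetical prompt versions: a vague first draft and a more detailed refinement.
    prompts = [
        "A red packet rain game.",
        "A desktop red packet rain game written in Python with a GUI: red packets fall "
        "from the top of the window, the player clicks them to collect money, and a "
        "score and 30-second countdown are shown.",
    ]
    for argv in build_runs(prompts, MODELS):
        print(shlex.join(argv))  # inspect the commands before calling run_all(...)
```

Printing the commands first and only then calling `run_all` with the path to a ChatDev checkout keeps API spending under control; comparing the outputs of each prompt version side by side makes it easier to see which wording changes actually help.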