shinkuan closed this issue 1 year ago.
In conclusion, I am leaving this issue as a valid enhancement request, but I am also labeling it "won't fix" because supporting Tenhou's game-record format is expensive to implement, has low priority, and the game records come with restrictions on their use. The rationale for this conclusion is explained below.
First, let me explain the difference between the game-record formats of Majsoul and Tenhou. Majsoul's game records contain many more types of information than Tenhou's, and they are much easier to convert into annotations for learning. This is one of the reasons this project takes Majsoul's format as input. At the same time, however, it means that supporting Tenhou's format as input would require recovering a lot of missing information from Tenhou's game records, and implementing that process is relatively costly.
Second, the use of Tenhou's game records is subject to the following restriction: "天鳳と競合する製品への開発・応用を目的として牌譜を使用していただくことはできません。" (https://tenhou.net/sc/raw/ , "You may not use the game records for the purpose of developing or applying them to a product that competes with Tenhou.") In a legal consultation, I asked a lawyer how the scope of this restriction should be interpreted. He told me that the meaning and scope of the phrase "products that compete with Tenhou" are vague, and that the purposes of use it restricts could be construed very broadly. Adding Tenhou's game records to the training data would impose this vague restriction on the use of the trained models, and I do not think Tenhou's game records are worth that additional restriction. This is the second reason this project does not currently support Tenhou's format.
I see your point about the quality of game records. That said, I think it is debatable which Majsoul ranks correspond to the level of players in Tenhou Phoenix.
In the context of supervised learning, or more precisely behavioral cloning (as learning the choices of expert mahjong players should properly be called), low-quality training examples act only as noise. In reinforcement learning, however, the situation is quite different: a certain amount of low quality in the training samples is actually important. Locally suboptimal choices play a role in exploration to some extent. It is also important to learn how heavily poor choices should be penalized. In other words, mistakes in low-quality training examples can serve as negative training examples, which may indirectly contribute to the overall performance of the model.
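To make the "negative training examples" point concrete, here is a minimal sketch, not this project's training code, contrasting a behavioral-cloning update with a REINFORCE-style policy-gradient update on a toy softmax policy over three hypothetical actions. Behavioral cloning always raises the probability of the recorded action, even a poor one, whereas the policy-gradient update scales the same gradient by an advantage, so a recorded mistake with a negative advantage (an assumed value here) lowers that action's probability.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_prob_grad(logits, action):
    """Gradient of log pi(action) w.r.t. the logits of a softmax policy."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def bc_step(logits, action, lr=0.5):
    # Behavioral cloning: gradient ascent on log pi(action).
    # The recorded action is always imitated, whatever its outcome.
    g = log_prob_grad(logits, action)
    return [x + lr * gi for x, gi in zip(logits, g)]


def pg_step(logits, action, advantage, lr=0.5):
    # REINFORCE-style policy gradient: the same gradient scaled by the advantage.
    # A negative advantage pushes probability away from the recorded action.
    g = log_prob_grad(logits, action)
    return [x + lr * advantage * gi for x, gi in zip(logits, g)]


logits = [0.0, 0.0, 0.0]  # hypothetical uniform initial policy over 3 actions
bad_action = 2            # hypothetical recorded choice that turned out poorly
advantage = -1.0          # hypothetical: outcome worse than the baseline

p_bc = softmax(bc_step(logits, bad_action))[bad_action]
p_pg = softmax(pg_step(logits, bad_action, advantage))[bad_action]
print(f"P(bad action) after BC: {p_bc:.3f}, after PG: {p_pg:.3f}")
```

Starting from a uniform policy (probability 1/3 per action), the behavioral-cloning step moves the poor action above 1/3 while the policy-gradient step moves it below 1/3, which is the sense in which a mistake in the data can still carry useful signal.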
As the SUPHX paper shows, behavioral cloning alone is not enough to reach state-of-the-art performance; it is not well suited to learning long-term decision-making processes such as mahjong. Reinforcement learning (or more sophisticated approaches such as variants of imitation learning, inverse reinforcement learning, and so on) must be applied. Therefore, as explained above, low-quality training examples paradoxically become important, and in the context of reinforcement learning it is a complete misunderstanding to think that simply increasing the quality of training examples is sufficient.
So what matters more than quality in this project? Here I would like to remind you of the project's goal: it is not merely to reach the same level of performance as SUPHX or NAGA but, at the very least, to beat them. Tenhou's game records are also used by NAGA and SUPHX, so in terms of game-record quality, adding them would put me on the same starting line as NAGA and SUPHX. In terms of computational resources and, more fundamentally, financial backing, however, there is absolutely no way this personal project can compete with SUPHX and NAGA, which are corporate R&D projects. At present, the only core competence of this project is the volume of its game records. It should now be obvious that, to achieve the project's goals, the highest priority must be placed on converting quantity into quality.
In fact, my ongoing private experiments have already shown that the power of quantity does overwhelm the power of quality. The volume of Tenhou Phoenix game records currently available is at most 20% of what I already have, which is roughly the amount that can be obtained by crawling Majsoul for just one month. I do not think an increase of at most 20% in quantity from adding Tenhou Phoenix game records is worth the additional coding effort and the restriction on trained models explained above.
To avoid misunderstanding, note that I am not denying the positive effects of adding Tenhou Phoenix game records at all; it is simply a matter of priorities. Since I follow a curriculum learning procedure consisting of behavioral cloning on expert game records followed by reinforcement learning, there is no doubt that increasing the number of expert game records for behavioral cloning will have some jumpstart effect in the early stages of reinforcement learning. Nevertheless, I am not sure that this will improve the final asymptotic performance of reinforcement learning. (For the precise meanings of the terms jumpstart and asymptotic performance in curriculum learning for reinforcement learning, see, e.g., "Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey.")
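As a rough illustration of the two terms, the sketch below computes jumpstart (the difference in initial performance) and the asymptotic performance gain (the difference in final performance) between two learning curves. The curves are made-up numbers, not results from the experiments mentioned above, and estimating final performance by averaging the last few evaluations is an assumption of this sketch rather than a definition taken from the survey.

```python
from statistics import mean


def jumpstart(curve_transfer, curve_baseline):
    """Gain in initial performance from transfer (e.g., more BC pre-training)."""
    return curve_transfer[0] - curve_baseline[0]


def asymptotic_gain(curve_transfer, curve_baseline, tail=3):
    """Gain in final performance, estimated from the last `tail` evaluations.

    Averaging the tail is an assumption of this sketch, used to smooth noise.
    """
    return mean(curve_transfer[-tail:]) - mean(curve_baseline[-tail:])


# Hypothetical evaluation scores over the course of RL training.
# These numbers are invented for illustration, not experimental results.
with_more_bc = [0.30, 0.45, 0.55, 0.60, 0.62, 0.62]
with_less_bc = [0.10, 0.30, 0.48, 0.60, 0.62, 0.62]

print(f"jumpstart: {jumpstart(with_more_bc, with_less_bc):.2f}")
print(f"asymptotic gain: {asymptotic_gain(with_more_bc, with_less_bc):.2f}")
```

In this invented example, extra behavioral cloning gives a clear head start (jumpstart of 0.20) but no difference in final performance, which is the scenario the paragraph above cannot rule out.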
Let me state my conclusions on this issue once again. I leave this issue open as a valid enhancement request. At the same time, I label it "won't fix" based on the rationale above. However, the "won't fix" label does not at all mean "no one will fix this issue"; it means "I won't fix this issue for now" and "someone (including you, of course) might fix this issue."
While the volume of game records from Maj-Soul is extremely large, their quality falls far short of records from Tenhou Phoenix. The quality of games at the Tenhou Phoenix table is surely much better than that of the Throne Room in Majsoul, and there are also far fewer Throne Room game records than Tenhou Phoenix game records. So, to get a better model, I think this AI should learn from Tenhou's game records too.