severeduck opened this issue 1 year ago (status: Open)
Dear @severeduck,
Thank you for your request. Starting the learning process from a "Tabula Rasa" state does carry certain advantages. Note, however, that defining the input representation, specifying the neural network architecture, and configuring the training setup already introduce a degree of prior knowledge.
Reinforcement learning from a Tabula Rasa state is already viable: you can use the script generate_random_nn.py to generate a randomly initialized neural network and then follow the reinforcement learning instructions. I agree that there is room for usability improvements in this process.
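For a concrete picture of what such a randomly initialized starting network looks like, here is a minimal sketch. It is not the repository's generate_random_nn.py; the PyTorch framework, layer sizes, and head shapes are illustrative assumptions only:

```python
# Hypothetical sketch of a randomly initialized policy/value network for
# tabula-rasa RL. NOT the repository's generate_random_nn.py; framework
# (PyTorch) and all dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class RandomPolicyValueNet(nn.Module):
    def __init__(self, n_planes=34, board_size=8, n_moves=4672):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_planes, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        hidden = 64 * board_size * board_size
        self.policy_head = nn.Linear(hidden, n_moves)  # move logits
        self.value_head = nn.Linear(hidden, 1)         # scalar evaluation

    def forward(self, x):
        h = self.body(x)
        return self.policy_head(h), torch.tanh(self.value_head(h))

# Saving the untrained weights gives the "tabula rasa" starting point:
# all playing knowledge must then come from the self-play loop.
net = RandomPolicyValueNet()
torch.save(net.state_dict(), "random_init.pt")
```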
How feasible is it to implement the Tabula Rasa approach within our project's domain?
A study conducted by our bachelor student, Rumei Ma, explored "Continual Reinforcement Learning on TicTacToe, Connect4, Othello, Clobber, and Breakthrough" commencing from a Tabula Rasa state. Furthermore, our former master student, now pursuing a Ph.D., Jannis Blüml, initiated training from Tabula Rasa, and the outcomes are detailed in "AlphaZe∗∗: AlphaZero-like baselines for imperfect information games": link
What computational resources, data, or infrastructure would be needed?
Initiating the learning process from zero knowledge typically demands significantly more computational resources than starting from a network initialized via supervised learning. Our training procedures typically run on state-of-the-art DGX servers. Distributed training across a network of individual machines is also possible, but it requires defining the complete infrastructure setup.
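To illustrate a small part of what "defining the complete infrastructure setup" entails, here is a minimal sketch of multi-machine data-parallel training with PyTorch DistributedDataParallel. The launch command, node counts, and the stand-in model are assumptions for illustration, not the project's actual setup:

```python
# Minimal sketch of multi-machine training with PyTorch DDP.
# Launch once per node, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0|1> \
#            --master_addr=<head-node-ip> --master_port=29500 train.py
# Addresses, sizes, and the toy model are illustrative assumptions.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
    model = torch.nn.Linear(10, 1)           # stand-in for the real network
    ddp_model = DDP(model)
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(100):
        x = torch.randn(32, 10)
        loss = ddp_model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()   # gradients are all-reduced across machines here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```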
How do we measure the success and progress of the Tabula Rasa learning process?
Potential metrics include performance comparisons against other engines, evaluation against a model trained on supervised data, or benchmarking against human players.
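For the engine-vs-engine comparison, progress is often summarized as an estimated Elo difference derived from match scores. A small self-contained sketch using the standard logistic Elo model (the match numbers are invented):

```python
import math

def elo_diff(wins, losses, draws):
    """Estimate the Elo difference implied by a match score."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games      # expected score in [0, 1]
    return -400.0 * math.log10(1.0 / score - 1.0)

# e.g. 60 wins, 30 losses, 10 draws out of 100 games -> about +108 Elo
print(round(elo_diff(60, 30, 10)))
```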
In what scenarios or domains could this approach be most beneficial?
Possibly in domains where, to my knowledge, no open-source project is currently available, such as shogi.
What are the long-term objectives and expected outcomes of implementing Tabula Rasa learning?
In domains like StarCraft II, challenges arise when learning from scratch: achieving human-level playing strength has proven elusive without pretraining or substantial custom reward shaping. Persistent challenges include sample efficiency and escaping local optima.
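As a concrete note on "custom reward shaping": one standard, theoretically safe variant is potential-based shaping (Ng et al., 1999), which densifies sparse rewards without changing the optimal policy. A generic sketch with a made-up potential function:

```python
# Potential-based reward shaping: the shaped reward
#   r' = r + gamma * phi(s') - phi(s)
# preserves the optimal policy. phi() here is a made-up heuristic,
# purely for illustration.
GAMMA = 0.99

def phi(state):
    # Hypothetical domain heuristic, e.g. material balance or map control.
    return state.get("progress", 0.0)

def shaped_reward(reward, state, next_state):
    return reward + GAMMA * phi(next_state) - phi(state)

# A sparse terminal reward of 0.0 still yields a dense learning signal:
print(shaped_reward(0.0, {"progress": 0.2}, {"progress": 0.5}))  # ~0.295
```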
@QueensGambit thank you for the detailed response, and particularly for providing the links. I have recently started exploring the implementation of a simple chess engine for quantum computers, which can be found here: QuantumChess on GitHub.
Additionally, I am utilizing the following resources:
Tabula Rasa Learning Approach Proposal
Summary
I propose implementing a "Tabula Rasa" (clean slate) learning approach for our project, where the system starts with minimal prior knowledge and learns from scratch through self-play or self-improvement mechanisms. This approach aims to allow the system to develop its own understanding and strategies organically.
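To make the proposal concrete, below is a tiny runnable illustration of tabula-rasa self-play: tabular Q-learning with Monte Carlo updates that learns the single-heap game of Nim from an empty value table. The toy game, hyperparameters, and code are illustrative assumptions, not the project's intended implementation:

```python
# Tabula-rasa self-play on one-heap Nim (take 1-3 objects; whoever takes
# the last object wins). The value table starts empty: zero prior knowledge.
import random
from collections import defaultdict

Q = defaultdict(float)              # (heap_size, action) -> value; starts empty
ALPHA, EPSILON, HEAP = 0.5, 0.1, 15

def choose(heap, greedy=False):
    moves = [a for a in (1, 2, 3) if a <= heap]
    if not greedy and random.random() < EPSILON:
        return random.choice(moves)                 # exploration
    return max(moves, key=lambda a: Q[(heap, a)])   # exploitation

def self_play_game():
    history, heap = [], HEAP
    while heap > 0:
        action = choose(heap)
        history.append((heap, action))
        heap -= action
    # Whoever made the last move wins; credit moves with alternating sign.
    for i, (s, a) in enumerate(reversed(history)):
        outcome = 1.0 if i % 2 == 0 else -1.0
        Q[(s, a)] += ALPHA * (outcome - Q[(s, a)])  # Monte Carlo update

for _ in range(20000):
    self_play_game()

# The learned greedy policy recovers the known strategy: reduce the heap
# to a multiple of 4. Heaps of 4 and 8 are lost, so any move may appear.
print([(h, choose(h, greedy=True)) for h in range(1, 9)])
```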
Background
In many AI systems, predefined heuristics, rule-based algorithms, or human-designed features are used to guide the learning or decision-making process. However, alternative approaches, such as "Tabula Rasa," offer the opportunity to build intelligence without initial biases or predefined rules.
Proposal
The idea is to:
Potential Benefits
Discussion Points
- How feasible is it to implement the Tabula Rasa approach within our project's domain?
- What computational resources, data, or infrastructure would be needed?
- How do we measure the success and progress of the Tabula Rasa learning process?
- In what scenarios or domains could this approach be most beneficial?
- What are the long-term objectives and expected outcomes of implementing Tabula Rasa learning?
Let's discuss the feasibility and potential implementation strategies for this approach in our project.