making money using end-to-end reinforcement learning with self-replicating agents

synctext commented 6 years ago

Scientific goal: create a collaborative live research ecosystem for reinforcement learning

It is impossible to publish in leading AI venues without industry-level resources. Scientists are being kept from contributing their knowledge by lack of access. Without industry-level resources (thousands of cores from Google, Facebook, or DeepMind clusters) and huge, valuable datasets it is impossible to compete.

The "publish or perish" model encourages scientists to cut as many corners as they can in order to produce as many publications as they can. This directly conflicts with the realities of AI: it is hard and requires a lot of work to advance the state of the art.

Unpublished code and sensitivity to training conditions have made it difficult for AI researchers to reproduce many key results. AI has become a form of "alchemy". This initiative will create the first fully open, re-usable environment. Ideas compete for success and can be re-used.

More specifically, we need open re-usable AI with embodiment and self-replication, see this Science Magazine publication. Bio-inspiration is key.

Robotics researchers increasingly agree that ideas from biology and self-organization can
strongly benefit the design of autonomous robots. Biological organisms have evolved to
perform and survive in a world characterized by rapid changes, high uncertainty, indefinite
richness, and limited availability of information. Industrial robots, in contrast, operate in
highly controlled environments with no or very little uncertainty. Although many
challenges remain, concepts from biologically inspired (bio-inspired) robotics will eventually
enable researchers to engineer machines for the real world that possess at least some of
the desirable properties of biological organisms, such as adaptivity, robustness,
versatility, and agility.

Engineering goal: making money in our micro-economy using end-to-end reinforcement learning with our framework of self-replicating agents using VPS/VPN buying and decentral market

This type of robot will sense the world around it and act upon it. Without intelligent actions it will fail to reproduce and die off. "Motor commands" in the above picture in the old-AI robot world are replaced with robo trading. The whole ecosystem is fully self-organising and has no point of control, central server, or single-point-of-failure.

Robo trading is based on crypto tokens. Since the launch of our first primitive ledger in 2007 we have been working on an accounting system for BitTorrent, something we now call a token for BitTorrent. Our live deployment and self-replicating AI now make the next step possible: self-replicating AI based on deep reinforcement learning. One full-time PhD student is responsible for realizing our token economy: #3337 (see pages of detail there).

From #2925: The basic idea is to create a micro-economy within the Tribler platform for earning, spending and trading bandwidth tokens. This brings together various research topics, including blockchain-powered decentralized market, anonymous downloading and hidden seeding. Trustworthy behavior and participation should be rewarded while cheating should be punished. A basic policy should prevent users from selfishly consuming bandwidth without giving anything back. This directly addresses the tragedy-of-the-commons phenomenon. Our initial release should provide basic primitives to earn, trade and spend tokens. Our work could be extended with more sophisticated techniques like TrustChain record mixing, multiple identities, a robust reputation mechanism for tunnel selection, global consensus and verifiable public proofs (proof-of-bandwidth/proof-of-relay).
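To make the "basic policy" concrete, here is a minimal sketch of one way to refuse service to peers who only consume. The `BandwidthAccount` class, the `may_consume` function and the 100 MiB credit limit are all assumptions made for illustration; this is not the TrustChain record format or Tribler's actual policy.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class BandwidthAccount:
    """Hypothetical per-peer account, not the real TrustChain record."""
    uploaded_bytes: int = 0    # bandwidth given to others (tokens earned)
    downloaded_bytes: int = 0  # bandwidth consumed (tokens spent)

    @property
    def balance(self) -> int:
        # Positive balance: the peer has contributed more than it consumed.
        return self.uploaded_bytes - self.downloaded_bytes


def may_consume(account: BandwidthAccount, request_bytes: int,
                credit_limit_bytes: int = 100 * MIB) -> bool:
    """Allow a request only if the peer stays within its credit limit."""
    return account.balance - request_bytes >= -credit_limit_bytes


# A peer that only downloads runs out of credit; a seeder does not.
freerider = BandwidthAccount(uploaded_bytes=0, downloaded_bytes=90 * MIB)
seeder = BandwidthAccount(uploaded_bytes=500 * MIB, downloaded_bytes=50 * MIB)
print(may_consume(freerider, 20 * MIB))  # False
print(may_consume(seeder, 20 * MIB))     # True
```

The small credit limit lets newcomers bootstrap before they have earned anything, at the cost of a bounded amount of free-riding per identity.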

Agent specifications

The agent will earn TrustChain records by seeding in BitTorrent and relaying Tor-like traffic. It will sell these records for Bitcoin or Ethereum on our decentralised market. Using these coins and our Plebnet framework it will buy VPN and VPS infrastructure and essentially replicate. Detailed architecture of this ecosystem:
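The earn, sell, buy and replicate cycle described above can be sketched as a toy simulation. Everything here is an assumption for illustration: the `Agent` class, the one-token-per-gigabyte rate and the abstract "coins" unit are hypothetical, and no Tribler, Plebnet or market API is called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    tokens: float = 0.0  # bandwidth tokens earned by seeding and relaying
    coins: float = 0.0   # cryptocurrency obtained by selling tokens

    def earn(self, gigabytes_served: float) -> None:
        # Assumption: one token per gigabyte served.
        self.tokens += gigabytes_served

    def sell_tokens(self, price_per_token: float) -> None:
        # Sell the whole token balance at a single market price.
        self.coins += self.tokens * price_per_token
        self.tokens = 0.0

    def try_replicate(self, vps_price: float, own_rent: float) -> Optional["Agent"]:
        """Buy a server for a child only if the agent's own rent stays covered."""
        if self.coins >= vps_price + own_rent:
            self.coins -= vps_price
            return Agent()
        return None


parent = Agent()
parent.earn(gigabytes_served=1000)
parent.sell_tokens(price_per_token=0.5)
child = parent.try_replicate(vps_price=200, own_rent=200)
print(child is not None, parent.coins)  # True 300.0
```

Requiring the agent's own rent to remain covered before spawning a child is one possible rule; whether an agent should instead gamble on replication is exactly the kind of decision the learning component would have to make.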

token_architecture

The role of AI

The challenge is to put AI at the core of this work. All decisions about money, tokens, replication, and hoarding credits for survival will be taken by autonomous intelligence. By applying end-to-end reinforcement learning we will use a single goal which will drive the behavior of agents: survival.
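As a minimal sketch of what "a single goal: survival" could mean for a learner, the block below pairs a survival-only reward with the standard tabular Q-learning update. The action names, the reward values of +1 and -1, and the use of "months of rent in reserve" as the state are assumptions made for illustration; a deep, end-to-end learner would replace the table with a neural network.

```python
import random
from collections import defaultdict

ACTIONS = ("hoard", "replicate")  # hypothetical action set


def survival_reward(balance: float, monthly_cost: float) -> float:
    """+1 if the agent can pay for another month, -1 if it dies."""
    return 1.0 if balance >= monthly_cost else -1.0


def q_update(q, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: move Q(s, a) towards the TD target."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over the hypothetical action set."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


# State here is the number of months of rent the agent holds in reserve.
q = defaultdict(float)
reward = survival_reward(balance=30.0, monthly_cost=10.0)
q_update(q, state=3, action="hoard", reward=reward, next_state=2, done=False)
print(choose_action(q, state=3, epsilon=0.0))  # hoard
```

Note that income never appears in the reward: earning tokens only matters to the learner because it changes the balance that decides survival.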

Generating income is only a means to an end: the primary objective of survival. Every month the agent needs to have sufficient Bitcoin or Ethereum to replace its VPS/VPN infrastructure, or it will "die". Various parameters will be implemented to influence the behavior and strategies of an agent.

- survival based on cost of VPN/VPS, optional multiply (e.g. what providers to prefer)
  - strategy: buy cheapest or buy at random
  - strategy: high probability to stick with current provider or alternate with providers each month
- understand bid/ask volume and market (over)supply (basic understanding of the market)
- communicate to other agents; e.g. not expected to survive this month or thriving and breeding
  - cooperative experience sharing: disclose private decisions and obtained reward (security?)
  - gossip performance and reliability of VPN/VPS providers
- parameters for cpu/disk storage versus bandwidth
  - what product is more in demand
  - long-term archiving or short-term flashcrowd boosting
- market making for Bitcoin versus Trustchain bandwidth coins
  - probably best implemented using traditional techniques (as discussed)
  - no need for or critical role for AI
- AI innovation
  - not the expert on this, involve experts like: Loog&Tax
  - get creative for the Q-learner
  - something with adversarial?
- we have no idea what to do
  - implement and deploy the minimal viable agent
  - get creative and be inspired by operational experience
  - don't overthink at the start, just do it.
  - this is pioneering work, nobody can help us.
  - have fun and don't be evil or Skynet
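The provider strategies in the list above (buy cheapest or at random, and a high probability of sticking with the current provider) could be parameterised roughly as follows. The function name, its parameters and the provider names are hypothetical; this is a sketch of the idea, not code from Plebnet or Cloudomate.

```python
import random


def choose_provider(prices, current=None, strategy="cheapest",
                    stick_probability=0.8, rng=random):
    """Pick a VPS/VPN provider for the coming month.

    prices: dict mapping provider name to monthly price.
    current: the provider used this month, if any.
    """
    # With probability `stick_probability`, stay with the current provider.
    if current in prices and rng.random() < stick_probability:
        return current
    if strategy == "cheapest":
        return min(prices, key=prices.get)
    if strategy == "random":
        return rng.choice(sorted(prices))
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical providers and prices, for illustration only.
offers = {"provider_a": 5.0, "provider_b": 3.0, "provider_c": 4.5}
print(choose_provider(offers, strategy="cheapest"))  # provider_b
```

Exposing `strategy` and `stick_probability` as plain parameters keeps the minimal viable agent simple, while leaving room for a learner to tune or replace them later.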

A lot of loose parts of this vision are already in place. The integration step and meaningful intelligence is still lacking. Plebnet is operational: demo_vision

synctext commented 6 years ago

Diversity is essential to survival. Multiple independent code bases.

synctext commented 5 years ago

fMRI scans show that humans have social norms and reinforcement learning: "Reinforcement Learning Signal Predicts Social Conformity". Useful source of bio-socio mimicry experiments. Buzzword bingo score of 6+: data-driven bio-socio mimicry using blockchain-based deep reinforcement learning. Quote: "using functional magnetic resonance imaging, that conformity is based on mechanisms that comply with principles of reinforcement learning. We found that individual judgments of facial attractiveness are adjusted in line with group opinion. Conflict with group opinion triggered a neuronal response in the rostral cingulate zone and the ventral striatum similar to the “prediction error” signal suggested by neuroscientific models of reinforcement learning."

synctext commented 5 years ago

f-MRI and public goods Getting to Know You: Reputation and Trust in a Two-Person Economic Exchange Linking this reinforcement learning issue to the math mode of #2805. "Using a multiround version of an economic exchange (trust game), we report that reciprocity expressed by one player strongly predicts future trust expressed by their partner—a behavioral finding mirrored by neural responses in the dorsal striatum. Here, analyses within and between brains revealed two signals—one encoded by response magnitude, and the other by response timing. Response magnitude correlated with the “intention to trust” on the next play of the game, and the peak of these “intention to trust” responses shifted its time of occurrence by 14 seconds as player reputations developed. This temporal transfer resembles a similar shift of reward prediction errors common to reinforcement learning models, but in the context of a social exchange. These data extend previous model-based functional magnetic resonance imaging studies into the social domain and broaden our view of the spectrum of functions implemented by the dorsal striatum."

ToDo: biology-based models of trust (above paper gives real-world measurements of the human brain for trust).

synctext commented 5 years ago

"Hidden Technical Debt in Machine Learning Systems". Google said: the hardest part of AI is not AI. "Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex." hardest-about-ai-is-not-ai

synctext commented 4 years ago

@MateiAnton A first key start for this hugely ambitious project is to get something more efficient going.

As also mentioned in #4659 the current code is not deployed, only lab experiments:

The current bot can operate on our decentralised exchange and has a primitive understanding of pricing and orderbooks.
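For readers unfamiliar with order books, a "primitive understanding of pricing" could be as small as the summary below: best bid, best ask, spread, mid-price and a volume imbalance that hints at (over)supply. The function and its input format are assumptions for illustration and do not reflect the API of the decentralised exchange; both sides of the book are assumed to be non-empty.

```python
def summarise_orderbook(bids, asks):
    """Summarise an order book given as lists of (price, quantity) tuples."""
    best_bid = max(price for price, _ in bids)  # highest price a buyer offers
    best_ask = min(price for price, _ in asks)  # lowest price a seller accepts
    bid_volume = sum(quantity for _, quantity in bids)
    ask_volume = sum(quantity for _, quantity in asks)
    return {
        "best_bid": best_bid,
        "best_ask": best_ask,
        "spread": best_ask - best_bid,
        "mid_price": (best_bid + best_ask) / 2,
        # Negative imbalance: more is offered for sale than buyers demand.
        "imbalance": (bid_volume - ask_volume) / (bid_volume + ask_volume),
    }


# Hypothetical book: prices in coins per token, quantities in tokens.
book = summarise_orderbook(bids=[(9, 10), (8, 30)], asks=[(11, 20), (12, 100)])
print(book["mid_price"], book["imbalance"])  # 10.0 -0.5
```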

First step is to integrate the stable API provided by the creator of Sporestack into Cloudomate. Then we have the stable building blocks for self-replication and making everything more sophisticated/"intelligent".

Quick search came up with some prior work, not something we can re-use I believe. Fancy stuff, also far too complex to re-use: "Stock Trading Bot using Deep Reinforcement Learning (Deep Q-learning), Keras and TensorFlow" and "A Live Machine-Learning based Cryptocurrency Trader".

Old 2008 scientific paper: Autonomous Forex Trading Agents

synctext commented 4 years ago

Reading on X-mas break and found related work: "autonomous bidding agent". Busy quarter. Next steps:

- mixed, but focus on reading related scientific work
- please focus on scientific papers on stuff that actually worked with actual money. Note this is strangely very rare. (e.g. fake money competition)

synctext commented 4 years ago

@MateiAnton busy with regular classes this 3rd quarter. Please understand, re-use, and extend ongoing/prior work: https://github.com/Tribler/distributed-ai-kernel (Python or Kotlin)

qstokkink commented 1 month ago

It seems like this student work was either completed or dropped several years ago. I'll close this issue now.
