phd placeholder: "Decentralized Machine Learning Systems for Information Retrieval"

< Placeholder > timeline: April 2023 - April 2027. ToDo: 6 weeks hands-on Python onboarding project. Learn a lot and plan to throw it away. The next step is then to start working on your first scientific article. You need 4 thesis chapters to complete your PhD.

One idea: towards trustworthy and perfect metadata for search (e.g. perfect memory retrieval for the global brain #7064 ). Another idea: a gradient descent model takes any keyword search query as input. The output is a limited set of vectors, so only valid content is recommended. The learning goal is to provide semantic matching between the input query and the output vectors. General background and Spotify linkage: possible dataset sharing.

Max. usage of expertise: product/market fit thinking

Background reading:

latest unsupervised learn-to-rank: G-Rank: Unsupervised Continuous Learn-to-Rank for Edge Devices in a P2P Network
First decentralised learning called gossip learning in 2011 (!!!) Gossip Learning with Linear Models on Fully Distributed Data by Róbert Ormándi, István Hegedüs, Márk Jelasity.
Meritrank: Sybil tolerant reputation for merit-based tokenomics
Mitigating sybils in federated learning poisoning (FoolsGold algorithm Thnx @ThomasWerthenbach!)
Cardinal problem for search is accurate metadata (tags, description, popularity, spam-making)
- Without a central editor we need collective decisions, crowdsourcing, or even digital parliament to decide on each item what the perfect naming would be.
- 346 articles on crowdsourcing in a survey
- Smartocracy: Social Networks for Collective Decision Making
- 99 pages on digital democracy. No scientific results...
- Towards a User-aware Enrichment of Multimedia Metadata
- CPSFS: A Credible Personalized Spam Filtering Scheme by Crowdsourcing
- Item Popularity and Recommendation Accuracy
- collective intelligence
1965 foundations of ultra intelligent machines
Robotic block stacking, Atari game playing, text writing, plus lots of other tasks by a single Google-made model: A Generalist Agent
You need to be consistently critical of the hype and the Next Big Thing, because years later a small item can pop up which completely kills everything. Current terms: how useful are transformers, huge language models, and foundation models for normal Outlook and MSN users? :astonished: Normal people just want their privacy back.
https://github.com/tatsu-lab/stanford_alpaca :tada: GPT-3 clone. Literally runs on a raspberry pi (very slowly). GPT models are incredible but the future is somehow even more amazing than that
Pytorch tips
Pointwise approach: "broadly speaking, each historic impression with a click is a positive training example, and each impression without a click is a negative training example." https://towardsdatascience.com/learning-to-rank-a-primer-40d2ff9960af
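
A minimal sketch of that pointwise idea, in plain Python rather than PyTorch so it runs without dependencies. Everything here is an assumption for illustration: the two features (`title_match`, `popularity`), the toy impressions, and the choice of logistic regression trained with stochastic gradient descent. It only shows the mechanics: clicked impressions are label 1, unclicked impressions are label 0, and candidates are ranked by predicted click probability.

```python
import math
import random

# Hypothetical impressions: ([title_match, popularity], clicked)
IMPRESSIONS = [
    ([1.0, 0.9], 1), ([1.0, 0.2], 1), ([1.0, 0.5], 1),
    ([0.0, 0.9], 0), ([0.0, 0.2], 0), ([0.0, 0.5], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(model, features):
    """Predicted click probability for one (query, item) feature vector."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_pointwise(impressions, epochs=200, lr=0.5, seed=0):
    """Fit logistic regression with SGD; each impression is one example."""
    rng = random.Random(seed)
    weights = [0.0] * len(impressions[0][0])
    bias = 0.0
    data = list(impressions)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, clicked in data:
            # Gradient of the log loss with respect to the logit.
            error = score((weights, bias), features) - clicked
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def rank(model, candidates):
    """Sort (name, features) candidates by predicted click probability."""
    return sorted(candidates, key=lambda c: score(model, c[1]), reverse=True)

model = train_pointwise(IMPRESSIONS)
candidates = [("a", [0.0, 0.9]), ("b", [1.0, 0.3])]
ranking = [name for name, _ in rank(model, candidates)]
```

Swapping the hand-written loop for a PyTorch module with a binary cross-entropy loss would follow the same structure; the pointwise framing is in the labels, not in the model.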
Sequence learning: "We will implement a character-level sequence-to-sequence model, processing the input character-by-character and generating the output character-by-character. Another option would be a word-level model, which tends to be more common for machine translation." https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
https://developer.nvidia.com/blog/deep-learning-nutshell-sequence-learning/
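
To make the character-level sequence-to-sequence setup concrete, here is a rough sketch of the data preparation step only (no model), in plain Python. The sentence pairs and helper names are made up for illustration. It follows the convention from the Keras post of wrapping each target in a start character (`"\t"`) and an end character (`"\n"`), and shows that the decoder target is the decoder input shifted by one timestep (teacher forcing).

```python
START, END = "\t", "\n"

# Hypothetical training pairs (source text, target text).
PAIRS = [("hi", "salut"), ("go", "va")]

def build_vocab(texts):
    """Map every distinct character to an integer index."""
    chars = sorted(set("".join(texts)))
    return {ch: i for i, ch in enumerate(chars)}

def one_hot(index, size):
    vec = [0] * size
    vec[index] = 1
    return vec

def encode_pair(source, target, src_vocab, tgt_vocab):
    """Return encoder input, decoder input and decoder target sequences."""
    wrapped = START + target + END
    encoder_input = [one_hot(src_vocab[c], len(src_vocab)) for c in source]
    # Decoder sees everything except the last character ...
    decoder_input = [one_hot(tgt_vocab[c], len(tgt_vocab)) for c in wrapped[:-1]]
    # ... and must predict the same sequence shifted one step ahead.
    decoder_target = [one_hot(tgt_vocab[c], len(tgt_vocab)) for c in wrapped[1:]]
    return encoder_input, decoder_input, decoder_target

src_vocab = build_vocab([s for s, _ in PAIRS])
tgt_vocab = build_vocab([START + t + END for _, t in PAIRS])
enc_in, dec_in, dec_tgt = encode_pair("go", "va", src_vocab, tgt_vocab)
```

A word-level variant would only change `build_vocab` and the iteration over characters; the one-step shift between decoder input and target stays the same.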
UPDATE: Vector databases and knowledge graph ideas...
- https://www.alexandria.unisg.ch/258982/1/mso202002.issue.pdf#page=91 "Bringing Semantic Knowledge Graph Technology to Your Data"
- https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696
- https://www.latent.space/p/agents
- https://observablehq.com/@asg017/introducing-sqlite-vss
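
What the vector databases linked above have in common is nearest-neighbour search over embeddings. A minimal sketch of the exact (brute-force) version, assuming hand-made 3-dimensional embeddings and hypothetical item names; real systems use learned embeddings and approximate indexes, which this does not attempt to show.

```python
import math

# Hypothetical item embeddings; in practice these come from a trained model.
INDEX = {
    "linux-iso": [0.9, 0.1, 0.0],
    "jazz-album": [0.0, 0.2, 0.9],
    "ubuntu-image": [0.8, 0.3, 0.1],
}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the k item names whose embeddings are closest to the query."""
    scored = [(cosine(query_vec, vec), item) for item, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]

results = top_k([1.0, 0.2, 0.0], INDEX, k=2)
```

Because the output is restricted to items already in the index, a query can only ever return known content, which matches the "limited set of vectors" idea in the notes above.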

Venues:

Knowledge Graphs, Semantics, Social and Adaptive Web
IEEE Transactions on Computational Social Systems

Note: no prior MSc courses on machine learning. We are a systems lab and might know how to apply machine learning in a permissionless, byzantine, unsupervised, decentralised, adversarial, continuous learning context.

Possible scientific storyline: SearchZero, a decentralised, self-supervised search engine with continuous learning

Tribler / tribler

phd placeholder: "Decentralized Machine Learning Systems for Information Retrieval" #7290