Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0
4.74k stars, 445 forks

phd placeholder: "Decentralized Machine Learning Systems for Information Retrieval" #7290

Open synctext opened 1 year ago

synctext commented 1 year ago

< Placeholder > Timeline: April 2023 - April 2027. To do: a 6-week hands-on Python onboarding project. Learn a lot and plan to throw it away. The next step is then to start working on your first scientific article. You need 4 thesis chapters to complete your PhD.

One idea: towards trustworthy and perfect metadata for search (e.g. perfect memory retrieval for the global brain #7064). Another idea: a gradient descent model takes any keyword search query as input. The output is a limited set of vectors. Only valid content is recommended. The learning goal is to provide semantic matching between the input query and the output vectors. General background and Spotify linkage; possible dataset sharing.

Max. usage of expertise: product/market fit thinking

Background reading:

Venues:

Possible scientific storyline: SearchZero, a decentralised, self-supervised search engine with continuous learning

mg98 commented 2 weeks ago

For the last three weeks I have been redoing the stochastic analysis of the chunking algorithm AE and its "cousin" RAM. The goal is to derive a formula for the relationship between their parameter $h$ and the expected mean chunk size $\mu$. This enables us to conduct a comparative study across all algorithms. Added a 3-page math appendix (Appendix A).
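As a sanity check on such a derivation, the relationship between $h$ and $\mu$ can also be estimated empirically. The following is a minimal sketch, not the analysis from the appendix: it assumes AE refers to the Asymmetric Extremum chunker, that $h$ is its window length, and that input bytes are i.i.d. uniform. The function names `ae_chunk_sizes` and `estimate_mean_chunk_size` are hypothetical, and the cut rule is a simplified reading of the published AE description.

```python
import random
from statistics import mean


def ae_chunk_sizes(data: bytes, h: int) -> list[int]:
    """Chunk `data` with a simplified AE rule and return the chunk sizes.

    Assumed rule: track the strict running maximum of the current chunk;
    cut right after the byte that lies h positions past that maximum,
    provided no larger byte appeared in between.
    """
    if h < 1:
        raise ValueError("h must be at least 1")
    sizes = []
    start, n = 0, len(data)
    while start < n:
        max_val, max_pos = data[start], start
        cut = n  # default: the remainder becomes the final chunk
        for i in range(start + 1, n):
            if data[i] > max_val:
                # New strict maximum: the window of h bytes restarts here.
                max_val, max_pos = data[i], i
            elif i == max_pos + h:
                # h bytes after the maximum without a larger value: cut.
                cut = i + 1
                break
        sizes.append(cut - start)
        start = cut
    return sizes


def estimate_mean_chunk_size(h: int, n_bytes: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean chunk size on uniform random bytes."""
    rng = random.Random(seed)
    data = bytes(rng.getrandbits(8) for _ in range(n_bytes))
    sizes = ae_chunk_sizes(data, h)
    # Drop the last chunk: it is truncated by the end of the data.
    return mean(sizes[:-1])


if __name__ == "__main__":
    for h in (16, 32, 64, 128):
        mu_hat = estimate_mean_chunk_size(h)
        print(f"h={h:4d}  estimated mu={mu_hat:8.2f}  mu/h={mu_hat / h:.3f}")
```

Printing the ratio `mu/h` for several values of `h` gives a quick way to compare a closed-form candidate against simulation. Real data is not i.i.d. uniform, so the empirical study may still deviate from both.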

We're approaching 40 pages now, and I'm discussing with @grimadas how to sell this work. One idea is to break it into two papers: a theoretical survey/SoK paper and a paper on the empirical study. Possible target: JSys, 1st August.

Draft thesis title for forms: Decentralized Machine Learning Systems for Information Retrieval