synctext closed this issue 2 years ago.
To be altered during the meeting with Johan on Monday.
We have switched from a bottom-up to a top-down approach: rather than building a stub, we will implement separate functionalities and later consolidate them into one app.
A word list of 20k words was found at http://corpus.leeds.ac.uk/list.html under a Creative Commons license. Alternatives we considered were a larger dataset, a lemmatized dataset and a paid dataset. The larger dataset (https://www.kaggle.com/rtatman/english-word-frequency) was clearly too large, as 5 MB in memory just for this purpose seems like overkill. @synctext never mind, we asked a question but have already solved it. It looks like a healthy amount of 60k word stems, together with a 0.7 MB lightweight Java stemming library, will yield the best outcome for this.
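As a rough illustration of how a general-English frequency list can help pick keywords, the sketch below scores each word by its count in the document times the log of its rank in the list, so words that are frequent in a paper but rare in everyday English rise to the top. Everything here is an assumption for illustration: `KeywordExtractor`, the scoring formula, and especially `stem`, a crude suffix stripper standing in for the lightweight Java stemming library mentioned above (whose API is not shown in this thread).

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical keyword extractor, not the project's code.
class KeywordExtractor {
    // word (stem) -> rank in a general-English frequency list, 1 = most common
    private final Map<String, Integer> generalRank;
    // words missing from the list are treated as rarer than any listed word
    private final int unknownRank;

    KeywordExtractor(Map<String, Integer> generalRank) {
        this.generalRank = generalRank;
        this.unknownRank = generalRank.values().stream()
            .mapToInt(Integer::intValue).max().orElse(0) + 1;
    }

    // Placeholder stemmer: strips one common English suffix from longer words.
    // A real stemmer (e.g. Porter) handles far more cases.
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (word.length() > suffix.length() + 3 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Returns the k stems with the highest count * log(1 + generalRank) score.
    List<String> topKeywords(String text, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.length() < 3) continue; // skip empty tokens and very short words
            counts.merge(stem(token), 1, Integer::sum);
        }
        Map<String, Double> scores = new HashMap<>();
        counts.forEach((word, count) ->
            scores.put(word, count * Math.log(1 + generalRank.getOrDefault(word, unknownRank))));
        return scores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()))
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

With a list in which "the" ranks 1 and "mars" ranks 5000, the text "The methane on Mars. Methane production using ISRU on Mars." yields methane, mars and isru as its top three keywords, while "the" scores lowest.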
LiteratureDAO query: mars isru methane
Creative Commons PDFs: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/d8/1d/ (many GB, spread over many directories)
This week we wrote the query handler, which includes the document ranking methods. We now also have over 300 PDFs for testing. Besides that, we can now pass peer-to-peer messages. The PDF rating is shown below.
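The thread does not say which ranking methods the query handler uses. One standard choice is TF-IDF, and the sketch below ranks locally stored documents that way. `TfIdfRanker`, its smoothed IDF formula and the in-memory map of term counts are assumptions for illustration, not the project's code.

```java
import java.util.*;

// Hypothetical TF-IDF ranker over locally stored keyword counts.
class TfIdfRanker {
    // document id -> (term -> occurrences in that document)
    private final Map<String, Map<String, Integer>> docs = new HashMap<>();

    void addDocument(String id, Map<String, Integer> termCounts) {
        docs.put(id, termCounts);
    }

    // Smoothed inverse document frequency: rarer terms weigh more,
    // and a term present in every document still gets a positive weight.
    private double idf(String term) {
        long df = docs.values().stream().filter(d -> d.containsKey(term)).count();
        return Math.log((1.0 + docs.size()) / (1.0 + df)) + 1.0;
    }

    // Returns document ids by descending TF-IDF score; documents scoring 0 are dropped.
    List<String> rank(List<String> queryTerms) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> doc : docs.entrySet()) {
            Map<String, Integer> counts = doc.getValue();
            int total = counts.values().stream().mapToInt(Integer::intValue).sum();
            double score = 0.0;
            for (String term : queryTerms) {
                int tf = counts.getOrDefault(term, 0);
                if (tf > 0) score += ((double) tf / total) * idf(term); // term frequency * idf
            }
            if (score > 0) scores.put(doc.getKey(), score);
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparing((String id) -> scores.get(id)).reversed()
            .thenComparing(Comparator.<String>naturalOrder())); // ties broken by id
        return ranked;
    }
}
```

Normalising by document length keeps long papers from winning purely on size; the IDF factor makes a rare term such as "isru" count for more than a common one.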
This week I implemented PDF parsing as a coroutine so that it would not block. However, it still blocks (intuitively, I think this is because of the nature of the task, not the thread executing it). I also worked on storing and loading the metadata of PDF files for search queries, and made sure the app runs on a physical mobile device.
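One common cause, not verified against this code, is the dispatcher rather than the task itself: Kotlin coroutines are cooperative, so CPU-heavy parsing inside a coroutine launched on the main dispatcher still occupies the main thread unless it is moved elsewhere, for example with `withContext(Dispatchers.Default)`. The Java sketch below shows the same principle with a thread pool. `BackgroundParser` and `parsePdf` are hypothetical, and `parsePdf` only stands in for the real PDF text extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BackgroundParser {
    // Daemon worker threads: parsing never runs on the caller's (UI) thread,
    // and the pool does not keep the process alive on its own.
    private final ExecutorService pool = Executors.newFixedThreadPool(
        Math.max(1, Runtime.getRuntime().availableProcessors() - 1),
        task -> {
            Thread t = new Thread(task, "pdf-parser");
            t.setDaemon(true);
            return t;
        });

    // Hypothetical stand-in for the real (slow, CPU-bound) PDF text extraction.
    static String parsePdf(String path) {
        return "text of " + path;
    }

    // Submits every file and returns at once; each future completes when its file is parsed.
    List<CompletableFuture<String>> parseAll(List<String> paths) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String path : paths) {
            futures.add(CompletableFuture.supplyAsync(() -> parsePdf(path), pool));
        }
        return futures;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

On Android, results would be handed back to the UI thread through a callback (for instance `thenAccept` followed by `runOnUiThread`) rather than by calling `join()` on the main thread, which would bring the freeze back.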
Peter and Rahul worked on passing the search string from the GUI to the backend and on adding a PDF import button to the GUI. Firing the intent and getting the results back in the logs is still a work in progress.
Keon has been working on seeding for downloading PDFs from peers. Keon and I settled on an architecture for resolving queries. We think it's best to save the keywords locally and transmit only the results of a query, after doing a local comparison.
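A minimal sketch of that design, assuming each document's keywords are already extracted: the keyword sets stay in a local map, and an incoming query is answered with nothing but titles and match scores. `LocalQueryResponder`, the fraction-of-query-terms score and the tab-separated reply format are all hypothetical; the real app would wrap the reply in an IPv8 message.

```java
import java.util.*;
import java.util.stream.Collectors;

class LocalQueryResponder {
    // document title -> its keywords; this map never leaves the device
    private final Map<String, Set<String>> localKeywords = new HashMap<>();

    void index(String title, Collection<String> keywords) {
        localKeywords.put(title, new HashSet<>(keywords));
    }

    // Compares the query against local keywords and returns only the top-k results,
    // one "title<TAB>score" line each, where score = fraction of query terms matched.
    String answer(Set<String> queryTerms, int k) {
        if (queryTerms.isEmpty()) return "";
        return localKeywords.entrySet().stream()
            .map(e -> Map.entry(e.getKey(),
                (double) e.getValue().stream().filter(queryTerms::contains).count() / queryTerms.size()))
            .filter(e -> e.getValue() > 0)
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()))
            .limit(k)
            .map(e -> e.getKey() + "\t" + String.format(Locale.ROOT, "%.2f", e.getValue()))
            .collect(Collectors.joining("\n"));
    }
}
```

After indexing "Mars ISRU" with the keywords mars, isru, methane and oxygen, and "Lunar regolith" with moon, regolith and oxygen, the query {oxygen, mars} returns two lines: `Mars ISRU<TAB>1.00` and `Lunar regolith<TAB>0.50`. The asking peer learns which documents match and how well, but never sees the keyword lists themselves.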
Continuous scoring of locally parsed documents as the user types in the search bar:
@TODO
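Continuous scoring while typing could work roughly as in this sketch: every change to the search bar re-scores the local documents, complete words must match a keyword exactly, and the word still being typed matches any keyword that starts with it. `IncrementalScorer` and its one-point-per-matched-word scoring are hypothetical, and keywords are assumed to be stored in lowercase.

```java
import java.util.*;

class IncrementalScorer {
    private final Map<String, Set<String>> docKeywords; // document title -> lowercase keywords

    IncrementalScorer(Map<String, Set<String>> docKeywords) {
        this.docKeywords = docKeywords;
    }

    // Called on every change of the search bar text; returns titles with at least one hit, best first.
    List<String> onQueryChanged(String rawText) {
        String text = rawText.toLowerCase(Locale.ROOT);
        // No trailing space means the last word is still being typed.
        boolean lastIsPartial = !text.isEmpty() && !text.endsWith(" ");
        String[] words = text.trim().split("\\s+");
        Map<String, Integer> hits = new HashMap<>();
        for (Map.Entry<String, Set<String>> doc : docKeywords.entrySet()) {
            int score = 0;
            for (int i = 0; i < words.length; i++) {
                String w = words[i];
                if (w.isEmpty()) continue;
                boolean partial = lastIsPartial && i == words.length - 1;
                for (String kw : doc.getValue()) {
                    // prefix match for the word being typed, exact match otherwise
                    if (partial ? kw.startsWith(w) : kw.equals(w)) {
                        score++;
                        break;
                    }
                }
            }
            if (score > 0) hits.put(doc.getKey(), score);
        }
        List<String> ranked = new ArrayList<>(hits.keySet());
        ranked.sort(Comparator.comparing((String t) -> hits.get(t)).reversed()
            .thenComparing(Comparator.<String>naturalOrder()));
        return ranked;
    }
}
```

Typing "meth" already surfaces both a document with the keyword methane and one with methanogen; once a space follows, "meth" counts as a finished word and only exact matches remain. For larger collections, debouncing keystrokes before re-scoring would avoid redundant work.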
The day of reckoning has come and we have to make our final pull request to the actual mother repo. To tie the current state back to the previous feedback:
Some GIFs of the app in action:
Lots more polished level:
Some changes that have been merged into master involve an invalid library (info.blockchain.api 1.1.4), which breaks the master pipeline: https://github.com/Tribler/trustchain-superapp/pull/113#issuecomment-1118951508 https://github.com/Tribler/trustchain-superapp/pull/113#issuecomment-1119337108
This work has been completed, closing the issue 👍
LiteratureDAO source code is here?? https://github.com/keonchennl/trustchain-superapp/tree/lit-dao/literaturedao Related work: a novel public review model, great idea. Use pre-print services and a public review process. No more rejects or accepts.
Great dataset: [170,919 Creative Commons articles in the arXiv for biology](http://api.biorxiv.org/reports/content_summary)
Your task is to gather scientific publications and engineer machine reading of scientific knowledge in BrainDAO. Thousands of scientific articles are available under a Creative Commons copyright license in simple PDF format. Get thousands of such files onto each device and start processing. Use a lightweight library for natural language processing. Use the BitTorrent engine inside Superapp for efficient file sharing. Use an IPv8 community to gossip new content. What does this have to do with our "Blockchain Engineering" course? True, this adds lots of data and processing on top of our blockchain-based BrainDAO. Reading: https://doi.org/10.3389/frma.2019.00002
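For the gossip part, here is a toy in-memory model of flooding with duplicate suppression: each node remembers which content identifiers (for example torrent infohashes) it has seen and forwards only unseen ones to its neighbours. `GossipNode` is hypothetical and is not the IPv8 community API; a real community would send network messages, usually to a random subset of peers, instead of calling methods recursively.

```java
import java.util.*;

class GossipNode {
    final String name;
    final Set<String> seen = new HashSet<>(); // content ids this node already knows
    private final List<GossipNode> neighbours = new ArrayList<>();

    GossipNode(String name) {
        this.name = name;
    }

    // Creates an undirected link between two nodes.
    void connect(GossipNode other) {
        neighbours.add(other);
        other.neighbours.add(this);
    }

    // Delivers a content id; returns how many forwards this delivery triggered in total.
    int receive(String contentId) {
        if (!seen.add(contentId)) return 0; // duplicate: drop it, which stops the flood
        int sent = 0;
        for (GossipNode n : neighbours) {
            sent += 1 + n.receive(contentId); // one message to n, plus whatever n forwards
        }
        return sent;
    }
}
```

Because each node forwards a given item once per neighbour on first receipt, spreading one item through a connected network costs exactly two messages per link: four nodes joined by four links exchange eight messages.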
13 years ago: material from Leonardo, a thesis: https://bitbucket.org/ldalonzo/p2p-search-scientific-pubs/src/master/ A few lessons I learnt (COPIED):
Other related work is the MusicDAO: feel free to re-use all that code. First steps:
Please keep it simple; this will all fail if you try to get something as ambitious as a knowledge graph operational on Android with a blockchain. Key points for grading: a merged pull request on Superapp and an architecture that works; performance and usability are secondary.