golemfactory / yagna

An open platform and marketplace for distributed computations
GNU General Public License v3.0

Full-text search engine - Golem Network Beta.2 bounty #1457

Closed mat7ias closed 3 years ago

mat7ias commented 3 years ago

Golem Network is a cloud computing power service where everyone can develop, manage and execute workloads in an unstoppable, inexpensive and censorship-free environment.

Since Beta.2 Golem supports a new model of computation – services. In contrast with batch tasks, services are expected to be long-running processes that don't have any natural completion point but rather are started and stopped on explicit command. The goal of this project is to build a full-text search service on Golem. The service would allow its users to perform search queries over a corpus of documents submitted by the requestor during deployment.

Requirements

Non-requirements

Deliverables

Resources

Estimated time to allocate: 24 hours

Useful Links:
  • Bounties Blogpost (including things you need to know!): https://blog.golemproject.net/golem-network-beta-2-bounties/
  • Beta.2 Blogpost: https://blog.golemproject.net/beta-ii-release/
  • Docs: https://handbook.golem.network
  • Install video: https://www.youtube.com/watch?v=Wqm7j7CtQwM
  • In case you need support, we're here for you. Join our Discord: https://chat.golem.network
  • Golem Twitter: https://twitter.com/golemproject

gitcoinbot commented 3 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 6000.0 GLM (1527.65 USD @ $0.25/GLM) attached to it.

gitcoinbot commented 3 years ago

Work has been started.

These users each claimed they can complete the work by 265 years, 4 months from now. Please review their action plans below:

1) skotre has been approved to start work.

I will:

  1. Get all text files from the requestor.
  2. Get search query from the requestor.
  3. On provider(s), loop through all text files, see which words are the most common and where.
  4. Return text files in a specific order.

I will use Python to accomplish this.
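
A minimal sketch of what the plan above could look like, assuming the corpus is already available as a mapping from file name to text. The names `tokenize` and `rank_documents` are hypothetical and used only for illustration; this is not code from any submission.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(documents, query):
    """Return (name, score) pairs sorted by how often the query words occur.

    `documents` maps a (hypothetical) file name to its full text.
    Documents that contain none of the query words are left out.
    """
    query_words = set(tokenize(query))
    scores = {}
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[word] for word in query_words)
        if score > 0:
            scores[name] = score
    # Highest score first; ties are broken alphabetically by name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Counting raw occurrences is the simplest possible ranking; a real search engine would typically use a weighting scheme and a prebuilt index rather than rescanning every file per query.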

gitcoinbot commented 3 years ago

@skotre Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

skotre commented 3 years ago

I am still working on the issue. I have a plan laid out and have started work on the Docker image, but I haven't tried turning it into a Golemized Docker image yet. I have also run service and task examples and digested a lot from the documentation.

gitcoinbot commented 3 years ago

@skotre Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

mat7ias commented 3 years ago

@skotre You can ignore the "Warned for Abandonment of Bounty" notifications. I'm working with Gitcoin to snooze these warnings (I don't have access to do it myself), so no action is needed from your end for now.

mat7ias commented 3 years ago

@skotre How is your bounty coming along? Do you need any assistance? We recently released a workshop related to services that might be helpful: https://blog.golemproject.net/developing-utilizing-the-golem-service-model/

mat7ias commented 3 years ago

@skotre we haven't had a response from you so for now we'll have to assume you're no longer actively working on this bounty.

niklr commented 3 years ago

Just submitted my work on Gitcoin. You can find it here: https://github.com/niklr/golem-fulltext-search

Let me know what you think @mat7ias

mat7ias commented 3 years ago

Hi @niklr Thanks for your patience. A colleague I needed to review the application with was on vacation and returned this week, so I now have some feedback. Are you able to address the points below?

  1. In the example code, Ctrl+C doesn't perform a graceful shutdown, so the requestor never pays for the job. This is visible even in the demo video (there is a "Terminating agreement" log and then the script ends).
  2. As far as I understand, the index is read again for every search, which is extremely inefficient. What we wanted is code that:
    • creates the index (this is done correctly)
    • holds the index in memory
    • performs lookups in this index without rereading it

We understand this second point isn't explicitly specified in the requirements, but efficiency is a standard requirement for any search engine.
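
To illustrate the second point, here is a minimal sketch of an index that is built once and then queried from memory. It is a toy inverted index, not the submission's implementation; the class name `InMemoryIndex` and the `build_count` attribute are assumptions made for this example.

```python
import re
from collections import defaultdict


class InMemoryIndex:
    """Toy inverted index: built once, then queried many times from memory."""

    def __init__(self, documents):
        # word -> set of document names containing that word
        self._postings = defaultdict(set)
        self.build_count = 0  # lets a caller confirm the index is built once
        self._build(documents)

    def _build(self, documents):
        self.build_count += 1
        for name, text in documents.items():
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                self._postings[word].add(name)

    def search(self, query):
        """Return sorted names of documents containing every query word."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        # Copy the first posting set so intersections don't mutate the index.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return sorted(result)
```

The key property is that `search` only touches the in-memory postings, so repeated queries never reread the corpus or an on-disk index. This only helps if the object lives in a process that stays alive between queries.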

Let me know your thoughts and if you're able to address those points. I have some more detailed remarks (below) but those above are the most important.

  3. requirements.txt: whoosh doesn't appear to be needed, as far as we can tell (or is it?)
  4. requirements.txt: gvmkit-build is only used if the user wants to modify the image, but why would they want to do that? Building the image might not need to be covered in the README.
  5. ENTRYPOINT is currently ignored, so it could be removed from the Dockerfile.
  6. What is the purpose of the FtseService.shutdown method?
  7. Currently input is not async, so the whole of yapapi hangs on it (and, for example, the time limit doesn't work while waiting for input). An example implementation of async input can be found in https://github.com/golemfactory/yapapi-service-manager/blob/master/examples/python_shell.py
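
For the last remark, one common way to avoid blocking the event loop is to run the blocking read in a worker thread. The sketch below uses only the standard library and is not taken from the linked example; `read_line_async` and its `reader` parameter are hypothetical names introduced here.

```python
import asyncio


async def read_line_async(prompt, reader=input, timeout=None):
    """Await one line of input without blocking the event loop.

    The blocking `reader` (the built-in `input` by default) runs in a worker
    thread. Returns None if `timeout` seconds pass first.
    """
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, reader, prompt)
    try:
        return await asyncio.wait_for(future, timeout)
    except asyncio.TimeoutError:
        # Only the coroutine gives up here; the worker thread may still be
        # waiting on the reader until it returns.
        return None
```

While the coroutine awaits the worker thread, other tasks on the same loop keep running, so timers and time limits can still fire. Note that a timed-out `input` call cannot be interrupted from Python, so the thread lingers until the user presses Enter.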

niklr commented 3 years ago

Hi @mat7ias

Thank you for the feedback. Let me try to address the mentioned points:

  1. Indeed. Does this mean it is currently possible to use Golem's cloud computing power without paying for the job? This use case might not be representative, but the requestor was able to accomplish what he wanted. Can I resolve this by implementing the async approach you mentioned in 7?
  2. I tried to keep a reference to the index instance in a class variable (see https://github.com/niklr/golem-fulltext-search/commit/7598900b1fc8f05c3a32ba2aec440b65a60334b0). This works fine when tested with test.py, but once deployed as an image the variable is no longer initialized when search is called. Do you have something else in mind?
yapapi.rest.activity.CommandExecutionError: Command '{'run': {'entry_point': '/golem/run/ftse.py', 'args': ('--search', 'golem'), 'capture': {'stdout': {'stream': {}}, 'stderr': {'stream': {}}}}}' failed on provider; message: 'ExeScript command exited with code 1'; stderr: 'Traceback (most recent call last):
  File "/golem/run/ftse.py", line 165, in <module>
    search(args.search)
  File "/golem/run/ftse.py", line 137, in search
    result = ftse.search(term)
  File "/golem/run/ftse.py", line 102, in search
    with self.ix.searcher() as searcher:
AttributeError: 'NoneType' object has no attribute 'searcher'
  3. It is only needed to run test.py for testing purposes without building/deploying an image.
  4. That depends on how you want the README to be structured. I was mostly addressing developers (it also makes it easier for me to get up to speed quickly after a 16-day break on this project ;)
  5. I will remove the ENTRYPOINT from the Dockerfile.
  6. If you mean the following lines, they are probably from a sample implementation; I will remove them as well:
async def shutdown(self):
    # handler responsible for executing operations on shutdown
    yield self._ctx.commit()
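
One possible explanation for the AttributeError in point 2, assuming each `run` command launches /golem/run/ftse.py as a new Python process (as the entry_point and args in the error message suggest): a class variable set during one invocation does not exist in the next one. The sketch below reproduces that effect locally with a hypothetical stand-in script; it does not use yapapi.

```python
import subprocess
import sys

# A stand-in for a script like ftse.py (hypothetical): "--index" stores the
# index in a class variable, "--search" expects to find it there.
SCRIPT = """
import sys

class Engine:
    ix = None  # class variable, lives only inside this process

if sys.argv[1] == "--index":
    Engine.ix = {"golem": ["a.txt"]}
    print("indexed")
else:
    print("index is", Engine.ix)
"""


def run_once(flag):
    """Run the script in a brand-new Python process and return its output."""
    completed = subprocess.run(
        [sys.executable, "-c", SCRIPT, flag],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()
```

Calling `run_once("--index")` and then `run_once("--search")` shows the second process starting with `Engine.ix` still set to `None`, which matches the symptom in the traceback. Keeping the index in memory therefore needs a process that stays alive between searches.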

niklr commented 3 years ago

Hi @mat7ias

All mentioned points should now be covered except for the second where I need your input. Thanks a lot.

niklr commented 3 years ago

With the help of Nebula from Discord, I have integrated rpyc, which makes it possible to start a server in a separate thread. This multi-threaded approach keeps the index in memory, as requested.
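
The general idea can be sketched as follows. Since rpyc is a third-party package, this example swaps in the standard library's xmlrpc modules to show the same pattern: a long-lived server thread owns the index, and each search is a short client call. It is not the code from the dev_rpyc branch, and `INDEX`, `start_server`, and `query` are hypothetical names.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory index: word -> list of document names.
INDEX = {"golem": ["a.txt", "b.txt"], "search": ["b.txt"]}


def search(term):
    """Look the term up in the index that the server process keeps in memory."""
    return INDEX.get(term.lower(), [])


def start_server():
    """Start the lookup server on a free local port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(search)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def query(port, term):
    """Connect as a short-lived client and ask the running server."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.search(term)
```

After `server, thread = start_server()`, the port is available as `server.server_address[1]`, and `query(port, "golem")` can be called any number of times without rebuilding the index. `server.shutdown()` followed by `server.server_close()` stops the server.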

The implementation can be found in the dev_rpyc branch. Let me know if this should be merged into main.

cryptobench commented 3 years ago

@niklr Your work looks splendid. If you'd like you can submit your work over on Gitcoin and we will sort out the payment for you.

(Mattias is on vacation at the moment, so I'm covering for him.)

niklr commented 3 years ago

@cryptobench Awesome. I think I have already submitted on Gitcoin; see https://gitcoin.co/issue/golemfactory/yagna/1457/100026045

Let me know if something is missing.

cryptobench commented 3 years ago

Hi! Indeed you have - thanks a lot! Payout will happen as soon as possible.