MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0
3 stars 4 forks source link

Use channels for communication between parser and annotation functions #197

Closed Habush closed 4 years ago

Habush commented 4 years ago

In the current code, we use ListLinks to return a bunch of results from pattern matching functions that are passed into the parser and also are written to files. As mentioned in #164, this creates issues with the Pattern Miner and it is memory inefficient. Hence, we should improve the current primitive threading by using fibers and use its channel implementation as a conduit for passing results from the PM functions to both the parser and the file writer directly.

Habush commented 4 years ago

A simple demo code for the above

Habush commented 4 years ago

BTW @linas, one of the pitfalls mentioned in fibers is the use of guile mutexes. The author suggests replacing them with either atomic boxes or channels. We use mutexes to guard the hash-table used in make-afunc from concurrent access. However, I am quited confused on how atomic boxes might be used with make-afunc. Any ideas? :)

linas commented 4 years ago

Yes. Basically, everywhere you might have a set! in the code you have to replace it by atomic-box-compare-and-swap! In the case of make-afunc-cache, that code is here: github blah blah atomspace-git/opencog/scm/opencog/base/atom-cache.scm and it uses hashx-set! and somehow you'd have to wrap the hashx-set! with the atomic-box-compare-and-swap! in some way. This would take some experimenting to get it to work right.

I'm tempted to try to convince you to do this ... not sure how hard/easy this is ... at any rate, maybe start with a new open issue against the atomspace, that describes this.

Also -- the afunc-cache thing needs to be thred safe only if you are accessing it from multiple threads ... without par-map, this won't be happening any more, right? You're using the fibres to write to the files, only.

.. I'm not sure that you even need fibers, though. All that you really need is a callback-- whenever you get some new data, just call the callback, have it write to the file ... I guess that, sure, making the file writes async by using fibers might help a little bit ... but first, why not get the incremental write-to-file working, first?

Habush commented 4 years ago

You're using the fibres to write to the files, only... I'm not sure that you even need fibers, though. All that you really need is a callback-- whenever you get some new data, just call the callback, have it write to the file

Not only to write to files but also to send results to the atomese parser.

linas commented 4 years ago

Not only to write to files but also to send results to the atomese parser.

Sure, a callback to send results wherever. Again, you don't need fibers for this.

Somewhat unrelated, but .. why do you need an atomese parser? (or rather, what is an "atomese parser" and what does it do?)

linas commented 4 years ago

As mentioned elsewhere, https://github.com/opencog/atomspace/issues/2553 describes a way of getting rid of most mutexes in the atomspace.

Habush commented 4 years ago

Sure, a callback to send results wherever. Again, you don't need fibers for this.

I like the fibers approach because it offers performance as the writer and parser communicate over different threads (needs to benchmark to verify this though). But like you noted, the callback approach is more straight-forward.

Somewhat unrelated, but .. why do you need an atomese parser? (or rather, what is an "atomese parser" and what does it do?)

We use the atomese parser to output json for visualizing the results of the annotation. It basically converts atomese to json.

Habush commented 4 years ago

As mentioned elsewhere, opencog/atomspace#2553 describes a way of getting rid of most mutexes in the atomspace.

I would like to take a stab at this if you are going to oversee my work and give me feedback. But since this will be my first real contribution to the atomspace, it might take me more effort (but also I will learn more about the code).

linas commented 4 years ago

fibers

I think we are miscommunicating. It's impossible to use fibers without a callback. There has to be some way of calling the fiber code! There has to be a point in the existing code, where you will make changes to place a function to invoke the fiber code! So you're going to be installing those function calls, no matter what: its technically impossible to not have them! And once you have them .. well, anything is possible. So go for the simplest possible solution. In this case, the simplest possible solution is to just write to the file.

If the simplest possible solution is not good enough, try something fancier. But it is always an engineering mistake to do something complex, when something simple is enough.

rekado commented 4 years ago

I recommend the introduction on Fibers and the distinction from callback hell: https://github.com/wingo/fibers/wiki/Manual

It behaves as a simple blocking queue; mental overhead is less than callbacks and promises. It is a simple abstraction that's conceptually simpler than wrangling callbacks. It is not per se more complex.

My advice is generally to use the simplest abstraction that models the problem best, which may differ from what may appear to be the simplest solution.

linas commented 4 years ago

Still there is a misunderstanding. Placing an item onto a queue requires a function call. Why not just make a function call to just write the item out ? How can interposing a queue between the creator and the consumer ever possibly be simpler than simply doing the thing itself, directly? Of course, it can't. You are focing me to use very complicated words to say something extremely simple: just write to the file. Directly. Don't mess with extra complexity. In this particular case, I do not believe that the extra complexity provides any benefits whatsoever. It only makes the code harder to understand.

I mean -- in the end -- this is not my decision, this is your code, and you're on the hook to maintain it. I am advocating the proposition that code should always be as simple as possible to do what it needs to do. Extra complexity always results in performance degradation. Extra complexity always results in extra bugs. Extra complexity always makes the code hard to understand. Never have more complexity than absolutely needed.

Habush commented 4 years ago

Placing an item onto a queue requires a function call. Why not just make a function call to just write the item out ? How can interposing a queue between the creator and the consumer ever possibly be simpler than simply doing the thing itself, directly?

In this particular case, there are many producers (the annotation functions) and two consumers (the parser and the file writer), hence a single direct function call isn't enough. Two consumers should take the same input and do different things to that input (one parses it to JSON and the other just writes to file)

linas commented 4 years ago

OK ...

Habush commented 4 years ago

Closing this as the support for using channels was added in #208