MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0
3 stars 4 forks source link

Remove the build-aux directory #80

Closed linas closed 4 years ago

linas commented 4 years ago

Per issue #76 this removes the (un-needed) build-aux directory.

Doing so exposed a domino chain of side effects: the build-aux directory in git contained an obsolete version of the test framework, which was causing issue #70 .. further pursuing this exposed a rather strange test-unit design ... anyway, after the changes in this pull req, all unit tests now pass, for me.

There are still two issues: there is some kind of module loading craziness; there's some kind of file-search path problem that remains. There's also oceans of warnings during the compile step.

rekado commented 4 years ago

FWIW with the custom fork of Guile JSON make check passes for me even without this patch set (or the referenced ones for that matter).

linas commented 4 years ago

I do not have any custom forks of guile-json. I am using just the stock guile-json, off-the-shelf, with no modifications. I am sorry I brought that up, I was confused last night, I thought that make-paramater was a part of json, but I was mistaken.

Habush commented 4 years ago

The tests fail when I run make check from the project root directory. This is due to the change of the path string from tests/sample_dataset.scm to ../tests/sample_dataset.scm. Please change this back.

rekado commented 4 years ago

I do not have any custom forks of guile-json

We are miscommunicating. This project depends on a fork of Guile JSON (as per the readme). The upstream Guile JSON release will not work here, because parts of this project (AFAIU just the parser) depend on the additional procedures made public in the fork.

I would advise against removing the existing test infrastructure. It works just fine as it is.

linas commented 4 years ago

Um, again -- I am not using any forks of guile-json. I am using the straight-up, unmodified version of guile-joson from whatever git repo it's located at. I can double check that if it's really important.

I experienced half-a-dozen problems when I tried to build this project. Recounting from memory: first .. it didn't build (in my environment, which is ubuntu-18.04 with guile-2.9) -- the configure.ac and makefile.am were explicitly checking for guile-2.2 and then failing. There was a pull req or two to fix this. Next was a failure of mis-detecting guile-3.0 and then claiming that guile-2.9.7 was too old. That got fixed in another pull req. Then there was the problem of env.in sending everything off to lala land. We discussed this one to death; I have not yet re-tested to see if that has actually fixed anything or not. Next issue was that simply running configure causes checked-in files to change. The git repo should not contain files that configure alters. That just makes code-maintenance ugly and nasty. After a while, it became clear that none of build-aux ever belonged in git; it was just junk-code. So the goal here is to rationalize that structure.

Again, its becoming increasingly clear that this project has accumulated vast quantities of https://en.wikipedia.org/wiki/Technical_debt -- this is the third time I'm posting this link to this article. Everyone working on this project should be familiar with the concepts in that wikipedia article. This project needs to pay down the technical debt it has accumulated. This is one pull req that helps do that.

I can explain it more directly: it took me a WEEK of long days, so -- more than 40 hours, to build this project, and then get the unit tests to build, and then to pass. That's just crazy. It should have worked out of the box, It should have taken 30 minutes. The unit tests should have passed on the first try. Simply building a project should not require brain-surgery-rocket-science and strong-guile-foo-zen-master skills to get it installed.

linas commented 4 years ago

Yeah, there is crazy-making due to the env.in file (see pull req #81) (but @rekado talked me out of)

Something is deeply screwed up with the way paths are handled in the build process. It took me three days, I could not get to the bottom of it. I suggest basically removing ALL of the build infrastructure, and just going for a super-minimalistic design, just keep everything simple, with no fancy macros, no fancy tools, no fancy makefiles. Just do the absolute minimum needed to get things to work.

rekado commented 4 years ago

it took me a WEEK of long days, so -- more than 40 hours, to build this project, and then get the unit tests to build, and then to pass. That's just crazy. It should have worked out of the box, It should have taken 30 minutes.

Just as a second data point from an outsider: this has not been my experience at all. Once I had packaged all dependencies it was a matter of 20 minutes to realize that I'm using the wrong Guile JSON and to replace it with the expected version. The test infrastructure really isn't at fault here --- it's the same as in many other Guile projects. Ripping out what is working and arguing about it just wastes time :-/

(I'd also like to recommend fixing the most obvious problems before embarking on performance optimization quests: a) mutation everywhere, b) non-idiomatic use of Scheme, c) use of global state.)

linas commented 4 years ago

Sorry. I am losing my temper. I'll try explaining it again, but to be clear, I am annoyed by the argumentation and the denial that there's a problem.

Ripping out what is working

It's not working. I am trying to promote a design philosophy called "minimalism": if something is not needed, it should be removed. There are many reasons why minimalism works, as a philosophy: it makes things smaller, quicker, faster. It makes things easier to understand. It makes things easier to modify, to update, to enhance, to grow. An accretion of gunk is just extra weight to be carried, it takes more effort, requires more work. Getting rid of the unneeded makes everything easier.

My wife has a story about complaining to a building superintendant that a lightbulb in the basement that didn't work. He looked at it and said "sure it works, you just have to jiggle it like so. Just don't touch this part or you'll get electrocuted." Don't be that guy. The test infrastructure is just plain broken and doesn't work. Just because other projects are wired that way is not an excuse: if some other buildings have bad electrical wiring, that's just dodging the point. We need to be fix the bad electrical wiring here.

wrong Guile JSON

As far as I can tell, nothing actually depends on JSON. It's a dependency that is not invoked in any code path that I've seen. It's possible that it is required by some other project that depends on this one. But the code in this repo simply does not need it.

a) mutation everywhere, b) non-idiomatic use of Scheme, c) use of global state.)

I agree that this is annoying, and makes the code more complicated and harder to understand. Its not at the root cause of any of the crashes, hangs or performance issues. (well, indirectly it is - code that is hard to understand becomes hard to use correctly, which explains the crashes -- none of the API's are documented, and some were expecting strings .. or atoms .. and the wrong object was passed in. The hang/blowing out the guile heap/Increase max sections was due to excessive list copying (now fixed). OK, yes, the excessive copying was due to "non-idiomatic scheme".

The correct way to tackle performance issues is to add instrumentation, and to focus on the actual hot-spots, and see what can be done there. I've squeezed out a 10x speedup on the biogrid annotation, from 26 hours to 2.5 hours, and a 3x speedup on the protein-pathway annotation, from 17 hours to 5 hours. I think I can get more.

rekado commented 4 years ago

I won't argue with you, so you might as well tone it down a little. There is no need to be condescending and to assume that everyone else is an idiot who doesn't understand what minimalism or technical debt is.

As far as I can tell, nothing actually depends on JSON. It's a dependency that is not invoked in any code path that I've seen.

You probably haven't seen parser.scm. then The code as is depends on Guile JSON. I don't think it should (and I proposed a change), but it does. Maybe you need to look again.

linas commented 4 years ago

parser.scm

Hmm. OK, just looked. That seems to be some kind of unrelated, orthogonal code that does something completely different; it's not actually involved in any of the search. That code is probably best moved to it's own repo, so that this repo is provides the search-services, only. That way, we can focus on performance, without getting tangled up in unrelated concerns. Having a distinct repo for handling format conversion to/from json formats would limit the dependency on json to only those tools that actually use json.

rekado commented 4 years ago

It's used to build up a graph with the records that are defined in that fork of Guile JSON. These records are used elsewhere in this project again (at the very least graph? is used in the tests).

In my opinion none of that's actually needed when a) the fork of Guile JSON is dropped, b) the parser is replaced with read and recursive pattern matching on the s-expression.

I don't think it should be moved to its own repo but should be removed entirely.

linas commented 4 years ago

Well, I'd like to have this repo be devoted to data-mining only; the json stuff is not required for data-mining. Ergo, its not needed in this repo. The graph stuff is used in a javascript visualizer; let it live with the visualizer. Separation of concerns: one API to do one thing, and that one thing well, instead of a grab-bag of unrelated functions.

Habush commented 4 years ago

I agree that the parser code should be moved into a separate module. Maybe in its own repo or be included in the python module

@rekado can you show a sample example using read or point me to a link?

rekado commented 4 years ago

@Habush better: I can show you a pull request. I've got some patches ready based on the simplify-parser branch. I'll clean them up a little and push them soon.

Using read is trivial:

(with-input-from-string "(this is an expression)" read)

This returns a list containing the symbols this, is, an, and expression. Now that you've got a proper Scheme data structure you can go and match on it.

(define (do-things expr)
  (define (convert-thing thing)
    (match thing
      ('this 'that)
      ('is 'was)
      ('an 'not)
      ('expression 'parsing)))
  (map convert-thing expr))

(define expression
  (with-input-from-string "(this is an expression)" read))

(do-things expression)

This is better than operating on strings because you can work at the higher expression level and thus trivially use recursion to convert the original data structure into whatever you need.

In your parser you're throwing away most of the information and accumulate state in a couple of variables to return a graph data structure. You can do this all in one place by accumulating state as you encounter symbols in the read expression.

tanksha commented 4 years ago

@linas could you do the changes requested so that I can merge this pull request? Tests are failing due to PR #78 and you have said "After this, the vast ocean of build-test failures suddenly disappeared for me".

linas commented 4 years ago

Sure. Of course, you can do this too, by saying

git checkout -b stuff
git pull https://github.com/linas/annotation-scheme predaux
change stuff
git commit
git checkout master
git merge stuff
git push origin master

Once you merge and push, this pull req would automatically close, even. :-)

tanksha commented 4 years ago

@linas but tests are still failing, did you try make check on after the change?

linas commented 4 years ago

What's the failure message?

There is still something very wrong with the path handling in the build system; so, for example, if I followi the instructions on the README, and, starting in the source directory, do ./configure; make; make check then the tests pass. But if instead I do mkdir build; cd build; ../configure; make; make check then the tests fail. If I use primitive-load-path, then the tests fail. Somehow, the load-path is being misconfigured; there's something wrong in the configure setup for this project.

rekado commented 4 years ago

primitive-load-path respects %load-path, which is set by the env script, which should be generated by the bootstrap phase of the build system. Since this project erroneously contains generated files like env you need to remember to also bootstrap the build system when working with a git checkout:

autoreconf -vif
./configure
make check

I need to say it again: the env.in serves its job here. The README should reflect the necessary bootstrap step, and all generated files (env. Makefile, configure, etc) should be removed from the repository. They will be generated when building a distribution tarball anyway (i.e. using make dist or make distcheck).

linas commented 4 years ago

@rekado, That's nice, except that doesn't actually work. You should actually try it yourself.

tanksha commented 4 years ago

What's the failure message?

There is still something very wrong with the path handling in the build system; so, for example, if I followi the instructions on the README, and, starting in the source directory, do ./configure; make; make check then the tests pass. But if instead I do mkdir build; cd build; ../configure; make; make check then the tests fail. If I use primitive-load-path, then the tests fail. Somehow, the load-path is being misconfigured; there's something wrong in the configure setup for this project.

I didnt create a build directory, rather I did the following

autoreconf -vif
./configure GUILE=$(which guile)
make
make check

Gives me the following error https://gist.github.com/tanksha/a7ebf2085338188197e2c3134d39aa61

rekado commented 4 years ago

@linas please specify what "doesn't actually work" means in your case. I used the method I described all the time in this project. Until recently the tests even passed ;) (See issue #116)

linas commented 4 years ago

I have never seen the unit tests pass, without having to hack the test infrastructure. I get two failure modes: in one case, I get file-not-found errors. When I try to hack around those failures by manually setting GUILE_LOAD_PATH=. make check then the tests spinning some kind of infinite loop. Just now, the util-tests.scm had been spinning for 96 minutes at 100% cpu because I forgot to look at it while reading email. I don't know why it spins; I imagine some kind of infinite-regress on the guile search path, maybe.

I do not currently see the error that @tanksha is posting the gist for. I did see it in the past, while hacking around, trying to get things to work.

I have never seen parser-tests fail. (despite using the 'incorrect' json)

At this time, with this pull req applied to the code base, all unit tests pass. Without this pull req, I get

Backtrace:
In ice-9/boot-9.scm:
  1722:10  7 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
           6 (apply-smob/0 #<thunk 7f93ac1feac0>)
In ice-9/boot-9.scm:
    718:2  5 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
In ice-9/eval.scm:
    619:8  4 (_ #(#(#<directory (guile-user) 7f93abdbdf00>)))
In ice-9/boot-9.scm:
   2792:4  3 (save-module-excursion _)
  4336:12  2 (_)
In tests/util-tests.scm:
     13:0  1 (_)
In unknown file:
           0 (primitive-load "tests/sample_dataset.scm")

ERROR: In procedure primitive-load:
In procedure open-file: No such file or directory: "tests/sample_dataset.scm"
%%%% Starting test util  (Writing full log to "util.log")
FAIL tests/util-tests.scm (exit status: 1)
rekado commented 4 years ago

I have never seen parser-tests fail. (despite using the 'incorrect' json)

This indicates a serious error in your environment because Guile JSON does not provide any of the bindings that the parser code uses. I would be very wary of passing tests in that case. The parser tests, for example, use graph?, which does not exist in vanilla Guile JSON. You should be getting an unbound variable error when you're using the unmodified upstream Guile JSON.

rekado commented 4 years ago

I would like to recommend to not merge these changes to the test harness and build system, because your statement "I have never seen the unit tests pass, without having to hack the test infrastructure" indicates a different kind of error, as other people (myself included) have been able to run the test suite without any extensive hacking.

(I only had to remove one undefined macro from configure.ac before bootstrapping the build system.) I won't comment any further on this issue lest we repeat that cycle of insults and frustration; I have nothing more to add.

linas commented 4 years ago

Anyway, the point of this pull req was that the build-aux directory seems to contain files at are auto-generated by the build system, and after building, doing a git diff promptly reveals hundreds of lines of diffs, starting with the date-stamp -- in my case, install-sh is from 2017 not 2014, missing is newer also, and test-driver points to a non-existent location in my filesystem. Since these files are all copied into place, I see no reason why they should be checked into git. Removing them at least gives me a "clean" start, so that configure.ac can do what it needs to do, rather than getting tangled up with inconsistent file-paths.

Also, equally clearly, this is still not enough; there are still unresolved file-path issues. In particular, I find it infuriating that primitive-load works, but primitive-load-path fails. (and also the above-mentioned infinite-loop) There's still something fragile, somewhere, with the path handling. I don't know where, and I don't have the guile-fu to understand and fix it.

linas commented 4 years ago

Look, I'm just reporting what a "user off the street" is experiencing. I do a git pull, the system won't build. I had to hack around that, and then the unit tests wouldn't pass without hacking. This just gives bad impression right off the bat. When a new user just tries to download and install the system, these kinds of basic things should work.

You do not have to merge this pull req, but in that case, some other pull req is needed to fix things so that !!! the unit tests pass!!!

I cannot explain why the unit tests sometimes pass for you. I can explain that, after a LOT of hard work, I have been able to get them to pass, SOMETIMES. Again -- there is something broken about the build system being used here, but, despite investing a lot of time, I have been unable to isolate the root cause of the failures.

linas commented 4 years ago

The environment I'm using for these runs is a fresh, clean install of ubuntu 18.04, with nothing else installed except for guile built from git master, and the dependencies for this particular project.

linas commented 4 years ago

Perhaps the guile taken from git is causing the errors. I have taken to using guile built from scratch, from git master, because, historically, the default version of guile provided by Debian/Ubuntu has been buggy and unusable. This has been the recommended procedure for all users of opencog for maybe a decade now.

rekado commented 4 years ago

Perhaps the guile taken from git is causing the errors

This is not unlikely. Features and compatibility on Guile's master branch are fluctuating widely. You should stick to released versions. Guile 3 has been released yesterday (but I'd be surprised if annotation-scheme worked with Guile 3 out of the box); one of the 2.x stable releases should work though.

linas commented 4 years ago

Guile 3

I've been running guile-3 in production servers running the AtomSpace for about a year now, and have piled up maybe dozens of CPU-years of data-mining on it. Zero crashes, zero hangs. (Well, OK, there was one problem, but we worked around it. We have a server that can accept scheme code coming in on a socket, and evaluate it. So you could take a file, for example, sample_dataset.scm from the tests directory, and go cat sample_dataset.scm | nc localhost 17001 and it would get evaluated. The bug was that, if the file was more than 2-10 megabytes in size, then, after piping in hundreds of these files, over an hour or two, the guile-3 compiler would corrupt something and crash. The work-around was to not do it this way. The actual bug is (I think) some kind of race condition between copying the string, and having the compiler examine it, because short strings didn't exhibit this problem. In short, some half-dozen people, besides myself, have been using guile-3+atomspace for a year now, and its been rock-solid. As to annotation-scheme, I did all my runs on guile-3, and if you look at the final report #98, you will see some of those runs were 60 hours long. To summarize: guile -3 works great. It's the build infrastructure for this particular repo that doesn't work so well.

linas commented 4 years ago

Closing; this appears to be a source of confusion. The build-aux dir still needs to be removed, but this is best put off until the backlog of other, more important issues are resolved.

linas commented 4 years ago

There is an existing issue, #76 that is tracking this.