MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0
3 stars 4 forks source link

annotate-genes returning empty list of nodes and edges. #162

Closed Habush closed 4 years ago

Habush commented 4 years ago

After the latest improvements to the parser by @rekado, the annotate-genes function is returning empty list of nodes and edges. To reproduce this issue, load the sample_dataset.scm and run the annotation for IGF1 as follows:

(annotate-genes (list "IGF1") "test-res"  "[{\"function_name\": \"gene-pathway-annotation\", \"filters\": [{\"filter\": \"pathway\", \"value\": \"smpdb reactome\"},{\"filter\": \"include_prot\", \"value\": \"True\"}, {\"filter\": \"include_sm\", \"value\": \"False\"},{\"filter\": \"coding\", \"value\": \"True\"},{\"filter\": \"noncoding\", \"value\": \"True\"}, {\"filter\": \"biogrid\", \"value\": \"0\"}]}, {\"function_name\": \"gene-go-annotation\", \"filters\": [{\"filter\": \"namespace\", \"value\": \"biological_process cellular_component molecular_function\"}, {\"filter\": \"parents\", \"value\": \"0\"}, {\"filter\": \"protein\", \"value\": \"True\"}]}, {\"function_name\": \"include-rna\", \"filters\": [{\"filter\": \"coding\", \"value\": \"True\"},{\"filter\": \"noncoding\", \"value\": \"True\"},{\"filter\": \"protein\", \"value\": \"1\"}]},{\"function_name\": \"biogrid-interaction-annotation\", \"filters\": [{\"filter\": \"interaction\", \"value\": \"Proteins\"}]}]")

The above returns the following json:

"{\"nodes\":{},\"edges\":{}}"
rekado commented 4 years ago

This has been merged quite some time ago. Could you please add this case to the tests?

rekado commented 4 years ago

FWIW, I see 4 test failures right now, all of them in test/main-tests.scm where hash tables are unexpectedly returned.

There we zero test failures when my code was merged.

rekado commented 4 years ago

I cannot reproduce this. There's an earlier error:

scheme@(guile-user)> ,use(opencog)
scheme@(guile-user)> ,use (opencog bioscience)
scheme@(guile-user)> (primitive-load "tests/sample_dataset.scm")
$1 = (EvaluationLink
   (PredicateNode "has_pubmedID")
   (ListLink
      (EvaluationLink
         (PredicateNode "interacts_with")
         (ListLink
            (GeneNode "ALDOA")
            (GeneNode "ACTN4")
         )
      )
      (ListLink
         (ConceptNode "https://www.ncbi.nlm.nih.gov/pubmed/?term=26344197")
      )
   )
)

scheme@(guile-user)> ,use (annotation main)
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /gnu/store/v59226wpnd7q22i1x8491b0hhj0zny58-profile/share/guile/site/2.2/opencog/exec.scm
;;; compiled /home/rekado/.cache/guile/ccache/2.2-LE-8-3.A/gnu/store/69ah320q8sbivnzxnf8sim9wv1g10v4k-atomspace-5.0.3-1.86c848d/share/guile/site/2.2/opencog/exec.scm.go
scheme@(guile-user)> (annotate-genes (list "IGF1") "test-res"  "[{\"function_name\": \"gene-pathway-annotation\", \"filters\": [{\"filter\": \"pathway\", \"value\": \"smpdb reactome\"},{\"filter\": \"include_prot\", \"value\": \"True\"}, {\"filter\": \"include_sm\", \"value\": \"False\"},{\"filter\": \"coding\", \"value\": \"True\"},{\"filter\": \"noncoding\", \"value\": \"True\"}, {\"filter\": \"biogrid\", \"value\": \"0\"}]}, {\"function_name\": \"gene-go-annotation\", \"filters\": [{\"filter\": \"namespace\", \"value\": \"biological_process cellular_component molecular_function\"}, {\"filter\": \"parents\", \"value\": \"0\"}, {\"filter\": \"protein\", \"value\": \"True\"}]}, {\"function_name\": \"include-rna\", \"filters\": [{\"filter\": \"coding\", \"value\": \"True\"},{\"filter\": \"noncoding\", \"value\": \"True\"},{\"filter\": \"protein\", \"value\": \"1\"}]},{\"function_name\": \"biogrid-interaction-annotation\", \"filters\": [{\"filter\": \"interaction\", \"value\": \"Proteins\"}]}]")
ERROR: In procedure scm-error:
vector-map: expected vector, got (#<hash-table 7ff9da35da60 2/31> #<hash-table 7ff9da3651a0 2/31> #<hash-table 7ff9da36ca00 2/31> #<hash-table 7ff9da374360 2/31>)

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
scheme@(guile-user) [1]> ,q
scheme@(guile-user)> 

I don't know if I did this right. Would be better to have this as an executable test.

Habush commented 4 years ago

@rekado Are you using the upstream guile-json library?

rekado commented 4 years ago

yes, of course. I'm doing this in a Guix environment.

Habush commented 4 years ago

It is the only reason that I can think of. B/c the earlier version of guile-json uses hash-table and the latest one uses vector.

rekado commented 4 years ago

ah, you switched from version 1.x to the latest version. The guix.scm file still refers to version 1.

rekado commented 4 years ago

Same for the Dockerfile. It still checks out version 1.2 of Guile JSON.

rekado commented 4 years ago

See https://github.com/MOZI-AI/annotation-scheme/pull/163

rekado commented 4 years ago

Still can't reproduce this:

$ [env] ./env guile
./env: line 2: cd: /opt: No such file or directory
./env: line 3: cd: /opt: No such file or directory
GNU Guile 2.2.6
Copyright (C) 1995-2019 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> ,use (annotation main)
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /gnu/store/imax7z47mbbf6vsfcbwmq5qrw5x7y4s3-profile/share/guile/site/2.2/opencog/exec.scm
;;; compiled /home/rekado/.cache/guile/ccache/2.2-LE-8-3.A/gnu/store/j32mq6w5q99mlmpldx4jh9l92hbkjwdb-atomspace-5.0.3-1.86c848d/share/guile/site/2.2/opencog/exec.scm.go
;;; note: source file /home/rekado/dev/annotation-scheme/annotation/gene-go.scm
;;;       newer than compiled /home/rekado/dev/annotation-scheme/annotation/gene-go.go
;;; compiling /home/rekado/dev/annotation-scheme/annotation/gene-go.scm
;;; note: source file /home/rekado/dev/annotation-scheme/annotation/parser.scm
;;;       newer than compiled /home/rekado/dev/annotation-scheme/annotation/parser.go
;;; compiling /home/rekado/dev/annotation-scheme/annotation/parser.scm
;;; compiled /home/rekado/.cache/guile/ccache/2.2-LE-8-3.A/home/rekado/dev/annotation-scheme/annotation/parser.scm.go
;;; compiled /home/rekado/.cache/guile/ccache/2.2-LE-8-3.A/home/rekado/dev/annotation-scheme/annotation/gene-go.scm.go
scheme@(guile-user)> ,use (opencog) (opencog bioscience)
scheme@(guile-user)> (primitive-load "tests/sample_dataset.scm")
$1 = (EvaluationLink
   (PredicateNode "has_pubmedID")
   (ListLink
      (EvaluationLink
         (PredicateNode "interacts_with")
         (ListLink
            (GeneNode "ALDOA")
            (GeneNode "ACTN4")
         )
      )
      (ListLink
         (ConceptNode "https://www.ncbi.nlm.nih.gov/pubmed/?term=26344197")
      )
   )
)

scheme@(guile-user)> (annotate-genes (list "IGF1") "test-res"  "[{\"function_name\": \"gene-pathway-annotation\", \"filters\": [{\"filter\": \"pathway\", \"value\": \"smpdb reactome\"},{\"filter\": \"include_prot\", \"value\": \"True\"}, {\"filter\": \"include_sm\", \"value\": \"False\"},{\"filter\": \"coding\", \"value\": \"True\"},{\"filter\": \"noncoding\", \"value\": \"True\"}, {\"filter\": \"biogrid\", \"value\": \"0\"}]}, {\"function_name\": \"gene-go-annotation\", \"filters\": [{\"filter\": \"namespace\", \"value\": \"biological_process cellular_component molecular_function\"}, {\"filter\": \"parents\", \"value\": \"0\"}, {\"filter\": \"protein\", \"value\": \"True\"}]}, {\"function_name\": \"include-rna\", \"filters\": [{\"filter\": \"coding\", \"value\": \"True\"},{\"filter\": \"noncoding\", \"value\": \"True\"},{\"filter\": \"protein\", \"value\": \"1\"}]},{\"function_name\": \"biogrid-interaction-annotation\", \"filters\": [{\"filter\": \"interaction\", \"value\": \"Proteins\"}]}]")
annotation/main.scm:95:22: In procedure variable-ref: Not a variable: #f

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
scheme@(guile-user) [1]> ,q
scheme@(guile-user)> 

Please add executable tests for this. We won't get far like this.

Habush commented 4 years ago

@rekado please import this modules first,

(use-modules 
    (opencog) (opencog exec)
    (annotation) (annotation main) 
    (annotation gene-go) (annotation biogrid) (annotation gene-pathway)
    (annotation functions) (annotation util) (annotation rna)
    (annotation parser)
)

I want fix the module-loading issue but currently I don't how to search for a symbol in a specific module so that's why I am resorting to current-module

Habush commented 4 years ago

Btw when running annotate-genes it prints the atomese result into stdout before returning the JSON. And I can't seem to find where it is coming from

rekado commented 4 years ago

I can prepare a patch for looking up variables only in the specified module. If you want to take a stab at it yourself, please take a look at 6.20.8 Module System Reflection in the manual. resolve-interface is a good starting point.

Habush commented 4 years ago

okay, thanks for the direction. I will work on PR and will ask you to review it.

rekado commented 4 years ago

I still can't reproduce this, but I do get a different kind of error. When I run this, the unknown case of the expr->graph procedure is triggered --- and then it spits out the unmatched expression.

The unmatched expression is a plain list of ListLink elements. The parser does not support that. It expects the data structure to start with one of the Atomspace types such as Predicate, Member, Evaluation, List, ListLink, etc.

So this would be okay:

'(List (ListLink ...) (ListLink ...))

But we get this:

'((ListLink ...) (ListLink ...) ...)

Why do we get a list of atomspace expressions and not just one atomspace expression?

rekado commented 4 years ago

Btw when running annotate-genes it prints the atomese result into stdout before returning the JSON. And I can't seem to find where it is coming from

This is the pk call in the unknown case, which has a comment above it that says: "This shouldn't happen".

It spits out the unmatched expression as a courtesy to the debugging person :) (I'll make it pretty print this in a PR.)

Habush commented 4 years ago

The reason it returns '((ListLink ...) (ListLink ...) ...) is because the output comes from running map on the request.

So the parser should handle this case.

Habush commented 4 years ago

I tested it with the following input and it still fails:

(ListLink
   (ConceptNode "main")
   (ListLink
      (EvaluationLink
         (PredicateNode "has_name")
         (ListLink
            (GeneNode "IGF1")
            (ConceptNode "Insulin-like growth factor I")
         )
      )
      (ListLink
         (EvaluationLink
            (PredicateNode "has_location")
            (ListLink
               (GeneNode "IGF1")
               (ConceptNode "exocytic vesicle")
            )
         )
         (EvaluationLink (stv 1 1)
            (PredicateNode "has_location")
            (ListLink
               (GeneNode "IGF1")
               (ConceptNode "platelet alpha granule lumen")
            )
         )
      )
   )
)
rekado commented 4 years ago

How can I reproduce this? To what procedure is this the input?

rekado commented 4 years ago

The reason it returns '((ListLink ...) (ListLink ...) ...) is because the output comes from running map on the request.

So the parser should handle this case.

Then you should also map the parser. Is it known when the result is a single atomspace expression and when it is a list of atomspace expressions?

Habush commented 4 years ago

How can I reproduce this? To what procedure is this the input?

scheme@(guile-user)> (define input (ListLink
   (ConceptNode "main")
   (ListLink
      (EvaluationLink
         (PredicateNode "has_name")
         (ListLink
            (GeneNode "IGF1")
            (ConceptNode "Insulin-like growth factor I")
         )
      )
      (ListLink
         (EvaluationLink
            (PredicateNode "has_location")
            (ListLink
               (GeneNode "IGF1")
               (ConceptNode "exocytic vesicle")
            )
         )
         (EvaluationLink (stv 1 1)
            (PredicateNode "has_location")
            (ListLink
               (GeneNode "IGF1")
               (ConceptNode "platelet alpha granule lumen")
            )
         )
      )
 )
scheme@(guile-user)> (atomese-parser input)
rekado commented 4 years ago

atomese-parser currently takes a string as its input, because it hasn't been changed to read directly from a port yet.

So it needs to be (atomese-parser (format #f "~a" input)).

Habush commented 4 years ago

Oops, I should have mentioned that I have removed the string conversion and now the atomese-parser accepts s-expr. This change is local and I haven't pushed it yet.

Habush commented 4 years ago

I think the pattern matching doesn't ignore whitespaces and newlines.

rekado commented 4 years ago

Can you show your diff?

Habush commented 4 years ago

Please check #171