MOZI-AI / knowledge-import

Import scripts for the Bio-Atomspace

GO namespace is missing for some of the imported GO terms #30

Open leungmanhin opened 4 years ago

leungmanhin commented 4 years ago

Some GO terms are not in any of the three namespaces (Molecular Function, Cellular Component, Biological Process).

To print out the list of GOs with their namespace missing in Guile:

(use-modules (opencog) (opencog exec) (opencog bioscience))

;; Change the paths below as needed.
(primitive-load "GO_2020-04-01.scm")
(primitive-load "GO_annotation_gene-level_2020-04-01.scm")
(primitive-load "Go-Plus-GO_2020-05-04.scm")

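;; Print every GO term that has no GO_namespace evaluation.
(for-each
  (lambda (go)
    ;; The Get query returns a SetLink of matching namespaces;
    ;; an empty outgoing set means no namespace was found.
    (when (null? (cog-outgoing-set
            (cog-execute!
              (Get
                (TypedVariable (Variable "$x") (Type "ConceptNode"))
                (Evaluation (Predicate "GO_namespace") (List go (Variable "$x")))))))
      (format #t "~a\n" (cog-name go))))
  ;; Consider only ConceptNodes whose name starts with "GO:".
  (filter
    (lambda (c)
      (string-prefix? "GO:" (cog-name c)))
    (cog-get-atoms 'ConceptNode)))

leungmanhin commented 4 years ago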

I see that there are 44519 GOs in GO_2020-04-01.scm, and they all come with namespaces.

After loading GO_annotation_gene-level_2020-04-01.scm, 10 extra GOs are added:

GO:0005072
GO:0042623
GO:0030702
GO:0030617
GO:0030616
GO:0006343
GO:0070869
GO:0030618
GO:0004584
GO:0098740

After loading Go-Plus-GO_2020-05-04.scm, 30 more are added:

GO:0140453
GO:0062235
GO:0140458
GO:0140451
GO:0140449
GO:0062239
GO:0062237
GO:0106258
GO:0062240
GO:0140457
GO:0062236
GO:0140454
GO:0062241
GO:0140447
GO:0062242
GO:0140450
GO:0140455
GO:0062238
GO:0106257
GO:0062243
GO:0140460
GO:0062244
GO:0140459
GO:0062246
GO:0140456
GO:0106259
GO:0140448
GO:0062245
GO:0062247
GO:0140446

It is these extra ones, which come from files other than GO_<date>.scm, that have no namespace associated.
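
To attribute each extra GO term to the file that introduced it, one option is to snapshot the ConceptNode names before and after each load and diff the two lists. The following is only a minimal sketch in plain Guile: `new-go-ids` is a hypothetical helper (not part of this repo or of OpenCog), and the names in the demo call are made-up placeholders rather than real GO identifiers.

```scheme
;; Hypothetical helper: return the names in AFTER that start with "GO:"
;; and do not occur in BEFORE. A hash table keeps the lookup cheap even
;; with tens of thousands of names.
(define (new-go-ids before after)
  (let ((seen (make-hash-table)))
    (for-each (lambda (name) (hash-set! seen name #t)) before)
    (filter (lambda (name)
              (and (string-prefix? "GO:" name)
                   (not (hash-ref seen name))))
            after)))

;; Demo with placeholder names (not real data):
(define demo-result
  (new-go-ids '("GO:A" "gene-x")
              '("GO:A" "GO:B" "gene-x" "gene-y")))
;; demo-result is ("GO:B")

;; Assumed usage in an OpenCog Guile session (not run here):
;; (define (concept-names) (map cog-name (cog-get-atoms 'ConceptNode)))
;; (primitive-load "GO_2020-04-01.scm")
;; (define before (concept-names))
;; (primitive-load "GO_annotation_gene-level_2020-04-01.scm")
;; (for-each (lambda (id) (format #t "~a\n" id))
;;           (new-go-ids before (concept-names)))
```

Repeating the snapshot before loading each further file should give one list of newly introduced GO terms per file, which can then be checked for a missing namespace.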