MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0

General optimizations... #85

Closed: linas closed this issue 4 years ago

linas commented 4 years ago

So, in rna.scm I see this: (list (cog-outgoing-set ... This is pointless, because the outgoing set is already a list. Wrapping it in a list again does nothing at all except waste CPU time.

That block of code would also run more efficiently if the if-tests were outside of the map instead of inside of it. That way, the if-test would be performed only once, instead of hundreds (or millions) of times.
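A minimal sketch of both points in plain Guile, not the actual rna.scm code. The names `fake-outgoing-set`, `label-inside` and `label-outside` are hypothetical stand-ins invented for illustration; the first only mimics the fact that cog-outgoing-set returns a list.

```scheme
;; Hypothetical stand-in for cog-outgoing-set: it already returns a list.
(define (fake-outgoing-set) '(a b c))

;; Wrapping the result in `list` only adds one more cell around it;
;; the car of the wrapper is the very same list.
(define wrapped (list (fake-outgoing-set)))

;; if-test inside the map: `tagged?` is re-checked for every element.
(define (label-inside items tagged?)
  (map (lambda (x) (if tagged? (cons 'tag x) (cons 'plain x)))
       items))

;; if-test hoisted out of the map: checked once, then each branch
;; maps a fixed lambda over the items.
(define (label-outside items tagged?)
  (if tagged?
      (map (lambda (x) (cons 'tag x)) items)
      (map (lambda (x) (cons 'plain x)) items)))
```

Both labelling functions return the same result for the same inputs; the hoisted version simply evaluates the condition once per call rather than once per element.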

Comment no longer applies; the code is no longer structured like this.

linas commented 4 years ago

The node-info function returns a list that is always of length one. Then append is used to concatenate this list onto another list. If, instead, it just returned a single atom, then cons could be used, for a minor performance boost. Patched in #137.
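The difference can be sketched in plain Guile as follows. This is not the real node-info; `node-info-as-list` and `node-info-as-item` are hypothetical stand-ins, and the sketch assumes the one-element list is placed at the front of the other list (cons only prepends).

```scheme
;; Hypothetical stand-in: returns a list that always has length one.
(define (node-info-as-list name)
  (list (string-append "info:" name)))

;; Hypothetical stand-in: returns the single item directly.
(define (node-info-as-item name)
  (string-append "info:" name))

(define rest-of-results '("x" "y"))

;; append has to copy its first argument, even when it holds one element.
(define via-append (append (node-info-as-list "A") rest-of-results))

;; cons allocates a single pair and reuses rest-of-results as the tail.
(define via-cons (cons (node-info-as-item "A") rest-of-results))
```

Both definitions produce an equal list; the cons version skips building and then copying the intermediate one-element list.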

linas commented 4 years ago

The add-loc function contains this:

          (AndLink
            (MemberLink
              child
              parent)

The MemberLink is a constant, and does not alter search results. It can be removed. It turns out the correct fix was a ContextLink(!!), so this remark is no longer relevant.

linas commented 4 years ago

The file UniProt2Reactome_PE_Pathway.txt.scm contains structures like this:

(AndLink
  (MemberLink
    (MoleculeNode "Uniprot:A0A075B6P5")
    (ConceptNode "R-HSA-166663"))
  (EvaluationLink
    (PredicateNode "has_location")
    (ListLink
      (MoleculeNode "Uniprot:A0A075B6P5")
      (ConceptNode "extracellular region"))))

I believe the AndLink serves no purpose whatsoever, and can be safely removed. This will make loading slightly faster. Comment no longer applies; the AndLink has been replaced by a ContextLink.

linas commented 4 years ago

File loading performance could be increased by writing, at the start of the file:

(define loc (PredicateNode "has_location"))

and then using that: (Evaluation loc (List ...))

Doing the same for each gene, pathway and protein would also speed things up a little bit:

(define muniA0A075B6P5 (Molecule "Uniprot:A0A075B6P5"))
(define rhsa166663 (Concept "R-HSA-166663"))

By using such defines, you save the tiny amount of time otherwise spent dropping into the C++ code and the atomspace to perform the lookup each time the atom is named. Also, the files get slightly smaller.
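Put together, an entry in the data file might look roughly like the sketch below. It reuses the atoms from the excerpt quoted earlier and drops the AndLink wrapper for brevity; it is an illustration of the idea, not the layout that was eventually adopted. The `(opencog bioscience)` module is an assumption about where MoleculeNode is defined in a given install.

```scheme
(use-modules (opencog))
;; Assumption: MoleculeNode is provided by the bioscience types module.
(use-modules (opencog bioscience))

;; Defined once near the top of the file.
(define loc (PredicateNode "has_location"))
(define muniA0A075B6P5 (MoleculeNode "Uniprot:A0A075B6P5"))
(define rhsa166663 (ConceptNode "R-HSA-166663"))

;; Later entries refer to the Scheme variables instead of
;; re-creating (and re-looking-up) the same nodes by name.
(MemberLink muniA0A075B6P5 rhsa166663)

(EvaluationLink loc
  (ListLink
    muniA0A075B6P5
    (ConceptNode "extracellular region")))
```

The saving per atom is small, so the benefit would mainly show up for nodes such as the has_location predicate that recur throughout a large file.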

linas commented 4 years ago

Closing since #137 was merged, and #150 tracks the file-enhancement idea.