linas closed this issue 4 years ago
The `node-info` function returns a list that is always of length one. Then `append` is used to concatenate this list onto another list. If, instead, it just returned a single atom, then `cons` could be used, for a minor performance boost. Patched in #137.
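
As a minimal sketch of the difference, here is the same idea in plain Guile, without the AtomSpace. The two helper functions are hypothetical stand-ins for `node-info`, not the actual code, and symbols play the role of atoms.

```scheme
;; Hypothetical stand-ins for node-info; symbols play the role of atoms.
(define (node-info-as-list x) (list x))   ; returns a one-element list
(define (node-info-as-atom x) x)          ; returns the bare item

(define rest-of-results '(b c))

;; append has to walk and copy its first argument before linking it on.
(define via-append (append (node-info-as-list 'a) rest-of-results))

;; cons allocates a single pair and is done.
(define via-cons (cons (node-info-as-atom 'a) rest-of-results))

;; Both results are the list (a b c); only the amount of work differs.
(display (equal? via-append via-cons))
(newline)
```

The saving per call is one list allocation and one traversal, which only matters because the call sits on a hot path.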
The `add-loc` function contains this:

    (AndLink
      (MemberLink
        child
        parent)

The MemberLink is a constant, and does not alter search results. It can be removed. Update: it turns out the correct fix was a ContextLink(!!), so this remark is no longer relevant.
The file `UniProt2Reactome_PE_Pathway.txt.scm` contains structures like this:

    (AndLink
      (MemberLink
        (MoleculeNode "Uniprot:A0A075B6P5")
        (ConceptNode "R-HSA-166663"))
      (EvaluationLink
        (PredicateNode "has_location")
        (ListLink
          (MoleculeNode "Uniprot:A0A075B6P5")
          (ConceptNode "extracellular region"))))

I believe the AndLink serves no purpose whatsoever, and can be safely removed. This will make loading slightly faster. Update: this comment no longer applies; the AndLink has been replaced by a ContextLink.
File loading performance could be increased by writing, at the start of the file,

    (define loc (PredicateNode "has_location"))

and then using that: `(Evaluation loc (List ...))`.

Doing the same for each gene, pathway, and protein would also speed things up a little bit:

    (define muniA0A075B6P5 (Molecule "Uniprot:A0A075B6P5"))
    (define rhsa166663 (Concept "R-HSA-166663"))

With such defines, each later reference saves the tiny amount of time otherwise spent dropping into the C++ code and the atomspace to perform the lookup. The files also get slightly smaller.
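
Put together, the start of a data file might look like the sketch below. It assumes the `(opencog)` and `(opencog bioscience)` Guile modules are installed, the latter to provide the Molecule type. Those module names are an assumption, not something stated in this issue. The atom names and the define names are the ones quoted above.

```scheme
;; Assumed module names; (opencog bioscience) is expected to supply Molecule.
(use-modules (opencog) (opencog bioscience))

;; Look each atom up once, at the top of the file ...
(define loc (PredicateNode "has_location"))
(define muniA0A075B6P5 (Molecule "Uniprot:A0A075B6P5"))
(define rhsa166663 (Concept "R-HSA-166663"))

;; ... then refer to the Scheme variables everywhere below.
(Member muniA0A075B6P5 rhsa166663)
(Evaluation loc
  (List muniA0A075B6P5 (Concept "extracellular region")))
```

Each variable reference is a plain Scheme lookup, so the string-to-atom resolution happens once per atom rather than once per occurrence.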
Closing since #137 was merged, and #150 tracks the file-enhancement idea.
So, in `rna.scm` I see this: `(list (cog-outgoing-set ...`

This is pointless, because the outgoing set is already a list. Wrapping it in a list again does nothing at all except waste CPU time. That block of code would also run more efficiently if the if-tests were outside of the `map` instead of inside of it. That way, the if-test would be performed only once, instead of hundreds (or millions) of times. Update: this comment no longer applies; the code is no longer structured like this.
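
To illustrate the point about hoisting the if-test, here is a small sketch in plain Guile. The function names and the doubling operation are hypothetical, chosen only for illustration; they are not taken from the file discussed above.

```scheme
;; Hypothetical example: double every item only when flag? is true.

;; Test inside the map: flag? is re-checked for every element.
(define (scale-inside items flag?)
  (map (lambda (x) (if flag? (* 2 x) x)) items))

;; Test hoisted out: flag? is checked once, then map applies the chosen function.
(define (scale-hoisted items flag?)
  (let ((f (if flag?
               (lambda (x) (* 2 x))
               (lambda (x) x))))
    (map f items)))

;; Both versions return the same list; here that is (2 4 6).
(display (scale-hoisted '(1 2 3) #t))
(newline)
```

The hoisted form only works when the condition does not depend on the element being mapped over, which is the situation described above.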