Open issue, opened by linas 4 years ago.
I like the idea of data-source-specific nodes: the node names could then be the exact reference IDs, which could be pasted into the URL for that data source to programmatically access the associated info.
I propose starting with ChEBINode and UniProtNode, which inherit from MoleculeNode, and ReactomeNode and SMPNode, which inherit from a new PathwayNode.
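A minimal Python sketch of the proposed hierarchy, assuming each node type carries a URL template for its data source so that the node name alone completes the link. `BioNode`, `URL_TEMPLATE` and `url()` are hypothetical names (this is not the AtomSpace type system). The URL templates and example IDs are illustrative and should be checked against each database's current link format.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class BioNode:
    """Hypothetical base: a node whose name is a database reference ID."""

    name: str
    # URL template with an {id} placeholder; each concrete subclass sets it.
    URL_TEMPLATE: ClassVar[str] = ""

    def url(self) -> str:
        # The node name is pasted verbatim into the data source's URL.
        return self.URL_TEMPLATE.format(id=self.name)


class MoleculeNode(BioNode):
    """Parent type for molecule IDs (ChEBI, UniProt)."""


class PathwayNode(BioNode):
    """New parent type for pathway IDs (Reactome, SMPDB)."""


class ChEBINode(MoleculeNode):
    URL_TEMPLATE = "https://www.ebi.ac.uk/chebi/searchId.do?chebiId={id}"  # assumed


class UniProtNode(MoleculeNode):
    URL_TEMPLATE = "https://www.uniprot.org/uniprotkb/{id}"  # assumed


class ReactomeNode(PathwayNode):
    URL_TEMPLATE = "https://reactome.org/content/detail/{id}"  # assumed


class SMPNode(PathwayNode):
    URL_TEMPLATE = "https://smpdb.ca/view/{id}"  # assumed


if __name__ == "__main__":
    for node in (ChEBINode("CHEBI:15377"), UniProtNode("P04637"),
                 ReactomeNode("R-HSA-109581"), SMPNode("SMP0000055")):
        # The parent type still lets a query ask for "all molecules" at once.
        kind = "molecule" if isinstance(node, MoleculeNode) else "pathway"
        print(type(node).__name__, kind, node.url())
```

The design choice is that the data source is encoded in the type, not in the name. A search can therefore ask for all `MoleculeNode`s or only the `ChEBINode`s without inspecting any strings.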
@mjsduncan The same will be applied to the others: GoNode for GO terms, RNANode for RNA transcripts, CLNode for cell types?
@tanksha Why not separate node types for each cell type instead of a single CLNode type? Won't cells be matched by their type in the search functions?
@Habush CLNode is different: it's for cell ontologies, which are not in the bioAtomspace yet. It works like the GO ontology, whose terms start with GO:XXXXX; the cell ontology terms, i.e. cell types, start with CL:XXXXX.
@tanksha For the other types I suggest:
for RNAs there would be a refseqNode and an ensemblNode that inherit from moleculeNode;
for ontology concepts there would be an ontologyNode with names "GO:xxxx" and "CL:yyyy".
This way the node names can complete the reference URL for their respective databases.
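For the single ontologyNode variant, here is a minimal sketch of how a name like "GO:xxxx" could complete a reference URL. It assumes the OBO PURL convention, where the resolver base http://purl.obolibrary.org/obo/ is followed by the ID with ':' replaced by '_'. That convention covers both GO and CL terms. `ontology_url` is a hypothetical helper.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"  # assumed resolver


def ontology_url(term_id: str) -> str:
    """Build an OBO PURL from a CURIE-style name such as 'GO:0008150'."""
    prefix, sep, local_id = term_id.partition(":")
    if not (sep and prefix and local_id):
        raise ValueError(f"expected PREFIX:ID, got {term_id!r}")
    # OBO PURLs use an underscore where the CURIE has a colon.
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


for name in ("GO:0008150", "CL:0000540"):
    print(name, ontology_url(name))
```

With one shared node type, the ontology is recovered by parsing the name's prefix. That is cheap when building a single URL, but it is the kind of per-string work that gets expensive if a pattern search has to repeat it for every node.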
@mjsduncan I'm suggesting distinct names, e.g. GoOntologyNode and ClOntologyNode, so that a pointless string search can be avoided during pattern search. The current search code checks whether the first three bytes of the string are GO:, which is massively inefficient during searches. There is a proposal to implement a RegexNode in the atomspace that would mostly fix this inefficiency; see opencog/atomspace#2474.
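A toy model of the cost difference, assuming an atom store that keeps a per-type index of node names. `ToyAtomSpace` and its methods are hypothetical and are not the AtomSpace API. The sketch only shows the shape of the two lookups: a prefix test over every node of a shared type versus selecting a distinct type directly.

```python
from collections import defaultdict


class ToyAtomSpace:
    """Hypothetical stand-in for an atom store that indexes nodes by type."""

    def __init__(self):
        self.by_type = defaultdict(set)  # type name -> set of node names

    def add_node(self, node_type, name):
        self.by_type[node_type].add(name)

    def scan_by_prefix(self, node_type, prefix):
        """Current style: visit every node of a shared type and test the
        start of each name string. Returns (matches, names checked)."""
        hits, checked = set(), 0
        for name in self.by_type[node_type]:
            checked += 1
            if name.startswith(prefix):
                hits.add(name)
        return hits, checked

    def get_by_type(self, node_type):
        """Proposed style: the node type alone selects the matches."""
        return set(self.by_type[node_type])


generic = ToyAtomSpace()  # one shared OntologyNode type
typed = ToyAtomSpace()    # distinct GoOntologyNode / ClOntologyNode types
for i in range(1, 4):
    generic.add_node("OntologyNode", f"GO:{i:07d}")
    typed.add_node("GoOntologyNode", f"GO:{i:07d}")
for i in range(1, 6):
    generic.add_node("OntologyNode", f"CL:{i:07d}")
    typed.add_node("ClOntologyNode", f"CL:{i:07d}")

go_hits, checked = generic.scan_by_prefix("OntologyNode", "GO:")
print(len(go_hits), checked)                            # 3 8
print(typed.get_by_type("GoOntologyNode") == go_hits)   # True
```

The prefix scan performs one string test for every node of the shared type (8 tests to find 3 GO terms here), and that count grows with the whole ontology. The typed lookup only touches the matches.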
Search performance could be improved by creating and using more bio-specific link and node types. For example:
This would allow quicker discovery of all pathways.
The above is instead of
Maybe even
for specific named spatial locations
Even simple nodes for different tags would help:
The above would remove the need for the regex searches that are currently used to find these things. Alternatively, one could also have a RegexNode as described in opencog/atomspace#2474.

As a general rule, any time one has a frequently-used EvaluationLink of the form
it would probably be an overall win to define a custom
(FooLink ... stuff...)
instead. Mostly this makes the atomspace a little smaller (fewer atoms) and pattern searches a little faster (less to explore). Whether or not this is worth it, I can't say. Maybe it would add extra complexity to other processing stages...
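A back-of-the-envelope sketch of the "fewer atoms" claim. It assumes the usual Atomese shape `(EvaluationLink (PredicateNode "foo") (ListLink A B))` and models the atomspace as a set that stores each distinct atom once. The `gene`/`pathway` ConceptNode names and `count_atoms` are hypothetical; `FooLink` is the placeholder name from the comment above.

```python
def count_atoms(facts, use_custom_link):
    """Count distinct atoms needed to store (a, b) facts in either form."""
    atoms = set()  # the atomspace stores each distinct atom only once
    for a, b in facts:
        na, nb = ("ConceptNode", a), ("ConceptNode", b)
        atoms.update([na, nb])
        if use_custom_link:
            # (FooLink A B): one new link per fact.
            atoms.add(("FooLink", na, nb))
        else:
            # (EvaluationLink (PredicateNode "foo") (ListLink A B)):
            # a shared predicate plus two new links per fact.
            pred = ("PredicateNode", "foo")
            lst = ("ListLink", na, nb)
            atoms.update([pred, lst, ("EvaluationLink", pred, lst)])
    return len(atoms)


facts = [(f"gene{i}", f"pathway{i % 3}") for i in range(10)]
print(count_atoms(facts, use_custom_link=False))  # 34
print(count_atoms(facts, use_custom_link=True))   # 23
```

Beyond the argument nodes, n distinct facts cost 2n + 1 atoms in the EvaluationLink form and n atoms with a custom link. Here that is 21 versus 10 on top of 13 ConceptNodes. This says nothing about the extra complexity a custom type might add to other processing stages.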