MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0

Dataset issues - self-interaction #118

Closed by linas 4 years ago

linas commented 4 years ago

The current dataset contains 2945 self-interacting genes, i.e. genes that appear in the form:

(Evaluation
    (Predicate "interacts_with")
    (List (Gene "FOO") (Gene "FOO")))
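
A count like this can be reproduced without modifying the AtomSpace. The following is a minimal sketch, assuming the `(opencog)` and `(opencog exec)` Guile modules are installed. The two toy atoms use `Concept` as a stand-in for `Gene` so that no extra type module is needed; with the real dataset loaded, those two lines would be dropped. The name `self-interaction-q` is made up for illustration.

```scheme
(use-modules (opencog) (opencog exec))

; Toy data (assumption): ConceptNodes stand in for the dataset's Gene atoms.
(Evaluation (Predicate "interacts_with")
    (List (Concept "FOO") (Concept "FOO")))
(Evaluation (Predicate "interacts_with")
    (List (Concept "FOO") (Concept "BAR")))

; Match only interacts_with relations whose two members are the same atom.
(define self-interaction-q
    (Get
        (Evaluation (Predicate "interacts_with")
            (List (Variable "$x") (Variable "$x")))))

; Running the query returns a SetLink of the atoms bound to $x.
(define self-set (cog-execute! self-interaction-q))
(define self-genes (cog-outgoing-set self-set))

; Report the count; nothing is deleted.
(format #t "self-interacting: ~a\n" (length self-genes))
```

Unlike a bare `(List (Variable "$x") (Variable "$x"))` pattern, this one is restricted to the `interacts_with` predicate, so unrelated two-element lists are not counted.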

This is a dataset issue, not a codebase issue; I'm reporting it here because I don't know where else to report it.

I am using the code below to manually clean this up.

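(define (delete-self-interaction)
"
  Many genes are marked as interacting with themselves.
  Delete these, they screw up the topology of the searches.
"
   ; Match every ListLink whose two members are the same atom.
   (define selfie-q
      (Get (List (Variable "$x") (Variable "$x"))))
   ; Running the query returns a SetLink holding the matched atoms.
   (define selfie-set (cog-execute! selfie-q))
   (define selfies (cog-outgoing-set selfie-set))
   ; The SetLink is only a container for the results; remove it first.
   (cog-delete selfie-set)
   ; Remove each (List gene gene) together with every link that contains it.
   (for-each
      (lambda (gene) (cog-delete-recursive (List gene gene)))
      selfies))

tanksha commented 4 years ago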

@mjsduncan can explain, but self-interacting genes are not a bug. They even have a PubMed ID.

linas commented 4 years ago

Ah, OK. I was looking at it as a problem in topology, and thought these were .. weird.

mjsduncan commented 4 years ago

thanks for checking, linas! 2 or more copies of a protein can bind together to make a functional unit.

linas commented 4 years ago

@mjsduncan you are confusing me .. this is about gene interaction, not about proteins...

It's not that I'm checking .. I'm just fooling around with the data, and this one just pops out and hits one over the head. It alters the results significantly; many things come out quite differently because of this. It has a rather large effect on the data patterns.

mjsduncan commented 4 years ago

except genes can only really interact via their expressed proteins or transcribed RNAs. but a gene that interacts with itself to self-regulate expression and an "interaction" that is protein n-mer formation are two very different things. @hedra, can you post the pubmed id for an example self-interacting gene?

mjsduncan commented 4 years ago

@linas fyi this is the source of the interaction relations: https://thebiogrid.org/

tanksha commented 4 years ago

Gene MAP2K4 interacts_with MAP2K4 https://www.ncbi.nlm.nih.gov/pubmed/?term=9162092

linas commented 4 years ago

Hmm. OK. Well, if that's the definition of interacts_with, then maybe there should also be a regulates? I ask because in the current datasets, about 3/5 of the gene pairs have both A interacts_with B and B interacts_with A, and about 2/5 have only one of these. So maybe it should have been regulates, which is inherently non-symmetric? (This was the original point of #119)
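
The split between reciprocated and one-way relations can be measured along the following lines. This is a sketch in plain Guile rather than Atomese, so it runs without an AtomSpace. The toy edge list and the names `reciprocated?` and `reciprocity-counts` are hypothetical, and each pair `(a . b)` stands for "a interacts_with b".

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical toy edge list, not taken from the real dataset.
(define edges
  '(("A" . "B") ("B" . "A")    ; reciprocated in both directions
    ("A" . "C")                ; one direction only
    ("D" . "D")))              ; self-interaction, excluded below

;; True if the reverse of EDGE is also present in ALL.
(define (reciprocated? edge all)
  (member (cons (cdr edge) (car edge)) all))

;; Return (both . one-way): counts of directed, non-self edges
;; with and without a matching reverse edge.
(define (reciprocity-counts all)
  (let* ((proper (remove (lambda (e) (equal? (car e) (cdr e))) all))
         (both (count (lambda (e) (reciprocated? e all)) proper)))
    (cons both (- (length proper) both))))

(define counts (reciprocity-counts edges))
(format #t "reciprocated: ~a, one-way: ~a\n" (car counts) (cdr counts))
```

On the toy data this prints `reciprocated: 2, one-way: 1`. The lookup is quadratic in the number of edges, which is fine for a sketch; a hash table would be the obvious replacement on a dataset of realistic size.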

mjsduncan commented 4 years ago

so by @tanksha's reference MAP2K4 can phosphorylate copies of itself, it's not regulating its own expression. there is a regulates among GO biological processes, but this is between sets of genes, not genes. i believe hedra is replacing the ListLinks with SetLinks to fix the missing reciprocal relationships you found. more complex relationships will be added soon...

linas commented 4 years ago

OK. Thanks. Just (double) checking. I'm trying to do statistics on the whole network, and making it fully symmetric dramatically changes the connectivity of the network.
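
To see why symmetrizing matters for network statistics, the toy sketch below compares per-node degree before and after adding every missing reverse edge. It is plain Guile with made-up data. The names `symmetrize` and `out-degree` are hypothetical, and treating each edge as an unordered pair is only assumed here to approximate the effect of switching from ListLinks to SetLinks.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical directed edges: (a . b) stands for "a interacts_with b".
(define edges '(("A" . "B") ("B" . "A") ("A" . "C") ("D" . "C")))

;; Add the reverse of every edge, then drop the duplicates this creates.
(define (symmetrize all)
  (delete-duplicates
    (append all (map (lambda (e) (cons (cdr e) (car e))) all))))

;; Number of edges that start at NODE.
(define (out-degree node all)
  (count (lambda (e) (equal? (car e) node)) all))

(define sym (symmetrize edges))

(for-each
  (lambda (node)
    (format #t "~a: directed ~a, symmetric ~a\n"
            node (out-degree node edges) (out-degree node sym)))
  '("A" "B" "C" "D"))
```

In this example node C has no outgoing edges in the directed data but two once the network is symmetrized, while A, B and D are unchanged. Degree-based statistics computed on the two versions can therefore differ substantially.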