Closed linas closed 4 years ago
@mjsduncan can explain, but self-interacting genes are not a bug. They even have a PubmedID
Ah, OK. I was looking at it as a problem in topology, and thought these were .. weird.
thanks for checking, linas! 2 or more copies of a protein can bind together to make a functional unit.
@mjsduncan you are confusing me .. this is about gene interaction, not about proteins...
It's not that I'm checking .. I'm just fooling around with the data, and this one just pops out and hits one over the head. It alters the results significantly, many things come out quite different because of this. It's got a rather large effect on the data patterns.
except genes can only really interact via their expressed proteins or transcribed RNAs. but a gene that interacts with itself to self regulate expression and an "interaction" that is protein n-mer formation are two very different things. @hedra, can you post the pubmed id for an example self interacting gene?
@linas fyi this is the source of the interaction relations: https://thebiogrid.org/
Gene MAP2K4 interacts_with MAP2K4 https://www.ncbi.nlm.nih.gov/pubmed/?term=9162092
Hmm. OK. Well, if that's the definition of interacts_with, then maybe there should also be a regulates? I ask, because in the current datasets, about 3/5 genes have both A interacts_with B and B interacts_with A, and about 2/5 have only one of these. So maybe it should have been regulates, which is inherently non-symmetric? (This was the original point of #119)
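For anyone wanting to reproduce the reciprocal vs. one-directional split, here is a minimal Python sketch. It assumes the interactions have already been exported as (gene, gene) tuples; the function name `reciprocity_counts` and the toy pairs are hypothetical, and only the MAP2K4 self-pair comes from this thread.

```python
def reciprocity_counts(edges):
    """Count reciprocal pairs, one-way pairs, and self-interactions."""
    edge_set = set(edges)
    # A self-interaction is a pair whose two members are the same gene.
    self_loops = {(a, b) for a, b in edge_set if a == b}
    # Reciprocal: both A->B and B->A are present; stored as unordered pairs
    # so each reciprocal relationship is counted once.
    reciprocal = {
        frozenset((a, b))
        for a, b in edge_set
        if a != b and (b, a) in edge_set
    }
    # One-way: A->B is present but B->A is not.
    one_way = {
        (a, b) for a, b in edge_set if a != b and (b, a) not in edge_set
    }
    return len(reciprocal), len(one_way), len(self_loops)


# Toy data: "A", "B", "C" are placeholder gene names, not real records.
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("MAP2K4", "MAP2K4")]
print(reciprocity_counts(edges))  # (1, 1, 1)
```

Counting reciprocal relationships as unordered pairs is a design choice; counting genes rather than pairs, as in the 3/5 vs. 2/5 figures, would need a slightly different tally.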
so by @tanksha's reference MAP2K4 can phosphorylate copies of itself, it's not regulating its own expression. there is a regulates among GO biological processes, but this is between sets of genes, not genes. i believe hedra is replacing the ListLinks with SetLinks to fix the missing reciprocal relationships you found. more complex relationships will be added soon...
OK. Thanks. Just (double) checking. I'm trying to do statistics on the whole network, and making it fully symmetric dramatically changes the connectivity of the network.
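To illustrate why symmetrizing matters for the statistics, here is a rough Python sketch that compares per-gene neighbor counts with and without treating each pair as unordered. The `degrees` helper and the toy edges are assumptions for illustration only, not part of the dataset tooling.

```python
from collections import defaultdict


def degrees(edges, symmetric=False):
    """Return the number of distinct neighbors per gene.

    With symmetric=False only the first member of each pair gains a
    neighbor; with symmetric=True both members do.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        if symmetric:
            neighbors[b].add(a)
    return {gene: len(found) for gene, found in neighbors.items()}


# Placeholder gene names, not real records.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C")]
print(degrees(toy_edges))                  # {'A': 2, 'B': 1}
print(degrees(toy_edges, symmetric=True))  # {'A': 2, 'B': 2, 'C': 2}
```

Even in this tiny example, gene C goes from having no recorded neighbors to having two once the pairs are read as unordered.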
The current dataset contains 2945 self-interacting genes. i.e. genes in the form of
This is a dataset issue, not a codebase issue; I'm reporting it here because I don't know where else to report it.
I am using the code below to manually clean this up.
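The snippet from the original post is not reproduced here. As a stand-in, and explicitly not the poster's code, here is a hypothetical Python sketch of the same kind of cleanup, assuming the interactions are available as (gene, gene) tuples:

```python
def drop_self_interactions(edges):
    """Remove pairs whose two members are the same gene.

    Returns the filtered list and the number of pairs removed.
    """
    kept = [(a, b) for a, b in edges if a != b]
    removed = len(edges) - len(kept)
    return kept, removed


# Toy data: only the MAP2K4 self-pair is taken from this thread.
sample = [("MAP2K4", "MAP2K4"), ("A", "B"), ("B", "B")]
kept, removed = drop_self_interactions(sample)
print(kept)     # [('A', 'B')]
print(removed)  # 2
```

Whether the self-interacting pairs should be dropped at all depends on the analysis, since they are documented interactions rather than errors.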