RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Reactome Conversion Script Fails #297

Closed by ecwood 1 year ago

ecwood commented 1 year ago

While testing for #296, I found that reactome_mysql_to_kg_json.py fails:

/home/ubuntu/kg2-venv/lib/python3.7/site-packages/rdflib_jsonld/__init__.py:12: DeprecationWarning: The rdflib-jsonld package has been integrated into rdflib as of rdflib==6.0.0.  Please remove rdflib-jsonld from your project's dependencies.
  DeprecationWarning,
Traceback (most recent call last):
  File "reactome_mysql_to_kg_json.py", line 954, in <module>
    nodes = get_nodes(connection, args.test)
  File "reactome_mysql_to_kg_json.py", line 318, in get_nodes
    for result in run_sql(nodes_sql, connection):
  File "reactome_mysql_to_kg_json.py", line 62, in run_sql
    cursor.execute(sql)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/connections.py", line 732, in _read_query_result
    result.read()
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.ProgrammingError: (1146, "Table 'reactome.stableidentifier' doesn't exist")

It's likely that the schema for Reactome has changed.
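A pre-flight check could surface this kind of schema drift before the full conversion runs. The sketch below is illustrative only: `find_missing_tables` is a hypothetical helper, not part of reactome_mysql_to_kg_json.py, and it assumes the list of existing table names has already been fetched (for example, from the result of a `SHOW TABLES` query). The comparison is exact and case-sensitive, which is an assumption about how the server treats table names.

```python
def find_missing_tables(required, existing):
    """Return the names in `required` that do not appear in `existing`.

    Both arguments are iterables of table-name strings. The comparison is
    exact (case-sensitive), and the order of `required` is preserved.
    """
    existing_set = set(existing)
    return [name for name in required if name not in existing_set]


# Hypothetical usage: the table names are taken from the errors and queries
# in this issue; the "existing" list stands in for a real SHOW TABLES result.
required_tables = ["stableidentifier", "LiteratureReference", "Person"]
existing_tables = ["LiteratureReference", "Person", "Publication_2_author"]
missing = find_missing_tables(required_tables, existing_tables)
```

If `missing` is non-empty, the script could stop early with a message naming the absent tables rather than failing partway through with error 1146.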

ecwood commented 1 year ago

I reverted to 39cf53b, because the commits made after it appear to track table-name changes that have since been undone (the tables are back to their original names). However, the script still failed:

Traceback (most recent call last):
  File "reactome_mysql_to_kg_json.py", line 955, in <module>
    edges = get_edges(connection, args.test)
  File "reactome_mysql_to_kg_json.py", line 935, in get_edges
    for edge in get_event_characteristics(connection, test):
  File "reactome_mysql_to_kg_json.py", line 625, in get_event_characteristics
    citation_list = get_author_of_PMID(publication, connection)
  File "reactome_mysql_to_kg_json.py", line 479, in get_author_of_PMID
    results = run_sql(sql, connection)
  File "reactome_mysql_to_kg_json.py", line 62, in run_sql
    cursor.execute(sql)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/connections.py", line 732, in _read_query_result
    result.read()
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1")

This fails because there is no check that the PMID is non-empty. This was the offending query:

SELECT per.surname, lr.year
FROM LiteratureReference lr
INNER JOIN Publication_2_author pub_auth
ON pub_auth.DB_ID=lr.DB_ID
INNER JOIN Person per
ON per.DB_ID=pub_auth.author
WHERE lr.pubMedIdentifier=

The PMID is empty because the regulation query does not return any PubMed IDs.

That, in turn, happens because Regulation_2_summation is now an empty table. I still need to determine whether its content has been moved elsewhere.
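One way to guard against the empty-PMID case is to return early and to bind the PMID as a query parameter instead of concatenating it into the SQL string. This is a minimal sketch, not the fix that was committed: `get_authors_for_pmid` is a hypothetical stand-in for `get_author_of_PMID`, and it uses the standard-library `sqlite3` module so that it runs without a MySQL server. `sqlite3` uses `?` placeholders; with pymysql the placeholder would be `%s`.

```python
import sqlite3

# Same joins as the failing query above, with the PMID bound as a parameter.
# NOTE: "?" is the sqlite3 placeholder style; pymysql expects "%s" instead.
AUTHOR_SQL = """
SELECT per.surname, lr.year
FROM LiteratureReference lr
INNER JOIN Publication_2_author pub_auth ON pub_auth.DB_ID = lr.DB_ID
INNER JOIN Person per ON per.DB_ID = pub_auth.author
WHERE lr.pubMedIdentifier = ?
"""


def get_authors_for_pmid(pmid, connection):
    """Return (surname, year) rows for a PMID, or [] if the PMID is empty.

    Hypothetical helper for illustration; `connection` is any DB-API
    connection whose driver accepts the placeholder style in AUTHOR_SQL.
    """
    # Skip the query entirely when there is no PMID to look up, so that
    # an incomplete "WHERE ... =" clause is never sent to the server.
    if pmid is None or str(pmid).strip() == "":
        return []
    cursor = connection.cursor()
    cursor.execute(AUTHOR_SQL, (pmid,))
    return cursor.fetchall()
```

Returning an empty list keeps the caller's loop over citations working unchanged when a regulation has no associated publications.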

ecwood commented 1 year ago

I contacted Reactome about this difference. This was their response, from Guanming Wu:

We are planning to retire the summation attribute in the Regulation class. Summation objects annotated in Regulations have been migrated to ReactionlikeEvents those Regulations regulate to consolidate annotations around ReactionlikeEvent objects. The actual table used is Event_2_summation since Event is a super class to ReactionlikeEvent.

ecwood commented 1 year ago

With 778ac1e and 1503bd6, the script seems to be working again. Here is an example edge in which the publications information is populated correctly:

        {
            "id": "REACT:R-ALL-112275---REACT:positively_regulates---None---None---None---REACT:R-HSA-622390---identifiers_org_registry:reactome",
            "negated": false,
            "object": "REACT:R-HSA-622390",
            "predicate": null,
            "primary_knowledge_source": "identifiers_org_registry:reactome",
            "publications": [
                "PMID:12857742",
                "PMID:18471901"
            ],
            "publications_info": {
                "PMID:12857742": {
                    "sentence": "Netrin-1, through its activation of DCC, triggers TRPC channel mediating the Ca+2 influx that is required for the growth cone turning. The effect of netrin-1 on TRP currents in the neurons is studied in Xenopus. In cultured Xenopus spinal neurons, Netrin-1 evoked Ca+2 influx and a depolarizing, TRPC-like current in both soma and growth cones. Inhibition of the Xenopus homologue of mammalian TRPC1 (XTRPC1) prevented Ca+2 influx, TRPC-like current activation and the chemotropic turning of the growth cone in response to a gradient of Netrin-1.<br>Netrin-1 receptor signalling to TRPC channels is mediated via hydrolysis of PIP2 by PLCgamma which then activates TRPC channel activity through IP3 and DAG."
                },
                "PMID:18471901": {
                    "sentence": "Netrin-1, through its activation of DCC, triggers TRPC channel mediating the Ca+2 influx that is required for the growth cone turning. The effect of netrin-1 on TRP currents in the neurons is studied in Xenopus. In cultured Xenopus spinal neurons, Netrin-1 evoked Ca+2 influx and a depolarizing, TRPC-like current in both soma and growth cones. Inhibition of the Xenopus homologue of mammalian TRPC1 (XTRPC1) prevented Ca+2 influx, TRPC-like current activation and the chemotropic turning of the growth cone in response to a gradient of Netrin-1.<br>Netrin-1 receptor signalling to TRPC channels is mediated via hydrolysis of PIP2 by PLCgamma which then activates TRPC channel activity through IP3 and DAG."
                }
            },
            "qualified_object_aspect": null,
            "qualified_object_direction": null,
            "qualified_predicate": null,
            "relation_label": "positively_regulates",
            "source_predicate": "REACT:positively_regulates",
            "subject": "REACT:R-ALL-112275",
            "update_date": null
        },

ecwood commented 1 year ago

I am closing this issue because the code worked in KG2.8.4pre's build.