RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Error in DGIdb Conversion #370

Open acevedol opened 3 months ago

acevedol commented 3 months ago
Traceback (most recent call last):
  File "/home/ubuntu/kg2-code/dgidb_tsv_to_kg_jsonl.py", line 167, in <module>
    make_kg2_graph(input_file_name, nodes_output, edges_output, test_mode)
  File "/home/ubuntu/kg2-code/dgidb_tsv_to_kg_jsonl.py", line 83, in make_kg2_graph
    PMIDs] = fields
ValueError: too many values to unpack (expected 11)

The fields that our conversion script, dgidb_tsv_to_kg_jsonl.py, currently unpacks are

[gene_name,
 gene_claim_name,
 entrez_id,
 interaction_claim_source,
 interaction_types,
 drug_claim_name,
 drug_claim_primary_name,
 drug_name,
 drug_concept_id,
 _,  # 12.5.2020 new field in tsv: interaction group score
 PMIDs] = fields

but it looks like DGIdb changed their structure recently and now the available fields in interactions.tsv are:

gene_claim_name
gene_concept_id
gene_name
interaction_source_db_name
interaction_source_db_version
interaction_type
interaction_score
drug_claim_name
drug_concept_id
drug_name
approved
immunotherapy
anti_neoplastic
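One way to make the conversion less sensitive to column reordering is to read rows by header name instead of unpacking them by position. The following is a minimal sketch, not the actual fix for dgidb_tsv_to_kg_jsonl.py. It assumes that interactions.tsv starts with a tab-separated header row that uses exactly the field names listed above; the helper name `read_interactions` and the sample row are hypothetical and only there for illustration.

```python
import csv
import io

# Column names copied from the new interactions.tsv field list in this issue.
NEW_COLUMNS = [
    "gene_claim_name", "gene_concept_id", "gene_name",
    "interaction_source_db_name", "interaction_source_db_version",
    "interaction_type", "interaction_score",
    "drug_claim_name", "drug_concept_id", "drug_name",
    "approved", "immunotherapy", "anti_neoplastic",
]


def read_interactions(handle, required=NEW_COLUMNS):
    """Yield one dict per data row, keyed by header name (hypothetical helper).

    Assumes the first line of the file is a header row. Raises ValueError
    with the list of missing columns instead of failing on tuple unpacking.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"interactions.tsv is missing columns: {missing}")
    for row in reader:
        yield row


# Made-up sample data, for illustration only (not real DGIdb content).
sample = "\t".join(NEW_COLUMNS) + "\n" + "\t".join([
    "GENE_CLAIM", "hgnc:0000", "GENE1", "SourceDB", "v1", "inhibitor", "0.5",
    "DRUG_CLAIM", "chembl:0000", "DRUG1", "TRUE", "FALSE", "FALSE",
]) + "\n"

rows = list(read_interactions(io.StringIO(sample)))
print(rows[0]["gene_name"], rows[0]["drug_name"])
```

With name-based access, a column added by a future DGIdb release would be ignored rather than raising "too many values to unpack", and a renamed or removed column would be reported by name.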

saramsey commented 3 months ago

Per the Slack DM thread, the current plan is to use the May 2021 release of DGIdb for the KG2.9.0pre build.

Then we'll update the DGIdb ETL script for RTX-KG2 so that subsequent builds can use the latest DGIdb release.