geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Fix redundant filenaming in YeastPathways models #188

Closed dustine32 closed 2 years ago

dustine32 commented 2 years ago

As a short-term fix for the YeastPathways GO-CAM models, we should resolve the redundant file naming (e.g. TRESYN-PWY-TRESYN-PWY.ttl), since it occasionally causes conflicts with minerva's export function. We can discuss a more robust "conflict-avoidance" strategy for model IDs and filenames in another ticket. For now, I've recorded the currently encoded logic below:

Current model ID logic

To get the model ID for a pathway, the code fetches the xref of its Pathway entity whose DB is Reactome or YeastCyc and returns that xref's ID portion (e.g. R-HSA-350562, TRESYN-PWY).
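A minimal sketch of the lookup described above, in Python for illustration only (pathways2GO itself is not written this way). The `Xref` type, the `get_model_id` name, the case-sensitive DB match, and the "first match wins" behavior are all assumptions, not the project's actual code:

```python
from typing import Iterable, NamedTuple, Optional


class Xref(NamedTuple):
    """Hypothetical stand-in for a BioPAX xref (a DB name plus an ID)."""
    db: str
    id: str


# Databases whose xref supplies the model ID, per the description above.
MODEL_ID_DBS = {"Reactome", "YeastCyc"}


def get_model_id(pathway_xrefs: Iterable[Xref]) -> Optional[str]:
    """Return the ID of the first xref whose DB is Reactome or YeastCyc."""
    for xref in pathway_xrefs:
        if xref.db in MODEL_ID_DBS:
            return xref.id
    # Assumption: no matching xref yields None; the real behavior is not stated.
    return None
```

For example, a pathway carrying the xrefs `Xref("PubMed", "123")` and `Xref("YeastCyc", "TRESYN-PWY")` would get the model ID `TRESYN-PWY` under these assumptions.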

Current filenaming logic

The pathways2GO conversion code uses two different naming conventions depending on whether the input BioPAX option (-b) is a file or a directory. Reactome input is a single file, Homo_sapiens.owl. YeastCyc input is a directory containing one file per pathway.

Input is single file (Reactome):

{Value of -o} + {model ID} + ".ttl"

Input file: -b biopax/Homo_sapiens.owl
Output folder: -o reacto-out/
Given that an example model ID in the input file Homo_sapiens.owl is: R-HSA-350562
The final filename will be: reacto-out/R-HSA-350562.ttl

Input is directory (YeastCyc):

{Value of -o} + {input base filename, extension stripped} + "-" + {model ID} + ".ttl"

Input folder: -b pathways_biopax_files/
Output folder: -o yeastcyc-out/
Given that an example file in the input folder is: pathways_biopax_files/TRESYN-PWY.owl
and that the model ID in this file is: TRESYN-PWY
The final filename will be: yeastcyc-out/TRESYN-PWY-TRESYN-PWY.ttl
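The two conventions above can be reproduced with a short Python sketch. The function name `current_output_path` and its parameters are hypothetical. It mirrors only the string concatenation described here, and it assumes the value of -o ends with a path separator, as in the examples:

```python
from pathlib import PurePosixPath
from typing import Optional


def current_output_path(output_value: str, model_id: str,
                        source_file: Optional[str] = None) -> str:
    """Build the output filename following the current (pre-fix) rules.

    source_file is None when -b points at a single file (Reactome route);
    otherwise it is the per-pathway file found in the -b directory
    (YeastCyc route).
    """
    if source_file is None:
        # Single-file route: {Value of -o} + {model ID} + ".ttl"
        return f"{output_value}{model_id}.ttl"
    # Directory route: the input base filename (extension stripped) is prepended.
    base = PurePosixPath(source_file).stem
    return f"{output_value}{base}-{model_id}.ttl"


print(current_output_path("reacto-out/", "R-HSA-350562"))
print(current_output_path("yeastcyc-out/", "TRESYN-PWY",
                          "pathways_biopax_files/TRESYN-PWY.owl"))
```

Because each YeastCyc input file is named after the pathway it contains, the base filename and the model ID coincide, which is where the doubled TRESYN-PWY-TRESYN-PWY.ttl comes from.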

Short-term fix

Since there is a minerva export operation that generates files based only on their model IDs, I think we should aim to maintain a filename = {model_id} + ".ttl" rule for the import files. I'll fix the "input is directory" route by removing the {input base filename, extension stripped} + "-" segment. This should result in the correct filename yeastcyc-out/TRESYN-PWY.ttl.
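As a rough illustration of the proposed rule (again in Python, with a hypothetical function name, and not the actual patch), both input routes would reduce to the same expression once the base-filename segment is dropped:

```python
def fixed_output_path(output_value: str, model_id: str) -> str:
    """Proposed rule: filename = {model_id} + ".ttl", placed under the -o value.

    Assumes output_value ends with a path separator, as in the examples above.
    """
    return f"{output_value}{model_id}.ttl"


# The same function now serves both the single-file and the directory route.
print(fixed_output_path("reacto-out/", "R-HSA-350562"))
print(fixed_output_path("yeastcyc-out/", "TRESYN-PWY"))
```

With this rule the import filenames depend only on the model ID, matching how the minerva export names its files.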