As a short-term fix for the YeastPathways GO-CAM models, we should resolve the redundant file naming issue (e.g. TRESYN-PWY-TRESYN-PWY.ttl), which occasionally causes conflicts with minerva's export function. We can discuss a more robust "conflict-avoidance" strategy for model IDs and filenames in a separate ticket. For now, I have recorded the currently encoded logic below:
Current model ID logic
To get the model ID for a pathway, the code fetches the xref of its Pathway entity whose DB is Reactome or YeastCyc and returns that xref's ID portion (e.g. R-HSA-350562, TRESYN-PWY).
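A minimal Python sketch of the model ID lookup described above. The actual pathways2GO code is not reproduced here: `Xref`, `get_model_id`, and the in-memory list of xrefs are hypothetical stand-ins for however the code reads xrefs off the BioPAX Pathway entity. The sketch also assumes the first matching xref wins when more than one qualifies, which the description above does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical stand-in for an xref attached to a BioPAX Pathway entity.
@dataclass
class Xref:
    db: str
    id: str

# DB names whose xref ID is used as the model ID.
MODEL_ID_DBS = {"Reactome", "YeastCyc"}

def get_model_id(pathway_xrefs: Iterable[Xref]) -> Optional[str]:
    """Return the ID portion of the first xref whose DB is Reactome or YeastCyc."""
    for xref in pathway_xrefs:
        if xref.db in MODEL_ID_DBS:
            return xref.id
    # No qualifying xref: this sketch returns None (real behavior not documented here).
    return None

# Usage with hypothetical xrefs: the non-matching DB is skipped.
print(get_model_id([Xref("OtherDB", "X1"), Xref("YeastCyc", "TRESYN-PWY")]))  # TRESYN-PWY
```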
Current filenaming logic
The pathways2GO conversion code uses two different naming conventions depending on whether the value of the input BioPAX option (-b) is a file or a directory. Reactome input is a single file, Homo_sapiens.owl; YeastCyc input is a directory containing one file per pathway.
Input is single file (Reactome):
{Value of -o} + {model ID} + ".ttl"
Input file: -b biopax/Homo_sapiens.owl
Output folder: -o reacto-out/
Given that an example model ID in the input file Homo_sapiens.owl is: R-HSA-350562
The final filename will be: reacto-out/R-HSA-350562.ttl
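The single-file rule can be sketched as plain string concatenation, mirroring the template above. The function name is hypothetical, and because the template simply concatenates, the sketch assumes the -o value already ends with a path separator, as reacto-out/ does in the example.

```python
def single_file_output_name(output_dir_arg: str, model_id: str) -> str:
    # {Value of -o} + {model ID} + ".ttl"
    # Plain concatenation: assumes output_dir_arg ends with "/" (e.g. "reacto-out/").
    return output_dir_arg + model_id + ".ttl"

print(single_file_output_name("reacto-out/", "R-HSA-350562"))  # reacto-out/R-HSA-350562.ttl
```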
Input is directory (YeastCyc):
{Value of -o} + {input base filename, extension stripped} + "-" + {model ID} + ".ttl"
Input folder: -b pathways_biopax_files/
Output folder: -o yeastcyc-out/
Given that an example file in the input folder is: pathways_biopax_files/TRESYN-PWY.owl and that the model ID in this file is: TRESYN-PWY
The final filename will be: yeastcyc-out/TRESYN-PWY-TRESYN-PWY.ttl
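The directory rule, sketched the same way (hypothetical function name, same assumption that -o ends with a separator). It shows where the doubled name comes from: each YeastCyc file is already named after its pathway, so the stripped base filename and the model ID are the same string.

```python
import os

def directory_output_name(output_dir_arg: str, input_file: str, model_id: str) -> str:
    # {Value of -o} + {input base filename, extension stripped} + "-" + {model ID} + ".ttl"
    base = os.path.splitext(os.path.basename(input_file))[0]  # "TRESYN-PWY.owl" -> "TRESYN-PWY"
    return output_dir_arg + base + "-" + model_id + ".ttl"

print(directory_output_name("yeastcyc-out/", "pathways_biopax_files/TRESYN-PWY.owl", "TRESYN-PWY"))
# yeastcyc-out/TRESYN-PWY-TRESYN-PWY.ttl
```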
Short-term fix
Since minerva's export operation generates files based only on their model IDs, I think we should maintain a filename = {model_id} + ".ttl" rule for the import files. I'll fix the "input is directory" route by removing the {input base filename, extension stripped} + "-" segment. This should result in the correct filename yeastcyc-out/TRESYN-PWY.ttl.
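With the input-filename segment removed, both routes reduce to the single filename = {model_id} + ".ttl" rule. A hypothetical sketch of that shared rule, again assuming the -o value ends with a path separator:

```python
def output_name(output_dir_arg: str, model_id: str) -> str:
    # Proposed rule for both the single-file and directory routes:
    # {Value of -o} + {model ID} + ".ttl", so the filename depends only on the model ID.
    return output_dir_arg + model_id + ".ttl"

print(output_name("yeastcyc-out/", "TRESYN-PWY"))  # yeastcyc-out/TRESYN-PWY.ttl
print(output_name("reacto-out/", "R-HSA-350562"))  # reacto-out/R-HSA-350562.ttl
```

Because the Reactome output is unchanged under this rule, only the YeastCyc filenames would differ from the current behavior.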