geneontology / reactome-go-cams

A set of GO-CAMs built automatically from Reactome pathways.

Test converting a slice of Reactome BioPAX #3

Open dustine32 opened 3 years ago

dustine32 commented 3 years ago

@deustp01 URL for testing?

deustp01 commented 3 years ago

https://curator.reactome.org/ReactomeRESTfulAPI/RESTfulWS/biopaxExporter/Level3/15869

dustine32 commented 3 years ago

Thanks @deustp01! I'll run this BioPAX through the existing pathways2go converter and report back how it goes.

dustine32 commented 3 years ago

@deustp01 The above BioPAX created 13 pathway model files. Is this expected?

Here's the full log output:

1 of 13 Pathway:[Pyrimidine salvage]
defining pathway Pyrimidine salvage false true R-HSA-73614
Before sparql inference -  triples: 3185
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 191
Total evidence nodes 120
removed 0
After sparql inference -  triples: 2100
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   0   []
If enabler then MF rule 0   []
Occurs In Rule  10  []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 2100 models/R-HSA-73614.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 4318
reseting for next pathway...
2 of 13 Pathway:[Nucleotide catabolism]
defining pathway Nucleotide catabolism false true R-HSA-8956319
Before sparql inference -  triples: 504
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 34
Total evidence nodes 20
removed 1
After sparql inference -  triples: 416
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   1   [http://model.geneontology.org/R-HSA-8956319/R-HSA-8866601_GO_0009264_individual]
If enabler then MF rule 0   []
Occurs In Rule  1   []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 416 models/R-HSA-8956319.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 945
reseting for next pathway...
3 of 13 Pathway:[Nucleotide salvage]
defining pathway Nucleotide salvage false true R-HSA-8956321
Before sparql inference -  triples: 32
No occurs in
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 1
Total evidence nodes 0
removed 1
After sparql inference -  triples: 32
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   0   []
If enabler then MF rule 0   []
Occurs In Rule  0   []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 32 models/R-HSA-8956321.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 15
reseting for next pathway...
4 of 13 Pathway:[Purine salvage]
defining pathway Purine salvage false true R-HSA-74217
Before sparql inference -  triples: 4054
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 241
Total evidence nodes 152
removed 0
After sparql inference -  triples: 2647
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   0   []
If enabler then MF rule 0   []
Occurs In Rule  12  []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 2647 models/R-HSA-74217.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 5369
reseting for next pathway...
5 of 13 Pathway:[Metabolism of nucleotides, Nucleotide metabolism]
defining pathway Nucleotide metabolism false true R-HSA-15869
Before sparql inference -  triples: 30
No occurs in
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 1
Total evidence nodes 0
removed 1
After sparql inference -  triples: 30
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   0   []
If enabler then MF rule 0   []
Occurs In Rule  0   []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 30 models/R-HSA-15869.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 15
reseting for next pathway...
6 of 13 Pathway:[Pyrimidine biosynthesis]
defining pathway Pyrimidine biosynthesis false true R-HSA-500753
Before sparql inference -  triples: 2477
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 144
Total evidence nodes 91
removed 0
After sparql inference -  triples: 1602
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   0   []
If enabler then MF rule 0   []
Occurs In Rule  7   []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 1602 models/R-HSA-500753.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 3203
reseting for next pathway...
7 of 13 Pathway:[Interconversion of nucleotide di- and triphosphates]
defining pathway Interconversion of nucleotide di- and triphosphates false true R-HSA-499943
Before sparql inference -  triples: 11917
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 770
Total evidence nodes 482
removed 0
After sparql inference -  triples: 8462
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   8   [http://model.geneontology.org/R-HSA-499943/R-HSA-499943]
If enabler then MF rule 0   []
Occurs In Rule  34  []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 8462 models/R-HSA-499943.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 20790
reseting for next pathway...
8 of 13 Pathway:[Nucleotide biosynthesis]
defining pathway Nucleotide biosynthesis false true R-HSA-8956320
Before sparql inference -  triples: 32
No occurs in
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 1
Total evidence nodes 0
removed 1
After sparql inference -  triples: 32
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   0   []
If enabler then MF rule 0   []
Occurs In Rule  0   []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 32 models/R-HSA-8956320.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 15
reseting for next pathway...
9 of 13 Pathway:[Purine ribonucleoside monophosphate biosynthesis]
defining pathway Purine ribonucleoside monophosphate biosynthesis false true R-HSA-73817
Before sparql inference -  triples: 6685
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 451
Total evidence nodes 280
removed 0
After sparql inference -  triples: 5040
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   10  [http://model.geneontology.org/R-HSA-73817/R-HSA-73817]
If enabler then MF rule 0   []
Occurs In Rule  17  []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 5040 models/R-HSA-73817.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 12496
reseting for next pathway...
10 of 13 Pathway:[Phosphate bond hydrolysis by NTPDase proteins]
defining pathway Phosphate bond hydrolysis by NTPDase proteins false true R-HSA-8850843
Before sparql inference -  triples: 4429
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 265
Total evidence nodes 168
removed 0
After sparql inference -  triples: 2889
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   0   []
If enabler then MF rule 0   []
Occurs In Rule  12  []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 2889 models/R-HSA-8850843.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 5937
reseting for next pathway...
11 of 13 Pathway:[Purine catabolism]
defining pathway Purine catabolism false true R-HSA-74259
Before sparql inference -  triples: 5845
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 380
Total evidence nodes 237
removed 0
After sparql inference -  triples: 4215
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   5   [http://model.geneontology.org/R-HSA-74259/R-HSA-74259]
If enabler then MF rule 0   []
Occurs In Rule  16  []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 4215 models/R-HSA-74259.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 9867
reseting for next pathway...
12 of 13 Pathway:[Phosphate bond hydrolysis by NUDT proteins]
defining pathway Phosphate bond hydrolysis by NUDT proteins false true R-HSA-2393930
Before sparql inference -  triples: 6241
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 384
Total evidence nodes 242
removed 0
After sparql inference -  triples: 4201
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   2   [http://model.geneontology.org/R-HSA-2393930/R-HSA-2393930]
If enabler then MF rule 0   []
Occurs In Rule  17  []
Provides Input For Rule 0   []
Transporter Rule    0   []

writing....
writing n triples: 4201 models/R-HSA-2393930.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 9023
reseting for next pathway...
13 of 13 Pathway:[Pyrimidine catabolism]
defining pathway Pyrimidine catabolism false true R-HSA-73621
Before sparql inference -  triples: 5407
Starting delete locations
Eliminated 'located in' assertions
Starting unconnected node cleanup.  Total nodes 341
Total evidence nodes 213
removed 0
After sparql inference -  triples: 3758
Rule results:
Entity Regulation Rule 1.   0   []
Entity Regulation Rule 3    0   []
Entity Regulator Rule   1   [http://model.geneontology.org/R-HSA-73621/R-HSA-73621]
If enabler then MF rule 0   []
Occurs In Rule  17  []
Provides Input For Rule 0   []
Transporter Rule    4   [http://model.geneontology.org/R-HSA-73621/R-HSA-73621]

writing....
writing n triples: 3758 models/R-HSA-73621.ttl
done writing...
GO-CAM model is consistent, Total triples in validated model including tbox: 7906
reseting for next pathway...
done with file source/15869.owl

dustine32 commented 3 years ago

All 13 models were reported logically (OWL) consistent. I had confused myself into thinking the consistency checks in the pathways2go code included ShEx automatically. Apparently ShEx is not checked, or I haven't yet found where that happens in the code.

Either way I can just run the ShEx validator on these 13 models separately to get those reports.
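For comparing the 13 runs at a glance, the converter log above can be condensed into one record per pathway. The following is a minimal sketch that assumes only the line formats visible in the pasted log (`defining pathway ...` and `Before/After sparql inference - triples: N`); the function names `summarise_log` and `unchanged_pathways` are hypothetical and not part of pathways2go.

```python
import re

# Line formats are taken from the pasted log; everything else is an assumption.
PATHWAY = re.compile(r"defining pathway (.+) (?:true|false) (?:true|false) (R-HSA-\d+)$")
TRIPLES = re.compile(r"(Before|After) sparql inference -\s+triples: (\d+)")


def summarise_log(text):
    """Return one dict per pathway with its name, id, and triple counts."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        m = PATHWAY.match(line)
        if m:
            rows.append({"name": m.group(1), "id": m.group(2)})
            continue
        m = TRIPLES.match(line)
        if m and rows:
            # Store the count under "before" or "after" on the current pathway.
            rows[-1][m.group(1).lower()] = int(m.group(2))
    return rows


def unchanged_pathways(rows):
    """Ids of pathways whose triple count was the same before and after inference."""
    return [r["id"] for r in rows if "before" in r and r["before"] == r.get("after")]
```

Applied to the full log, `unchanged_pathways(summarise_log(log_text))` would list the pathways whose triple counts did not change, which in the output above are the ones reporting `Total nodes 1`.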

dustine32 commented 3 years ago

@deustp01 The products folder here contains the ShEx reports. The first thing to note is that only 3 out of 13 models were shex_valid:

$ cut -f1,2,9 products/main_report.txt
filename    model_title shex_valid
R-HSA-73621.ttl Pyrimidine catabolism - imported from: Reactome false
R-HSA-74217.ttl Purine salvage - imported from: Reactome    false
R-HSA-74259.ttl Purine catabolism - imported from: Reactome false
R-HSA-73817.ttl Purine ribonucleoside monophosphate biosynthesis - imported from: Reactome  false
R-HSA-73614.ttl Pyrimidine salvage - imported from: Reactome    false
R-HSA-8956319.ttl   Nucleotide catabolism - imported from: Reactome false
R-HSA-500753.ttl    Pyrimidine biosynthesis - imported from: Reactome   false
R-HSA-8956320.ttl   Nucleotide biosynthesis - imported from: Reactome   true
R-HSA-15869.ttl Nucleotide metabolism - imported from: Reactome true
R-HSA-8956321.ttl   Nucleotide salvage - imported from: Reactome    true
R-HSA-2393930.ttl   Phosphate bond hydrolysis by NUDT proteins - imported from: Reactome    false
R-HSA-499943.ttl    Interconversion of nucleotide di- and triphosphates - imported from: Reactome   false
R-HSA-8850843.ttl   Phosphate bond hydrolysis by NTPDase proteins - imported from: Reactome false
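The same tally can be scripted rather than read off by eye. This is a minimal sketch, assuming `products/main_report.txt` is tab-separated (the default delimiter for `cut`) with a header row containing `filename` and `shex_valid`, as the output above suggests; `split_by_shex_validity` is a hypothetical helper name.

```python
import csv
import io


def split_by_shex_validity(report_text):
    """Split report rows into (valid, invalid) filename lists by the shex_valid column."""
    reader = csv.DictReader(
        io.StringIO(report_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    valid, invalid = [], []
    for row in reader:
        is_valid = row["shex_valid"].strip().lower() == "true"
        (valid if is_valid else invalid).append(row["filename"])
    return valid, invalid
```

A caller could read the file with `open("products/main_report.txt").read()` and print `len(valid)` and `len(invalid)` to get the pass/fail counts.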

I believe we can dig through products/explanations.txt to figure out what exactly is invalid about these models.

Tagging @kltm @ukemi @vanaukenk

ukemi commented 3 years ago

@deustp01 @dustine32 This is interesting. We need to have a look. Just out of curiosity, I wonder whether the ShEx issues are a result of what we did or a result of changes to the ShEx. Would it be worthwhile to do an experiment and run the BioPAX of one of the models that hasn't changed? Peter, I will put this on the agenda for one of our meetings. PS: I find it a bit weird that Nucleotide biosynthesis passes but Pyrimidine biosynthesis doesn't. Shouldn't it be included in the Nucleotide biosynthesis pathway?

ukemi commented 3 years ago

Curious: all seem to be anatomical entity violations. Looking at the last line:

R-HSA-8850843.ttl Phosphate bond hydrolysis by NTPDase proteins - imported from: Reactome http://model.geneontology.org/R-HSA-8850843 gomodel:R-HSA-8851234 [GO:0017111] BFO:0000066 [obo:go/shapes/AnatomicalEntity] gomodel:reaction_R-HSA-8851234_location_lociGO_0000139 [GO:0000139] []

This reaction represents a nucleotide being hydrolyzed in the Golgi lumen. Not sure why this is failing. Here is a snippet of the ShEx:

@ AND EXTRA a {
  a ( @ OR @ ) {1};
  enabled_by: ( @ OR @ ) {0,1};
  part_of: @ *;
  has_part: @ *;
  occurs_in: @ {0,1};

@ AND EXTRA a {
  a ( @ OR @ );
  part_of: @ {0,1};
  location_of: ( @ OR @ ) {0,1};
} // rdfs:comment "an anatomical entity"

IRI @ AND EXTRA rdfs:subClassOf {
  rdfs:subClassOf [ GoAnatomicalEntity: ];
}

GoAnatomicalEntity:

Are we missing the GO_CC CARO bridge here somewhere?
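
To look at many such violations at once, the explanation line quoted above can be split into named fields. This is only a sketch: the field interpretation (node, its types, predicate, expected shape, target node, target types) is an assumption read off the single example line, not a documented format of `products/explanations.txt`, and `parse_violation` is a hypothetical helper.

```python
import re

# Assumed layout: <node> [<types>] <predicate> [<shape>] <target> [<types>]
VIOLATION = re.compile(
    r"(?P<node>\S+) \[(?P<node_types>[^\]]*)\] "
    r"(?P<predicate>\S+) \[(?P<expected_shape>[^\]]*)\] "
    r"(?P<target>\S+) \[(?P<target_types>[^\]]*)\]"
)


def parse_violation(text):
    """Return the assumed fields of one explanation line, or None if it does not match."""
    m = VIOLATION.search(text)
    return m.groupdict() if m else None
```

Grouping the parsed rows by `expected_shape` and `target_types` would show whether every failure really involves the AnatomicalEntity shape, and which GO terms are being rejected.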