Open gaurav opened 11 months ago
@balhoff I've now added checks that (1) look for duplication between the local mappings file and generated predicate files, and (2) look for Biolink predicates that are not present in the Biolink model. So far, I'm just printing out the concerning PredicateMappings (which are based on the predicate mappings file generated as part of the Biolink model), so unfortunately the output isn't very readable. Here's what the output looks like right now with 15 warnings:
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- Found 15 mapping warnings:
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Generated predicate mapping file maps CTD:increases_secretion_of to multiple Biolink terms: List(PredicateMappingRow(Some(increases secretion of),Some(secretion),Some(increased),biolink:affects,Some(biolink:causes),Some(Set(CTD:increases_secretion_of)),None,None,None), PredicateMappingRow(Some(increases secretion of),Some(secretion),Some(increased),biolink:affects,Some(biolink:causes),Some(Set(CTD:increases_secretion_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Generated predicate mapping file maps CTD:increases_splicing_of to multiple Biolink terms: List(PredicateMappingRow(Some(increases splicing of),Some(splicing),Some(increased),biolink:affects,Some(biolink:causes),Some(Set(CTD:increases_splicing_of, CTD:increases_RNA_splicing)),None,None,None), PredicateMappingRow(Some(increases splicing of),Some(splicing),Some(increased),biolink:affects,Some(biolink:causes),Some(Set(CTD:increases_splicing_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Generated predicate mapping file maps CTD:affects_secretion_of to multiple Biolink terms: List(PredicateMappingRow(Some(affects secretion of),Some(secretion),None,biolink:affects,None,Some(Set(CTD:affects_secretion_of)),None,Some(Set(CTD:affects_export)),None), PredicateMappingRow(Some(affects secretion of),Some(secretion),None,biolink:affects,None,Some(Set(CTD:affects_secretion_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Generated predicate mapping file maps CTD:decreases_secretion_of to multiple Biolink terms: List(PredicateMappingRow(Some(decreases secretion of),Some(secretion),Some(decreased),biolink:affects,Some(biolink:causes),Some(Set(CTD:decreases_secretion_of)),None,None,None), PredicateMappingRow(Some(decreases secretion of),Some(secretion),Some(decreased),biolink:affects,Some(biolink:causes),Some(Set(CTD:decreases_secretion_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Generated predicate mapping file maps CTD:decreases_splicing_of to multiple Biolink terms: List(PredicateMappingRow(Some(decreases splicing of),Some(splicing),Some(decreased),biolink:affects,Some(biolink:causes),Some(Set(CTD:decreases_splicing_of, CTD:decreases_RNA_splicing)),None,None,None), PredicateMappingRow(Some(decreases splicing of),Some(splicing),Some(decreased),biolink:affects,Some(biolink:causes),Some(Set(CTD:decreases_splicing_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Generated predicate mapping file maps CTD:affects_splicing_of to multiple Biolink terms: List(PredicateMappingRow(Some(affects splicing of),Some(splicing),None,biolink:affects,None,Some(Set(CTD:affects_splicing_of)),None,None,None), PredicateMappingRow(Some(affects splicing of),Some(splicing),None,biolink:affects,None,Some(Set(CTD:affects_splicing_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Generated predicate mapping file maps RO:0002212 to multiple Biolink terms: List(PredicateMappingRow(Some(entity negatively regulates entity),None,Some(downregulated),biolink:regulates,None,Some(Set(RO:0002212, RO:0002449)),None,None,None), PredicateMappingRow(Some(process negatively regulates process),None,Some(downregulated),biolink:regulates,None,Some(Set(RO:0002212)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Combined predicate mappings maps RO:0002313 to multiple Biolink terms: List(PredicateMappingRow(None,None,None,biolink:affects,None,None,None,None,Some(Set(RO:0002313))), PredicateMappingRow(Some(increases transport of),Some(transport),Some(increased),biolink:affects,Some(biolink:causes),Some(Set(CTD:increases_transport_of)),None,None,Some(HashSet(RO:0002313, GAMMA:transporter, RO:0002340, GAMMA:carrier, RO:0002345))))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Combined predicate mappings maps CTD:increases_secretion_of to multiple Biolink terms: List(PredicateMappingRow(Some(increases secretion of),Some(secretion),Some(increased),biolink:affects,Some(biolink:causes),Some(Set(CTD:increases_secretion_of)),None,None,None), PredicateMappingRow(Some(increases secretion of),Some(secretion),Some(increased),biolink:affects,Some(biolink:causes),Some(Set(CTD:increases_secretion_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Combined predicate mappings maps CTD:increases_splicing_of to multiple Biolink terms: List(PredicateMappingRow(Some(increases splicing of),Some(splicing),Some(increased),biolink:affects,Some(biolink:causes),Some(Set(CTD:increases_splicing_of, CTD:increases_RNA_splicing)),None,None,None), PredicateMappingRow(Some(increases splicing of),Some(splicing),Some(increased),biolink:affects,Some(biolink:causes),Some(Set(CTD:increases_splicing_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Combined predicate mappings maps CTD:affects_secretion_of to multiple Biolink terms: List(PredicateMappingRow(Some(affects secretion of),Some(secretion),None,biolink:affects,None,Some(Set(CTD:affects_secretion_of)),None,Some(Set(CTD:affects_export)),None), PredicateMappingRow(Some(affects secretion of),Some(secretion),None,biolink:affects,None,Some(Set(CTD:affects_secretion_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Combined predicate mappings maps CTD:decreases_secretion_of to multiple Biolink terms: List(PredicateMappingRow(Some(decreases secretion of),Some(secretion),Some(decreased),biolink:affects,Some(biolink:causes),Some(Set(CTD:decreases_secretion_of)),None,None,None), PredicateMappingRow(Some(decreases secretion of),Some(secretion),Some(decreased),biolink:affects,Some(biolink:causes),Some(Set(CTD:decreases_secretion_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Combined predicate mappings maps CTD:decreases_splicing_of to multiple Biolink terms: List(PredicateMappingRow(Some(decreases splicing of),Some(splicing),Some(decreased),biolink:affects,Some(biolink:causes),Some(Set(CTD:decreases_splicing_of, CTD:decreases_RNA_splicing)),None,None,None), PredicateMappingRow(Some(decreases splicing of),Some(splicing),Some(decreased),biolink:affects,Some(biolink:causes),Some(Set(CTD:decreases_splicing_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Combined predicate mappings maps CTD:affects_splicing_of to multiple Biolink terms: List(PredicateMappingRow(Some(affects splicing of),Some(splicing),None,biolink:affects,None,Some(Set(CTD:affects_splicing_of)),None,None,None), PredicateMappingRow(Some(affects splicing of),Some(splicing),None,biolink:affects,None,Some(Set(CTD:affects_splicing_of)),None,None,None))
01:04:23.603 [zio-default-blocking-2] WARN generate_ro_biolink_mapping$ROBiolinkMappingsGenerator$ -- - Combined predicate mappings maps RO:0002212 to multiple Biolink terms: List(PredicateMappingRow(Some(entity negatively regulates entity),None,Some(downregulated),biolink:regulates,None,Some(Set(RO:0002212, RO:0002449)),None,None,None), PredicateMappingRow(Some(process negatively regulates process),None,Some(downregulated),biolink:regulates,None,Some(Set(RO:0002212)),None,None,None))
We can ignore the CTD mappings since we currently don't export those at all.
However, it looks like the following terms are duplicated:
I've deleted RO:0002313 from local mappings in 797ff28.
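For reference, the two checks described above could be sketched roughly as follows. This is a minimal sketch and not the code in this PR: `MappingRow` is a hypothetical, simplified stand-in for the `PredicateMappingRow` seen in the log output (all match sets are merged into one field), and the names `MappingChecks`, `duplicatedTerms`, `unknownPredicates`, and `summarize` are made up for illustration.

```scala
// Hypothetical, simplified stand-in for the PredicateMappingRow in the log output.
final case class MappingRow(
    mappedPredicate: Option[String],    // e.g. Some("increases secretion of")
    biolinkPredicate: String,           // e.g. "biolink:affects"
    qualifiedPredicate: Option[String], // e.g. Some("biolink:causes")
    matches: Set[String]                // source CURIEs; exact/narrow/broad merged for brevity
)

object MappingChecks {

  /** Source CURIEs that appear in more than one row, with the rows that mention them. */
  def duplicatedTerms(rows: Seq[MappingRow]): Map[String, Seq[MappingRow]] =
    rows
      .flatMap(row => row.matches.toSeq.map(curie => curie -> row))
      .groupMap(_._1)(_._2)
      .filter { case (_, mentioning) => mentioning.size > 1 }

  /** Rows whose Biolink predicate is not in the supplied set of known predicates. */
  def unknownPredicates(rows: Seq[MappingRow], known: Set[String]): Seq[MappingRow] =
    rows.filterNot(row => known.contains(row.biolinkPredicate))

  /** One short line per duplicated term, instead of dumping whole case classes. */
  def summarize(dups: Map[String, Seq[MappingRow]]): Seq[String] =
    dups.toSeq.sortBy(_._1).map { case (curie, mentioning) =>
      val targets = mentioning
        .map(r => r.qualifiedPredicate.getOrElse(r.biolinkPredicate))
        .mkString(", ")
      s"$curie -> $targets"
    }
}
```

Under these assumptions, passing the two `biolink:regulates` rows from the RO:0002212 warning to `duplicatedTerms` would report RO:0002212 once, with both rows attached, and `summarize` would turn that into a single line per term rather than a full `List(PredicateMappingRow(...))` dump.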
Hi @balhoff -- just wanted to poke you to review this PR. If you need help in incorporating it into the changes you've made to re-adding CTD, please let me know.
Adds scripts/generate_ro_biolink_mapping.sc, a Scala CLI script for generating a list of mappings between RDF predicates and Biolink predicates downloaded from two sources:
These are written into the ro-to-biolink-predicate-mappings.tsv file (which I've included in this PR). If you want to see all the predicate mappings (not just the RO/GOREL ones), they are in the ro-to-biolink-predicate-mappings-all.tsv file (https://github.com/ExposuresProvider/cam-pipeline/blob/e1d6dd063c43de31ac736dbd0ce1ee57008f64fc/ro-to-biolink-predicate-mappings-all.tsv).
This file is then used by scripts/kg_edges.dl to add "qualifiers" to kg.tsv. This does seem to work currently, producing output like:
Things to do:
.asJson from Circe to work. Help?
ro-to-biolink-local-mappings.tsv and ro-to-biolink-predicate-mappings.tsv -- any examples in the original list should be deleted so that only the qualified predicate is used.
ro-to-biolink-local-mappings.tsv for any predicates that have been deleted -- we can temporarily add those directly to scripts/generate_ro_biolink_mapping.sc, but eventually we should get those into the Biolink model.
This PR also adds the command for generating ro-to-biolink-predicate-mappings.tsv, although at the moment this will never be run, as the GitHub repo includes the predicate mappings file.
WIP: will close #95 once implemented.