callahantiff / PheKnowLator

PheKnowLator: Heterogeneous Biomedical Knowledge Graphs and Benchmarks Constructed Under Alternative Semantic Models
https://github.com/callahantiff/PheKnowLator/wiki
Apache License 2.0

Enabling Google GCS Directory Listing #91

Closed callahantiff closed 3 years ago

callahantiff commented 3 years ago

Task

Add functionality to enable the listing of objects within a Google GCS bucket. There are some interesting solutions proposed on the web to account for the lack of native functionality.

Potential Solution

One such solution essentially creates a listing of the items in a bucket and exposes it as a web service. I'm not sure that this is the best way to go. @LucaCappelletti94, if I were to implement something similar, would that meet your requirements? Can you describe in a bit more detail what functionality you are looking for, so I can make sure to add it? 😄 🤔 💭

Thank you!

LucaCappelletti94 commented 3 years ago

Hello @callahantiff, sorry for the delay! I believe something like that may work. But if it takes more than an hour (and in my experience, configuring anything that is not completely straightforward in Google Cloud tends to spontaneously combust at some point), we could instead hard-code some reference URL(s) for the latest version of PheKnowLator into a metadata JSON within Ensmallen that can be updated as necessary.

Would this be a sensible solution with a low time requirement? Admittedly, it does not allow automatic retrieval of newly deployed graphs unless they replace the old ones at the previous URLs.

LucaCappelletti94 commented 3 years ago

One such example is how I am currently handling the kg-hub graphs since they still have some oddities.

callahantiff commented 3 years ago

Thanks so much @LucaCappelletti94. This sounds totally reasonable. I will take a closer look this weekend and follow up once I have a better sense. Thank you!

LucaCappelletti94 commented 3 years ago

Hello @callahantiff, I am doing a round of updates to the graph retrieval. Any news on the PheKnowLator graph availability? I saw it is offered in OWL, but we do not currently support the OWL format.

callahantiff commented 3 years ago

Hey @LucaCappelletti94 -

Thanks for circling back with me. We would love to be included, and we do provide data in a format other than OWL. Information on all of the output files produced, including a snippet of the output, can be found here: https://github.com/callahantiff/PheKnowLator/wiki/KG-Construction#table-knowledge-graph-build-output. Per our conversation on Slack, I think the files for each build that will work best and be easiest to use with existing infrastructure are the XXXX_Triples_Identifiers.txt files.

There are two other updates that I wanted to give you below.

Update One: PheKnowLator GCS Bucket Directory Listing Solution

There is a JSON file called pheknowlator_builds.json that can serve as a proxy for a directory listing. It gets updated each month and can be accessed from the following URL: https://storage.googleapis.com/pheknowlator/pheknowlator_builds.json.

Currently, there is one entry for metadata and then one entry for each monthly build/release (ordered temporally). Within each monthly build, each KG build is referenced by key (hope that makes sense -- additional information on how the 12 builds differ is given under Update Two). Note that if a particular file was not available for a monthly build/release, its value will be null. A snippet of the output is shown below. Let me know if you see any problems with this structure; it's very easy to make changes!

{
    "metadata": "For more information on the PheKnowLator Builds, please visit the project GitHub: https://github.com/callahantiff/PheKnowLator",
    "v2.0.0-2020-5-10": {
        "instance-inverseRelations-owl": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_10MAY2020/knowledge_graphs/instance_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_Instance_inverseRelations_noOWL_Triples_Identifiers.txt",
        "instance-inverseRelations-owlnets": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_10MAY2020/knowledge_graphs/instance_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_Instance_inverseRelations_noOWL_Triples_Identifiers.txt",
        "instance-inverseRelations-owlnets-purified": null,
        "instance-relationsOnly-owl": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_10MAY2020/knowledge_graphs/instance_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_Instance_relationsOnly_noOWL_Triples_Identifiers.txt",
        "instance-relationsOnly-owlnets": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_10MAY2020/knowledge_graphs/instance_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_Instance_relationsOnly_noOWL_Triples_Identifiers.txt",
        "instance-relationsOnly-owlnets-purified": null,
        "subclass-inverseRelations-owl": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_10MAY2020/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_Triples_Identifiers.txt",
        "subclass-inverseRelations-owlnets": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_10MAY2020/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_Triples_Identifiers.txt",
        "subclass-inverseRelations-owlnets-purified": null,
        "subclass-relationsOnly-owl": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_10MAY2020/knowledge_graphs/subclass_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_Triples_Identifiers.txt",
        "subclass-relationsOnly-owlnets": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_10MAY2020/knowledge_graphs/subclass_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_Triples_Identifiers.txt",
        "subclass-relationsOnly-owlnets-purified": null
    },
    "v2.0.0-2021-1-25": {
        "instance-inverseRelations-owl": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/instance_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_instance_inverseRelations_noOWL_Triples_Identifiers.txt",
        "instance-inverseRelations-owlnets": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/instance_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_instance_inverseRelations_noOWL_Triples_Identifiers.txt",
        "instance-inverseRelations-owlnets-purified": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/instance_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_instance_inverseRelations_noOWL_INSTANCE_purified_Triples_Identifiers.txt",
        "instance-relationsOnly-owl": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/instance_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_instance_relationsOnly_noOWL_Triples_Identifiers.txt",
        "instance-relationsOnly-owlnets": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/instance_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_instance_relationsOnly_noOWL_Triples_Identifiers.txt",
        "instance-relationsOnly-owlnets-purified": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/instance_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_instance_relationsOnly_noOWL_INSTANCE_purified_Triples_Identifiers.txt",
        "subclass-inverseRelations-owl": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_Triples_Identifiers.txt",
        "subclass-inverseRelations-owlnets": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_Triples_Identifiers.txt",
        "subclass-inverseRelations-owlnets-purified": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v2.0.0_full_subclass_inverseRelations_noOWL_SUBCLASS_purified_Triples_Identifiers.txt",
        "subclass-relationsOnly-owl": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/subclass_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_Triples_Identifiers.txt",
        "subclass-relationsOnly-owlnets": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/subclass_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_Triples_Identifiers.txt",
        "subclass-relationsOnly-owlnets-purified": "https://storage.googleapis.com/pheknowlator/archived_builds/release_v2.0.0/build_25JAN2021/knowledge_graphs/subclass_builds/relations_only/owlnets/PheKnowLator_v2.0.0_full_subclass_relationsOnly_noOWL_SUBCLASS_purified_Triples_Identifiers.txt"
    },
... }
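
For anyone consuming this file, here is a minimal Python sketch of how the listing could be read. It assumes the structure shown in the snippet above: a top-level "metadata" entry, plus build keys that end in year-month-day (e.g. "v2.0.0-2021-1-25"). The function names are made up for illustration and are not part of PheKnowLator or Ensmallen.

```python
import json
from datetime import date
from urllib.request import urlopen

LISTING_URL = "https://storage.googleapis.com/pheknowlator/pheknowlator_builds.json"


def fetch_listing(url=LISTING_URL):
    """Download and parse the builds listing (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def build_date(build_key):
    """Parse the date out of a key such as 'v2.0.0-2021-1-25' (assumed format)."""
    _version, year, month, day = build_key.rsplit("-", 3)
    return date(int(year), int(month), int(day))


def build_keys(listing):
    """Return the build keys, oldest first, skipping the 'metadata' entry."""
    return sorted((k for k in listing if k != "metadata"), key=build_date)


def latest_url(listing, kg_key):
    """Return the newest non-null URL for kg_key, or None if no build has it."""
    for build in reversed(build_keys(listing)):
        url = listing[build].get(kg_key)
        if url is not None:
            return url
    return None
```

With this, something like latest_url(fetch_listing(), "instance-relationsOnly-owlnets") should give the most recent URL for that KG type, skipping any monthly build where the file is null. The keys are sorted explicitly rather than relying on the order in the file, which seemed like the safer choice.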

Update Two: KG Build Types

Varying the Construction Approach, Relation Strategy, and Property Graph Abstraction yields the following 12 KGs:

  1. subclass + relations only + owl
  2. subclass + relations only + owlnets
  3. subclass + relations only + owlnets - purified
  4. subclass + inverse relations + owl
  5. subclass + inverse relations + owlnets
  6. subclass + inverse relations + owlnets - purified
  7. instance + relations only + owl
  8. instance + relations only + owlnets
  9. instance + relations only + owlnets - purified
  10. instance + inverse relations + owl
  11. instance + inverse relations + owlnets
  12. instance + inverse relations + owlnets - purified
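
The 12 combinations line up with the keys used inside each monthly build of pheknowlator_builds.json. A rough Python sketch for generating those keys follows. It assumes the "approach-relations-abstraction" naming seen in the JSON snippet (e.g. "subclass-relationsOnly-owlnets-purified"), and the constant and function names are only illustrative.

```python
from itertools import product

# Values taken from the key names in the JSON snippet (assumed to be exhaustive).
APPROACHES = ("subclass", "instance")
RELATIONS = ("relationsOnly", "inverseRelations")
ABSTRACTIONS = ("owl", "owlnets", "owlnets-purified")


def kg_keys():
    """Return the 12 KG keys in the same order as the numbered list."""
    return ["-".join(parts) for parts in product(APPROACHES, RELATIONS, ABSTRACTIONS)]
```

Calling kg_keys() gives 2 x 2 x 3 = 12 keys, starting with "subclass-relationsOnly-owl" and ending with "instance-inverseRelations-owlnets-purified".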

callahantiff commented 3 years ago

Awesome @LucaCappelletti94, thanks for your help with this. I will close this issue, but please feel free to re-open if we need to do more work here.