frink-okn / FRINKIssues


Implement RDF->HDT Strategy #2

Open cbizon opened 4 months ago

cbizon commented 4 months ago

Being worked on by biobricks (Zaki)

cbizon commented 4 months ago

Assigned to @YaphetKG just to keep an eye on, not to do anything.

YaphetKG commented 1 month ago

[Attachment: Untitled Diagram.drawio]

From Theme 1's perspective, I think there would be two interaction points:

  1. After commits to develop, they merge develop into main. That starts the conversion pipeline, which converts the graph to HDT and generates a report.
  2. After the report and HDT are uploaded, we notify Theme 1 that their graph is ready, with a report for their review. This message could include: a. which branch and which file they need to look at; b. instructions pointing to the lakeFS FRINK page to create a tag (?)

    From our side:

    1. The conversion pipeline seems fine for smaller graphs, and some work is left in creating the stats scripts.
    2. We need to think about how to handle bigger graphs (compute power needed; will we be able to run them in Sterling? Does that ever scale to another cloud?).
    3. For deployment, using a Helm-capable pod seems like a straightforward approach. Adding a download capability for current

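
A minimal sketch of what the convert-and-generate-report step could look like as a driver. The tool names are assumptions, not the actual pipeline: Apache Jena's `riot` (which does appear in the error logs later in this thread) for validation, and the `rdf2hdt` tool from hdt-cpp/hdt-java for the conversion itself.

```python
import subprocess
from pathlib import Path

def build_conversion_cmds(rdf_file: Path, out_hdt: Path) -> list[list[str]]:
    """Return the shell commands for one repo's RDF -> HDT conversion.

    Hypothetical sketch: `riot` and `rdf2hdt` are assumed tool names,
    not confirmed details of the real pipeline.
    """
    return [
        # 1. Validate the input first; riot exits non-zero on parse errors,
        #    so bad files fail before the expensive conversion step.
        ["riot", "--validate", str(rdf_file)],
        # 2. Convert the validated RDF into an HDT file.
        ["rdf2hdt", str(rdf_file), str(out_hdt)],
    ]

def run_pipeline(rdf_file: Path, out_hdt: Path) -> None:
    """Run each step, failing fast so the report captures the first error."""
    for cmd in build_conversion_cmds(rdf_file, out_hdt):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the pipeline easy to unit-test without the heavy tools installed.
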
cbizon commented 1 month ago

Thanks @YaphetKG

So at point 2 they review - are they reviewing just static docs or is there a way for them to query the graph?

Then the tag creation triggers the deploy pipeline?

YaphetKG commented 1 month ago

I am imagining it would just be static files; we wouldn't have actual servers at that point. The main thing I was thinking is that some of them need a considerable amount of resources, and having a server spin up and wait for evaluation may be too much?

YaphetKG commented 1 month ago

Yep, the tag creation would deploy the server and send some info to spider as well

YaphetKG commented 2 weeks ago

Current status of automated graph conversions:


RDF / ttl / nq repos:

    ✅ climatepub4-kg 
    ✅ dream-kg
    ⚠️ neighborhood-information-kg (?) -- nothing in main, but there is a demo branch -- SKIPPED when setting up the action
    ✅ rural-kg
    ✅ saw-graph -- erroring, but the errors are ignored
        Error: when combining, some date datatypes in the RDF were not assigned the correct type.
    ⚠️ scales-kg -- needs more HDD storage
    ❌ secure-chain-kg -- erroring
            20:53:38 WARN  riot            :: [line: 3869307, col: 1 ] Bad IRI: <secure-chain://person/pre-commit-ci[bot]> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
            20:53:38 WARN  riot            :: [line: 3888997, col: 1 ] Bad IRI: <secure-chain://person/dependabot[bot]> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
            20:53:38 WARN  riot            :: [line: 3891408, col: 1 ] Bad IRI: <secure-chain://person/depfu[bot]> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
            20:53:38 WARN  riot            :: [line: 3895897, col: 1 ] Bad IRI: <secure-chain://person/dependabot[bot]> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
            20:53:38 WARN  riot            :: [line: 3918688, col: 1 ] Bad IRI: <secure-chain://person/allcontributors[bot]> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
            20:53:38 INFO  riot            :: File: /mnt/repo/vul_lib.ttl
    ⚠️ sem-open-alex-kg -- needs heavy processing
    ❌ soc-kg -- erroring
            21:04:22 INFO  riot            :: File: /mnt/repo/sockg.rdf
            21:04:23 ERROR riot            :: [line: 1, col: 1 ] Content is not allowed in prolog.
            21:04:23 INFO  riot            :: File: /mnt/repo/sockg_ontology.ttl            
    ✅ sodukn-kg
    ⚠️ urban-flooding-open-knowledge-network -- 28.7 GB .nq file
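
The secure-chain-kg failures above come from `[` and `]` in the IRI path, which the URI/IRI grammars (RFC 3986/3987) do not allow there. A minimal, hypothetical pre-processing fix is to percent-encode the offending characters before conversion; the function names and approach here are illustrative, not part of the actual pipeline.

```python
import re
import urllib.parse

def sanitize_iri(iri: str) -> str:
    """Percent-encode characters that are illegal in an IRI path.

    Illustrative fix for errors like
    <secure-chain://person/pre-commit-ci[bot]> -- square brackets are
    not permitted in the path component.
    """
    scheme, sep, rest = iri.partition("://")
    if not sep:
        return iri
    # Keep "/" and the other characters RFC 3986 allows in paths;
    # everything else (including "[" and "]") gets percent-encoded.
    return scheme + sep + urllib.parse.quote(rest, safe="/:@!$&'()*+,;=-._~")

def sanitize_ntriples_line(line: str) -> str:
    """Rewrite every <...> IRI reference on one N-Triples/N-Quads line."""
    return re.sub(r"<([^>]*)>",
                  lambda m: "<" + sanitize_iri(m.group(1)) + ">",
                  line)
```

Run as a streaming filter over the input file, this would let the conversion proceed without hand-editing the source data, at the cost of changing the affected IRIs.
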

❌ Neo4j repos (Not yet converted)
    biohealth
    spoke-kg
    wildlife-kg
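
The Neo4j repos will need an RDF export before the HDT step. Whatever export route is used, the output can be emitted as N-Triples; a tiny, hypothetical serializer for one triple (with escaping per the N-Triples grammar) might look like this — it is a sketch, not the project's actual export code.

```python
def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def triple_line(subject_iri: str, predicate_iri: str,
                obj: str, obj_is_iri: bool) -> str:
    """Serialize one triple as a single N-Triples line."""
    o = f"<{obj}>" if obj_is_iri else f'"{escape_literal(obj)}"'
    return f"<{subject_iri}> <{predicate_iri}> {o} ."
```

Because N-Triples is line-oriented, an exporter built on this can stream arbitrarily large Neo4j dumps without holding the graph in memory.
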

❌ HDT repos (just need deployment)
    biobricks-ice-kg
    ubergraph
    wikidata
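
For the very large inputs noted in the status list (e.g. the 28.7 GB .nq file), one option is to exploit the fact that N-Quads puts exactly one statement per line: the file can be split into line-aligned shards and processed incrementally. A rough stdlib sketch, with arbitrary shard size and naming:

```python
from pathlib import Path
from typing import Iterator

def split_nquads(src: Path, out_dir: Path,
                 lines_per_shard: int = 10_000_000) -> Iterator[Path]:
    """Split a large N-Quads file into line-aligned shards.

    Safe because every line boundary in N-Quads is a valid split
    point. Shard size and the .partNNNN naming are arbitrary choices.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    shard, count, out = 0, 0, None
    with src.open("r", encoding="utf-8") as f:
        for line in f:
            if out is None:
                out = (out_dir / f"{src.stem}.part{shard:04d}.nq").open(
                    "w", encoding="utf-8")
            out.write(line)
            count += 1
            if count >= lines_per_shard:
                out.close()
                yield out_dir / f"{src.stem}.part{shard:04d}.nq"
                shard, count, out = shard + 1, 0, None
    if out is not None:
        out.close()
        yield out_dir / f"{src.stem}.part{shard:04d}.nq"
```

This keeps peak disk and memory pressure per conversion bounded, which may matter for whether the big graphs can run in Sterling at all; note that the resulting per-shard HDT files would still need a merge step.
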