Knowledge-Graph-Hub / knowledge-graph-hub.github.io

https://kghub.org
BSD 3-Clause "New" or "Revised" License

Manifest build takes too long, then hangs #16

Closed: caufieldjh closed this issue 2 years ago

caufieldjh commented 2 years ago

The most recent build on Jenkins seemed to be making progress, albeit slowly, then hung at this point after ~48 hrs:

00:19:25  The project kg-idg contains:
00:19:25    2133 objects
00:19:25    22 builds
00:19:25    1 incorrectly named builds
00:19:25        ['raw']
00:19:25    1 incorrectly structured builds
00:19:25        ['raw']
00:19:25    22 builds with KGX format problems
00:19:25        ['20211029', '20211101', '20211112', '20211123', '20211201', '20211202', '20211207', '20211210', '20211213', '20211215', '20211221', '20211223', '20220101', '20220106', '20220107', '20220119', '20220201', '20220203', '20220204', '20220216', '20220223', 'raw']
00:19:25  Validating new builds for kg-covid-19...
00:19:25  Retrieving kg-covid-19/20200925/kg-covid-19.tar.gz...
00:19:35  Validating graph files with KGX...

I aborted the build after > 5 days in this state.

Manifest generation is expected to take a long time initially, as it's going to index and validate every graph on KG-Hub, but this time it appeared to get stuck. Options:

caufieldjh commented 2 years ago

PR #17 added a maximum number of graphs to parse/validate in one run, and that appears to work as expected locally. Jenkins builds still seem to hang, though:

10:39:23  + python make_kg_manifest.py --bucket kg-hub-public-data --outpath MANIFEST.yaml --maximum 10
10:39:25  Retrieving OBO metadata from https://raw.githubusercontent.com/OBOFoundry/OBOFoundry.github.io/master/registry/ontologies.yml...
10:39:28  Found credentials in environment variables.
10:39:28  Searching kg-hub-public-data...
10:40:25  Bucket kg-hub-public-data contains 137386 objects.
10:40:25  Found 528 new compressed graph files.
10:40:25  Found 2046 new uncompressed graph files.
10:40:25  Will consider only 10 files in total.
10:40:25  Will process 10 new compressed graph files.
10:40:25  Will process 0 new uncompressed graph files.
10:40:25  No updates for kg-idg.
10:40:25  Validating new builds for kg-covid-19...
10:40:25  Retrieving kg-covid-19/20200925/kg-covid-19.tar.gz...
10:40:46  Validating graph files with KGX...
10:40:47  biocontext map idot_context has illegal prefix: 2D-PAGE.PROTEIN
10:40:47  biocontext map idot_context has illegal prefix: 3DMET
10:40:47  biocontext map idot_context has illegal prefix: MMMP:BIOMAPS
10:40:49  class "organism taxon" slot "has taxonomic rank" does not reference an existing slot.  New slot was created.
10:40:53  biocontext map idot_context has illegal prefix: 2D-PAGE.PROTEIN
10:40:53  biocontext map idot_context has illegal prefix: 3DMET
10:40:53  biocontext map idot_context has illegal prefix: MMMP:BIOMAPS
10:40:53  Loading schema https://w3id.org/linkml/types from https://raw.githubusercontent.com/biolink/biolink-model/2.2.13/biolink-model.yaml
...
[an indeterminate but excessive amount of time passes, during which nothing happens]

Maybe a biolink-model update would help?
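For reference, the cap from PR #17 behaves roughly like the sketch below. This is not the actual code in make_kg_manifest.py: the function name `apply_maximum` and its arguments are hypothetical, and it assumes compressed files are taken first, which is what the "Will process 10 new compressed / 0 new uncompressed" lines in the log above suggest.

```python
from typing import List, Tuple


def apply_maximum(
    compressed: List[str], uncompressed: List[str], maximum: int
) -> Tuple[List[str], List[str]]:
    """Cap the total number of graph files handled in one run.

    Hypothetical helper: fills the budget from the compressed files
    first, then gives whatever is left to the uncompressed files.
    """
    if maximum < 0:
        raise ValueError("maximum must be non-negative")
    keep_compressed = compressed[:maximum]
    # Budget left after the compressed files; never negative because
    # keep_compressed holds at most `maximum` entries.
    remaining = maximum - len(keep_compressed)
    keep_uncompressed = uncompressed[:remaining]
    return keep_compressed, keep_uncompressed
```

With 528 compressed files, 2046 uncompressed files, and a maximum of 10, this keeps 10 compressed files and no uncompressed ones, matching the counts in the log.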

caufieldjh commented 2 years ago

Reproduced the more recent issue locally. I'll open a new issue for it, because it is new behavior and I think it's unrelated to this one.

caufieldjh commented 2 years ago

Would really like MANIFEST builds to complete, so:

caufieldjh commented 2 years ago

Build is still hanging with the new settings (this time, Jenkins stopped it due to a lack of activity), so I'll try a single file at a time. It's not much, but I want to see if the process will complete.

caufieldjh commented 2 years ago

One file works! It still took 5 hours, but it finished, so that's nice.

justaddcoffee commented 2 years ago

@caufieldjh is this right - one graph takes 5 hours to process?

caufieldjh commented 2 years ago

Yes. Most of that time goes to kgx validate, which in turn is much slower than it could be when there are numerous validation errors and a lot of log output (#13). I haven't found a solution for the logs yet: the call to validate seems to ignore having the log output set to /dev/null, as well as any attempt to redirect STDOUT/STDERR.
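One possible workaround, untested against KGX itself, is to run the validation as a child process so that the operating system discards its output and a timeout can stop a hung run. The sketch below is generic: `run_quietly` is a hypothetical helper, and the command list for kgx validate is left to the caller, since the exact arguments are not given in this thread.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_quietly(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run cmd with stdout/stderr discarded.

    Returns the exit code, or None if the command exceeded timeout_s
    seconds (subprocess.run kills the child in that case).
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # child's fd 1 goes to the null device
            stderr=subprocess.DEVNULL,  # child's fd 2 goes to the null device
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode


if __name__ == "__main__":
    # Stand-in command for demonstration; a real run would pass the
    # validation command line instead.
    code = run_quietly([sys.executable, "-c", "print('noisy output')"], 60)
    print("exit code:", code)
```

Because the redirection happens on the child's file descriptors, it does not rely on the library respecting any in-process logging or stream settings. Whether it actually speeds up validation would still need to be measured.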

caufieldjh commented 2 years ago

Seems to be resolved now, as long as it doesn't have to do a full kgx validate on KG-COVID-19 or Eco-KG.