Closed caufieldjh closed 2 years ago
PR #17 added a setting for the maximum number of graphs to parse/validate in one run, and that appears to work as expected locally. Jenkins builds still seem to hang, though:
10:39:23 + python make_kg_manifest.py --bucket kg-hub-public-data --outpath MANIFEST.yaml --maximum 10
10:39:25 Retrieving OBO metadata from https://raw.githubusercontent.com/OBOFoundry/OBOFoundry.github.io/master/registry/ontologies.yml...
10:39:28 Found credentials in environment variables.
10:39:28 Searching kg-hub-public-data...
10:40:25 Bucket kg-hub-public-data contains 137386 objects.
10:40:25 Found 528 new compressed graph files.
10:40:25 Found 2046 new uncompressed graph files.
10:40:25 Will consider only 10 files in total.
10:40:25 Will process 10 new compressed graph files.
10:40:25 Will process 0 new uncompressed graph files.
10:40:25 No updates for kg-idg.
10:40:25 Validating new builds for kg-covid-19...
10:40:25 Retrieving kg-covid-19/20200925/kg-covid-19.tar.gz...
10:40:46 Validating graph files with KGX...
10:40:47 biocontext map idot_context has illegal prefix: 2D-PAGE.PROTEIN
10:40:47 biocontext map idot_context has illegal prefix: 3DMET
10:40:47 biocontext map idot_context has illegal prefix: MMMP:BIOMAPS
10:40:49 class "organism taxon" slot "has taxonomic rank" does not reference an existing slot. New slot was created.
10:40:53 biocontext map idot_context has illegal prefix: 2D-PAGE.PROTEIN
10:40:53 biocontext map idot_context has illegal prefix: 3DMET
10:40:53 biocontext map idot_context has illegal prefix: MMMP:BIOMAPS
10:40:53 Loading schema https://w3id.org/linkml/types from https://raw.githubusercontent.com/biolink/biolink-model/2.2.13/biolink-model.yaml
...
[an indeterminate but excessive amount of time passes, during which nothing happens]
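For reference, the counts in the log (10 compressed files kept, 0 uncompressed) are consistent with a cap that fills from the compressed list first. A minimal sketch of that behavior follows; `apply_cap` and its arguments are hypothetical names for illustration, not the actual code in make_kg_manifest.py:

```python
def apply_cap(compressed, uncompressed, maximum):
    """Keep at most `maximum` files in total, taking compressed files first.

    Hypothetical helper: the real script may order or select files differently.
    """
    if maximum < 0:
        raise ValueError("maximum must be non-negative")
    keep_compressed = compressed[:maximum]
    # Whatever is left of the budget goes to uncompressed files.
    remaining = maximum - len(keep_compressed)
    keep_uncompressed = uncompressed[:remaining]
    return keep_compressed, keep_uncompressed


if __name__ == "__main__":
    # Same counts as in the log: 528 compressed, 2046 uncompressed, cap of 10.
    c, u = apply_cap(list(range(528)), list(range(2046)), 10)
    print(f"Will process {len(c)} new compressed graph files.")
    print(f"Will process {len(u)} new uncompressed graph files.")
```

With a cap this small the uncompressed files are never reached, so the hang here involves only the first compressed archive.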
Maybe a biolink-model update would help?
Reproduced the more recent issue locally - will make new issue because it is new behavior and I think it's unrelated to the other issue.
Would really like MANIFEST builds to complete, so:
kgx validate on all new builds
Build is still hanging with the new settings (this time, Jenkins stopped it due to a lack of activity), so try with a single file at a time. It's not much, but I want to see if the process will complete.
One file works! It still took 5 hours, but it finished, so that's nice.
@caufieldjh is this right - one graph takes 5 hours to process?
Yes - most of that time is dedicated to kgx validate, and in turn that's much slower than it could be when there are numerous validation errors and a lot of log output (#13). I haven't found a solution for the logs yet, as the call to validate seems to ignore having the log output set to /dev/null and any attempt to redirect STDOUT/STDERR.
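One thing that might be worth trying is running the validation as a separate process, so its output is discarded at the OS level and a hung run can be cut off by a timeout. This is only a sketch: `run_quietly` is a hypothetical helper, the command to run is left to the caller, and I haven't checked that this avoids the slowdown with kgx specifically.

```python
import subprocess
import sys


def run_quietly(cmd, timeout_seconds):
    """Run `cmd` with stdout/stderr discarded.

    Returns the exit code, or None if the process exceeded the timeout
    (subprocess.run kills the child before raising TimeoutExpired).
    """
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # drop normal output
            stderr=subprocess.DEVNULL,  # drop warnings and log messages
            timeout=timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in command for demonstration; substitute the real validation command.
    noisy = [sys.executable, "-c", "print('x' * 100000)"]
    print(run_quietly(noisy, timeout_seconds=30))
```

A `None` result could then be recorded in the manifest as "validation timed out" instead of leaving the whole Jenkins build stuck.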
Seems to be resolved now, as long as it doesn't have to do a full kgx validate on KG-COVID-19 or Eco-KG.
The most recent build on Jenkins seemed to be making progress, albeit slowly, then hung at this point after ~48 hrs:
I aborted the build after > 5 days in this state.
Manifest generation is expected to take a long time initially, as it's going to index and validate every graph on KG-Hub, but this time it appeared to get stuck. Options: