OpenGeoMetadata / GeoCombine

A Ruby toolkit for managing geospatial metadata
https://github.com/OpenGeoMetadata/GeoCombine

Bug - rake geocombine:clone errs out #144

Closed. ewlarson closed this issue 1 year ago.

ewlarson commented 1 year ago

I've been playing around with GeoCombine for Aardvark metadata harvesting.

Harvesting individual institutions has been working well:

bundle exec rake geocombine:clone\[edu.umn\]
=> 5480 docs
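
This thread uses the task in two ways: with a repository name and without one. The following is a minimal, hypothetical sketch of a Rake task that takes an optional argument and falls back to a full list. The task name example:clone, the ALL_REPOS list, and the clone_repo helper are made-up stand-ins, not GeoCombine's actual task. (The backslashes in the command above stop zsh from treating the square brackets as a glob pattern.)

```ruby
require "rake"

# Hypothetical stand-ins for a real harvester; GeoCombine's own
# implementation is not shown in this thread.
ALL_REPOS = %w[edu.umn edu.nyu edu.tufts].freeze
CLONED = []

# Hypothetical helper: records the name instead of running `git clone`.
def clone_repo(name)
  CLONED << name
end

namespace :example do
  desc "Clone one repository, or all of them when no name is given"
  task :clone, [:repo] do |_task, args|
    # args[:repo] is nil when the task is invoked without brackets
    repos = args[:repo] ? [args[:repo]] : ALL_REPOS
    repos.each { |name| clone_repo(name) }
    puts "Cloned #{repos.size} repositories"
  end
end
```

Invoked as rake example:clone\[edu.umn\], this sketch clones one repository; invoked as rake example:clone, it walks the whole list.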

But cloning all the repos fails:

First run

ewlarson@beanburrito GeoDiscovery % bundle exec rake geocombine:clone
rake aborted!
SystemStackError: stack level too deep
/Users/ewlarson/.rbenv/versions/3.2.1/bin/bundle:25:in `load'
/Users/ewlarson/.rbenv/versions/3.2.1/bin/bundle:25:in `<main>'
Tasks: TOP => geocombine:clone
(See full trace by running task with --trace)
ewlarson@beanburrito GeoDiscovery % cd tmp/opengeometadata 
ewlarson@beanburrito opengeometadata % ls -la
total 0
drwxr-xr-x  10 ewlarson  staff  320 Mar  9 08:19 .
drwxr-xr-x  15 ewlarson  staff  480 Mar  9 08:18 ..
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:19 edu.harvard
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:19 edu.nyu
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:19 edu.princeton.arks
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:18 edu.stanford.purl
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:19 edu.tufts
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:19 edu.umn
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:19 edu.virginia
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:18 shared-repository

Second run

ewlarson@beanburrito GeoDiscovery % bundle exec rake geocombine:clone
rake aborted!
SystemStackError: stack level too deep
/Users/ewlarson/.rbenv/versions/3.2.1/bin/bundle:25:in `load'
/Users/ewlarson/.rbenv/versions/3.2.1/bin/bundle:25:in `<main>'
Tasks: TOP => geocombine:clone
(See full trace by running task with --trace)
ewlarson@beanburrito GeoDiscovery % cd tmp/opengeometadata 
ewlarson@beanburrito opengeometadata % ls -la
total 0
drwxr-xr-x  10 ewlarson  staff  320 Mar  9 08:43 .
drwxr-xr-x  15 ewlarson  staff  480 Mar  9 08:41 ..
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:43 edu.harvard
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:43 edu.nyu
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:42 edu.princeton.arks
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:41 edu.stanford.purl
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:43 edu.tufts
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:43 edu.umn
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:43 edu.virginia
drwxr-xr-x   3 ewlarson  staff   96 Mar  9 08:41 shared-repository

Can anyone else confirm? It seems likely to hit the SystemStackError in the same place on each clone run...

thatbudakguy commented 1 year ago

Can you post a full trace? I wonder if this is being thrown from the Ruby git client or something. I think I might've seen it once or twice when testing, but I haven't managed to replicate it recently.

kaloyan13 commented 1 year ago

Saw the same issue when trying to execute bundle exec rake geocombine:clone --trace: trace.txt

Cloning only edu.nyu worked: bundle exec rake geocombine:clone[edu.nyu]

Working on Ubuntu 22.04 with these versions:

$ ruby -v
ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
$ gem -v
3.4.9
$ bundle -v
Bundler version 2.4.9

thatbudakguy commented 1 year ago

Thanks for the trace. I think this might've been related to arguments I was passing to Git.clone. Hopefully https://github.com/OpenGeoMetadata/GeoCombine/pull/148 will resolve that.

It could also be related to what happens when we try to clone a Git repository with no contents (there are a few of those in OGM); https://github.com/OpenGeoMetadata/GeoCombine/pull/139 handles that case better.
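
How PR #139 handles this isn't shown in the thread. As a minimal sketch, assuming an "empty" repository is one whose clone contains nothing but its .git directory (git itself warns "You appear to have cloned an empty repository" in that case), a harvester could detect it and skip the repository instead of retrying. The empty_clone? helper and the record.json file name are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper (not GeoCombine's API): a working tree counts as
# "empty" when the only entry in it is the .git directory.
def empty_clone?(path)
  (Dir.children(path) - [".git"]).empty?
end

# Demo against a temporary directory standing in for a fresh clone.
Dir.mktmpdir do |dir|
  FileUtils.mkdir(File.join(dir, ".git"))
  puts empty_clone?(dir) # only .git is present
  File.write(File.join(dir, "record.json"), "{}")
  puts empty_clone?(dir) # now the tree has content
end
```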

Unfortunately, PRs are failing because a new version of RuboCop is angry about one of our tests being skipped without a reason; https://github.com/OpenGeoMetadata/GeoCombine/pull/147 fixes that issue.

thatbudakguy commented 1 year ago

https://github.com/OpenGeoMetadata/GeoCombine/pull/148/commits/3f4389e4d13283ea1a0144e3b2b5458566c64161 removes the recursion, which I think might've been the cause of this.
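
The thread doesn't show the recursive code that was removed. As a hedged illustration only (not GeoCombine's code), here is one way unbounded recursion produces "stack level too deep" in Ruby: a retry that calls itself on every failure never stops if a clone keeps failing, and each attempt adds a stack frame. The method names and the cloner callable are hypothetical.

```ruby
# Hypothetical: retries by calling itself with no depth limit. If cloner
# keeps returning false, frames pile up until Ruby raises SystemStackError.
def clone_with_recursive_retry(url, cloner)
  return true if cloner.call(url)

  clone_with_recursive_retry(url, cloner)
end

# Hypothetical: the same retry as a bounded loop. Stack depth stays
# constant, and a persistent failure ends with false instead of a crash.
def clone_with_bounded_retry(url, cloner, max_attempts: 3)
  max_attempts.times do
    return true if cloner.call(url)
  end
  false
end

always_fails = ->(_url) { false }
begin
  clone_with_recursive_retry("edu.example", always_fails)
rescue SystemStackError => e
  puts "recursive retry: #{e.class}"
end
puts "bounded retry: #{clone_with_bounded_retry('edu.example', always_fails)}"
```

The loop version trades "retry forever" for a fixed attempt budget, which is usually what a batch job cloning many repositories wants: one bad repository is reported and skipped instead of aborting the whole run.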

ewlarson commented 1 year ago

This is fixed! I was able to clone all the repos...

ewlarson@beanburrito .internal_test_app % bundle exec rake geocombine:clone --trace
** Invoke geocombine:clone (first_time)
** Execute geocombine:clone
Cloned https://github.com/OpenGeoMetadata/shared-repository.git
Cloned https://github.com/OpenGeoMetadata/edu.stanford.purl.git
Cloned https://github.com/OpenGeoMetadata/edu.princeton.arks.git
Cloned https://github.com/OpenGeoMetadata/edu.virginia.git
Cloned https://github.com/OpenGeoMetadata/edu.nyu.git
Cloned https://github.com/OpenGeoMetadata/edu.harvard.git
Cloned https://github.com/OpenGeoMetadata/edu.umn.git
Cloned https://github.com/OpenGeoMetadata/edu.tufts.git
Cloned https://github.com/OpenGeoMetadata/edu.columbia.git
Cloned https://github.com/OpenGeoMetadata/edu.lclark.git
Cloned https://github.com/OpenGeoMetadata/gov.data.git
Cloned https://github.com/OpenGeoMetadata/geobtaa.git
Cloned https://github.com/OpenGeoMetadata/edu.uarizona.git
Cloned https://github.com/OpenGeoMetadata/edu.berkeley.git
Cloned https://github.com/OpenGeoMetadata/edu.cornell.git
Cloned https://github.com/OpenGeoMetadata/edu.vt.git
Cloned https://github.com/OpenGeoMetadata/edu.upenn.git
Cloned https://github.com/OpenGeoMetadata/edu.mit.git
Cloned https://github.com/OpenGeoMetadata/ca.frdr.geodisy.git
Cloned https://github.com/OpenGeoMetadata/edu.wisc.git
Cloned 20 repositories