gleanerio / gleaner

Gleaner: harvesting JSON-LD and structured data on the web
https://gleaner.io
Apache License 2.0

read sources from a url #256

Open: valentinedwv opened this issue 3 months ago

valentinedwv commented 3 months ago

The scheduler should be able to read a list of sources from a URL (for example, a file in S3, so that updates can be monitored).

Right now, I think gleaner/nabu needs to be passed a configuration file that contains the services and sources information. This means the containers need to be loaded with an updated config file every time the sources change.

If gleaner could just read the sources from a URL, that would be one less piece to worry about.

valentinedwv commented 3 months ago

#229

valentinedwv commented 3 months ago

Fails when --cfg is a URL:

gleaner batch --cfg https://oss.geocodes-aws-dev.earthcube.org/test/scheduler/configs/test/gleanerconfig.yaml --source r2r
cannot find config file. Did you 'glcon generate --cfgName XXX' 
{"file":"/Users/valentin/development/dev_earthcube/gleanerio/gleaner/internal/config/gleanerConfig.go:54","func":"github.com/gleanerio/gleaner/internal/config.ReadGleanerConfig","level":"fatal","msg":"cannot find config file. Did you 'glcon generate --cfgName XXX' ","time":"2024-06-14T09:37:16-07:00"}

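In internal/config/source.go, ReadSourcesCSV already has some URL logic for getting an io.Reader. The same approach could provide the reader for viper's ReadConfig (https://pkg.go.dev/github.com/dvln/viper#ReadConfig), so that --cfg accepts a URL as well as a file.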