Need to better trap gleaner errors. If gleaner does not run, stop the run.
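A minimal sketch of what "trap and stop" could look like, assuming the gleaner container's exit code and log text are already available to the step that launched it. The names `GleanerRunError` and `check_gleaner_result` are hypothetical and not part of the existing code. Raising an exception inside a Dagster op fails that step, so downstream nabu steps would not run.

```python
class GleanerRunError(RuntimeError):
    """Hypothetical error: the gleaner container did not run or exited abnormally."""


def check_gleaner_result(exit_code, log_text=""):
    """Raise if gleaner never ran or failed; return True otherwise.

    exit_code: integer exit status of the container, or None if the
               container never started or never reported a status.
    log_text:  captured container log, used only to enrich the error message.
    """
    if exit_code is None:
        raise GleanerRunError("gleaner did not run: no exit code reported")
    if exit_code != 0:
        # Include the last few log lines so the failure is easier to diagnose.
        tail = "\n".join(log_text.splitlines()[-5:])
        raise GleanerRunError(f"gleaner exited with code {exit_code}\n{tail}")
    return True
```

Calling this right after the container finishes, and before any nabu step is started, would stop the run at the first point where gleaner is known to have failed.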
Ideas:
(done) use gRPC code server (dagster api...) to hold code so that the dagster daemon gets restarted less often
(no longer needed: logs uploaded every 600 seconds) use sensors to monitor long-running jobs with names. Output asset with name and container id
tag outputs as assets, and have separate end load reports that run late at night if an asset was updated.
(done) have graph reports run over release file, and not graph.
separate summon from nabu dataloading
run sources once
create sensors that are community-based that will add datasets to a community graph using release files
Need to set up a sensor so that when gleaner is run, it stores the container id; a nabu sensor then waits until that run completes and runs the nabu steps. If the nabu steps complete, the graph gets loaded, etc.
Right now, if the scheduler is updated while a gleaner/nabu container is running, the scheduler will never know that it was supposed to be watching that container.
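One possible way to survive a scheduler update is to write each watched container id to storage that outlives the scheduler process, so a restarted sensor can pick up where it left off. The sketch below uses a plain JSON file; the file layout and the function names (`record_container`, `mark_done`, `pending_containers`) are assumptions for illustration only. A Dagster sensor cursor or a database table could play the same role.

```python
import json
from pathlib import Path


def _load(state_file):
    """Read the watch list; an absent file means nothing is being watched."""
    path = Path(state_file)
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def _save(state_file, state):
    Path(state_file).write_text(json.dumps(state, indent=2))


def record_container(state_file, source, container_id, stage="gleaner"):
    """Called when gleaner (or nabu) is started: remember which container to watch."""
    state = _load(state_file)
    state[container_id] = {"source": source, "stage": stage, "done": False}
    _save(state_file, state)


def mark_done(state_file, container_id):
    """Called once the watched container has completed and been handled."""
    state = _load(state_file)
    if container_id in state:
        state[container_id]["done"] = True
        _save(state_file, state)


def pending_containers(state_file):
    """Containers still to be watched; survives a scheduler restart."""
    return {cid: info for cid, info in _load(state_file).items() if not info["done"]}
```

On each tick, a nabu sensor could call `pending_containers`, check each container's status, start the nabu steps for those that have finished, and then call `mark_done`. How the container status is queried is left out here because it depends on the container runtime in use.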