earthcube / scheduler

Scheduling approaches related to gleaner tooling
Apache License 2.0

Assay Reports like p418 #73

Open valentinedwv opened 10 months ago

valentinedwv commented 10 months ago

I think running some of these, perhaps as a monthly task, would be good: https://github.com/earthcubearchitecture-project418/assay-data/blob/master/README.md

Put an issue in for utilities.

fils commented 10 months ago

@valentinedwv agreed!! I can share the following related work from OIH.

We tried to bring this together in a dashboard at: http://dashboard.oceaninfohub.org/ Jeff in the OIH project put this together, so I don't know much about it.

However, the performance is not great (pathetic, to be honest). So now I preprocess all the SPARQL query results ahead of time into a Parquet file for him, and he is transitioning to using DuckDB to access that file, which is almost instantaneous.

I use https://github.com/iodepo/odis-arch/blob/master/graphOps/releaseGraphs/OIH_GraphPreProc.ipynb

to process the release graphs for OIH into Parquet and RDF (NQ) files.

Info about that is at https://github.com/iodepo/odis-arch/tree/master/graphOps/releaseGraphs

I use https://github.com/iodepo/odis-arch/blob/master/graphOps/releaseGraphs/extraction_OIHDashboard.ipynb to test queries against those files with DuckDB.

This is all still a major work in progress, so it's all test code. I also want to get a script to publish these files to Zenodo and get a DOI for the release graphs at major milestones.
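A hedged sketch of what that Zenodo step could look like (this is not the project's actual script; the title, creator, and description are placeholders, and the token/network call is only defined, not run). Zenodo's deposit REST API takes a JSON `metadata` block when creating a deposition:

```python
# Hypothetical sketch of a Zenodo publishing step. Builds deposition
# metadata for a release-graph file and shows where the REST call goes.
# Title, description, and creator below are placeholders.
import json
import urllib.request

ZENODO_API = "https://zenodo.org/api/deposit/depositions"

def release_metadata(title: str, description: str) -> dict:
    """Deposition metadata in the shape Zenodo's deposit API expects."""
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": "OIH Project"}],  # placeholder creator
        }
    }

def create_deposition(token: str, meta: dict) -> dict:
    """POST a new deposition; requires a Zenodo personal access token."""
    req = urllib.request.Request(
        f"{ZENODO_API}?access_token={token}",
        data=json.dumps(meta).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # network call, not run here
        return json.load(resp)

meta = release_metadata("Release graph export", "Release graph at milestone.")
print(meta["metadata"]["upload_type"])  # dataset
```

After the deposition exists, the release-graph files would be uploaded to it and the deposition published to mint the DOI; those steps follow the same REST pattern.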

Would love to discuss and plan a set of products for DeCODER similar in approach to what I have started there for OIH. I've not resolved a clear path for them yet either.

valentinedwv commented 10 months ago

digging... Dashboard here.

Looks like it's dynamic and runs the queries on demand. The present plan is to just run reports weekly, or when a data load is complete, so this backs that idea up. We would just rework the UI to use non-dynamic sources.

Writing results to parquet, ok...

Looking to get the queries into the utilities, so that they are in one place, not two or three.

One thing that needs to be solved: where do we put all the queries? Thinking we pull them from some GitHub repo raw URL so that all projects can share them. Keep moving towards a shared architecture...

A repo would allow organizing them into directories by task and project, and would allow for experimental and borked directories.
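The shared-repo idea above could be as simple as resolving a query file path to its raw.githubusercontent.com URL and fetching the text; the repo name, branch, and query path below are illustrative, not real files.

```python
# Sketch of the shared-queries idea: resolve a query file in a shared
# GitHub repo to its raw URL so any project can fetch the same SPARQL
# text. Repo, branch, and path here are made-up examples.
import urllib.request

RAW_BASE = "https://raw.githubusercontent.com"

def raw_query_url(repo: str, branch: str, path: str) -> str:
    """Build the raw.githubusercontent.com URL for a file in a repo."""
    return f"{RAW_BASE}/{repo}/{branch}/{path}"

def fetch_query(repo: str, branch: str, path: str) -> str:
    """Download the query text (network call; not run here)."""
    with urllib.request.urlopen(raw_query_url(repo, branch, path)) as resp:
        return resp.read().decode()

# Example resolution for a hypothetical shared query file.
url = raw_query_url("earthcube/queries", "main", "reports/counts.rq")
print(url)
```

Directory structure inside the repo (by task, by project, plus experimental/borked areas) then maps directly onto the `path` argument, so every project resolves queries the same way.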

valentinedwv commented 10 months ago

https://quarto.org

https://shiny.posit.co/py/api/