Closed: ylyangtw closed this 1 year ago.
(Ignore; this gets rid of the entire namespace.)
In the manage blazegraph code there are delete and create namespace functions; they were set up for integration testing, but I think they should work here.
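For reference, a minimal sketch of what delete and create namespace amount to against Blazegraph's REST API. The base URL is an assumption; the real helpers live in the manage blazegraph / ec_utils code, not here.

```python
import requests

# Assumed Blazegraph base URL; adjust to the actual deployment
# (commonly http://host:9999/blazegraph or http://host:8080/bigdata).
BLAZEGRAPH = "http://localhost:9999/blazegraph"

def create_namespace(name: str) -> None:
    """Create a namespace via the NanoSparqlServer REST API."""
    props = f"com.bigdata.rdf.sail.namespace={name}\n"
    resp = requests.post(
        f"{BLAZEGRAPH}/namespace",
        data=props,
        headers={"Content-Type": "text/plain"},
    )
    resp.raise_for_status()

def delete_namespace(name: str) -> None:
    """Drop the whole namespace and everything in it (what the note above warns about)."""
    resp = requests.delete(f"{BLAZEGRAPH}/namespace/{name}")
    resp.raise_for_status()
```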
If there is an issue with files loading to the graph, then we should probably look at saving the file to S3 first... then loading it to the graph.
That would let us test the loading issues better, i.e., how can we test this and catch this, and where should it be done (dagster or ec_utils)?
Then maybe the saved file becomes an actual asset at some point.
summarize would fail if the source exists in the summary namespace. For testing, I clear the summary namespace first using this SPARQL update: DELETE { ?s ?p ?o } WHERE { ?s ?p ?o . FILTER regex(str(?s), "iris") }
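For anyone repeating this, here is a small sketch of running that clearing update over the SPARQL 1.1 protocol; the endpoint URL and namespace name are assumptions. Note that SPARQL 1.1 Update wants the delete template in braces, i.e. DELETE { ?s ?p ?o } WHERE { ... }.

```python
import requests

# Assumed SPARQL endpoint of the summary namespace; adjust to the deployment.
SUMMARY_ENDPOINT = "http://localhost:9999/blazegraph/namespace/summary/sparql"

# Remove every triple whose subject IRI contains "iris".
CLEAR_IRIS = """
DELETE { ?s ?p ?o }
WHERE {
  ?s ?p ?o .
  FILTER regex(str(?s), "iris")
}
"""

# SPARQL 1.1 Update over HTTP: the update text goes in the 'update' form field.
resp = requests.post(SUMMARY_ENDPOINT, data={"update": CLEAR_IRIS})
resp.raise_for_status()
```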
This issue was about how to push a file in S3 to a graph.
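A rough sketch of that flow, pulling the object from S3/Minio and POSTing it to the namespace's SPARQL endpoint. Bucket, key, endpoint, and content type here are all assumptions, not the actual scheduler code.

```python
import requests
from minio import Minio

# Assumed client and endpoint; the real values come from the scheduler config.
s3 = Minio("minio.example.org", access_key="...", secret_key="...", secure=True)
GRAPH_ENDPOINT = "http://localhost:9999/blazegraph/namespace/summary/sparql"

def load_s3_file_to_graph(bucket: str, key: str) -> None:
    """Read an RDF file out of S3/Minio and POST it into the triplestore."""
    obj = s3.get_object(bucket, key)
    try:
        data = obj.read()
    finally:
        obj.close()
        obj.release_conn()
    # Blazegraph accepts RDF posted straight to the namespace's sparql endpoint;
    # the Content-Type must match the file's serialization (turtle, n-quads, ...).
    resp = requests.post(
        GRAPH_ENDPOINT,
        data=data,
        headers={"Content-Type": "text/turtle"},
    )
    resp.raise_for_status()
```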
Agree!
If loading to the graph fails, it will upload the result to S3.
Pull and check this; I reworked the code.
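Roughly the pattern described above, as a sketch only; the helper and bucket names are made up, the real code is in the rework.

```python
import io
import requests
from minio import Minio

# Assumed client/endpoint; the real ones come from the scheduler config.
s3 = Minio("minio.example.org", access_key="...", secret_key="...", secure=True)
GRAPH_ENDPOINT = "http://localhost:9999/blazegraph/namespace/summary/sparql"

def post_to_graph(data: bytes) -> None:
    """POST the RDF payload to the triplestore; raises on HTTP errors."""
    resp = requests.post(GRAPH_ENDPOINT, data=data, headers={"Content-Type": "text/turtle"})
    resp.raise_for_status()

def load_or_stash(data: bytes, key: str) -> None:
    """Try to load into the graph; on failure, upload the file to S3 so it can be inspected and replayed."""
    try:
        post_to_graph(data)
    except Exception:
        s3.put_object("graph-load-failures", key, io.BytesIO(data), length=len(data))
        raise
```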
Thanks @valentinedwv!
Under deployment it also works on my local, but I get this traceback from iris_missingreport_graph:

  File "/usr/src/app/project/eco/ops/implnet_ops_iris.py", line 676, in iris_missingreport_graph
    returned_value = missingReport(source_url, bucket, source_name, s3Minio, graphendpoint, milled=milled, summon=summon)
  File "/usr/local/lib/python3.11/site-packages/ec/reporting/report.py", line 99, in missingReport
    graph_urns = ec.graph.sparql_query.queryWithSparql("repo_select_graphs", graphendpoint, {"repo": repo})
  File "/usr/local/lib/python3.11/site-packages/ec/graph/sparql_query.py", line 39, in queryWithSparql
    q_df = sparqldataframe.query(endpoint, thsGraphQuery)
  File "/usr/local/lib/python3.11/site-packages/sparqldataframe/__init__.py", line 24, in query
    raise Exception('Invalid query')

Do you maybe know what this could be?
Might need to update a dependency to: https://github.com/earthcube/scheduler/blob/51652704c1a7022c1b61c6a09fe8240c15380b65/dagster/implnets/requirements_code.txt#L18
earthcube-utilities @ git+https://github.com/earthcube/earthcube_utilities@b671efb#subdirectory=earthcube_utilities
Should have changed it in the standard requirements.txt, too.
Try again. Got the triplestore swapped, and fixed an upload issue.
Nice! They all work.