[Enhance] Add JobAPI doc for OLAP algorithms in HugeServer

JackyYangPassion commented 1 year ago

Problem Type (问题类型)

None

Your Question (问题描述)

To let users quickly apply the integrated OLAP algorithms, I suggest documenting the job API in the RESTful API section of the docs, together with usage instructions.

AlgorithmAPI

Environment (环境信息)

- Server Version: 1.0.0 (Apache Release Version)
- Backend: MySQL 8.0.32
- hugegraph-hubble
- hugegraph-loader

For example, to run an LPA algorithm job:

Load the demo data into the hugegraph graph with hugegraph-loader:

 bin/hugegraph-loader.sh -g hugegraph -f example/file/struct.json -s example/file/schema.groovy

Send a POST request to http://localhost:8080/graphs/hugegraph/jobs/algorithm/lpa with Postman.
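
The POST step above can also be scripted instead of being sent from Postman. The following is a minimal Python sketch using only the standard library. The URL is the one from this issue; the request-body field (`times`) is a hypothetical placeholder and should be checked against the parameters the server's AlgorithmAPI actually accepts.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # server address used in this issue


def algorithm_job_url(graph: str, algorithm: str, base_url: str = BASE_URL) -> str:
    """Build the job URL in the form used above: /graphs/{graph}/jobs/algorithm/{name}."""
    return f"{base_url}/graphs/{graph}/jobs/algorithm/{algorithm}"


def build_job_request(graph: str, algorithm: str, params: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for an algorithm job."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        algorithm_job_url(graph, algorithm),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_job(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumption: "times" is a hypothetical parameter name, not confirmed by this issue.
lpa_request = build_job_request("hugegraph", "lpa", {"times": 20})
print(lpa_request.get_method(), lpa_request.full_url)
# To run it against a live server: submit_job(lpa_request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the server.
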
View the result of the algorithm execution in hugegraph-hubble. Task status:

LPA algo summary

Graph node label detail:

JackyYangPassion commented 1 year ago

TODO: add AlgorithmAPI to DOC

github-actions[bot] commented 1 year ago

Due to the lack of activity, the current issue is marked as stale and will be closed after 20 days, any update will remove the stale label

moonbaseDelta commented 4 months ago

Hi there,

I got through this to run some computations, but I still can't figure out how the results are managed: for a few algos they are stored in a vertex property, for others it's just calculations.

But take degree_centrality, for instance: the results are not saved into the graph, but are still available through the API. That's useful when we work with small subsets of around 10-1000 vertices/edges, but on a fairly large setup, like tens of billions, I definitely want to tune that storing procedure.

imbajin commented 4 months ago

@moonbaseDelta Hi, we support writing results back to the graph database or to a local file/HDFS. You're right, writing back is generally a common requirement (you could also talk to 🤖 dosu in a comment, like ↓)

Also, @dosu could you help add more context & information for users of graph-computer?

dosubot[bot] commented 4 months ago

To manage and store the results of graph computations in HugeGraph, especially for large datasets, you can configure the system to write back results to the graph database or a file using the following options:

Writing Results Back to HugeGraph

You can configure HugeGraph to write computation results back to the graph database. Here are the relevant configuration options:

Configuration Options:
- hugegraph.name: Set the graph name to load data and write results back.
- hugegraph.url: Set the HugeGraph URL to load data and write results back.
- output.result_write_type: Set the result write-type to output to HugeGraph. Allowed values are: OLAP_COMMON, OLAP_SECONDARY, OLAP_RANGE.

Example Configuration:

hugegraph.name: hugegraph
hugegraph.url: http://127.0.0.1:8080
output.result_write_type: OLAP_COMMON
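
As a small illustration of the options above, here is a hedged Python sketch that assembles the three write-back settings and rejects a write type outside the allowed values listed in this comment. The helper name `write_back_options` is hypothetical and not part of HugeGraph; how the resulting options are passed to the computer job is not covered here.

```python
# Allowed values as listed in this comment.
ALLOWED_WRITE_TYPES = ("OLAP_COMMON", "OLAP_SECONDARY", "OLAP_RANGE")


def write_back_options(graph_name: str, url: str, write_type: str = "OLAP_COMMON") -> dict:
    """Collect the three write-back options, rejecting an unknown write type."""
    if write_type not in ALLOWED_WRITE_TYPES:
        raise ValueError(f"unsupported output.result_write_type: {write_type!r}")
    return {
        "hugegraph.name": graph_name,
        "hugegraph.url": url,
        "output.result_write_type": write_type,
    }


options = write_back_options("hugegraph", "http://127.0.0.1:8080")
for key, value in options.items():
    print(f"{key}: {value}")
```

Checking the write type up front catches a misspelled value before a long-running job is submitted.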

Writing Results to a File (HDFS)

Alternatively, you can write the results to a file, such as HDFS. Here are the relevant configuration options:

Configuration Options:
- output.hdfs_path_prefix: Set the directory of HDFS output result.
- output.hdfs_url: Set the HDFS URL of output.
- output.hdfs_user: Set the HDFS user of output.
- output.hdfs_delimiter: Set the delimiter of HDFS output.
- output.hdfs_merge_partitions: Whether to merge output files of multiple partitions.
- output.hdfs_replication: Set the replication number of HDFS.

Example Configuration:

output.hdfs_path_prefix: /hugegraph-computer/results
output.hdfs_url: hdfs://127.0.0.1:9000
output.hdfs_user: hadoop
output.hdfs_delimiter: ,
output.hdfs_merge_partitions: true
output.hdfs_replication: 3
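
The HDFS settings can be generated the same way. The sketch below is only an illustration: the helper names are hypothetical, the URL-scheme check is an added sanity check rather than something HugeGraph is known to require, and the `key: value` layout simply mirrors the example above.

```python
from urllib.parse import urlparse


def hdfs_output_options(path_prefix: str, hdfs_url: str, user: str,
                        delimiter: str = ",", merge_partitions: bool = True,
                        replication: int = 3) -> dict:
    """Collect the HDFS output options listed above into one mapping."""
    # Assumption: the output URL should use the hdfs:// scheme, as in the example.
    if urlparse(hdfs_url).scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got {hdfs_url!r}")
    return {
        "output.hdfs_path_prefix": path_prefix,
        "output.hdfs_url": hdfs_url,
        "output.hdfs_user": user,
        "output.hdfs_delimiter": delimiter,
        "output.hdfs_merge_partitions": merge_partitions,
        "output.hdfs_replication": replication,
    }


def render_options(options: dict) -> str:
    """Render options as 'key: value' lines, matching the layout used above."""
    lines = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = str(value).lower()  # True -> "true", as written in the example
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


hdfs_options = hdfs_output_options(
    "/hugegraph-computer/results", "hdfs://127.0.0.1:9000", "hadoop"
)
print(render_options(hdfs_options))
```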

Best Practices

To ensure efficient handling of large datasets, consider the following best practices:

Batch Size and Threads:
- output.batch_size: Set the batch size of output.
- output.batch_threads: Set the number of threads used to batch output.

Retry Mechanism:
- output.retry_interval: Set the retry interval when output fails.
- output.retry_times: Set the retry times when output fails.

Thread Pool Management:
- output.single_threads: Set the number of threads used for single output.
- output.thread_pool_shutdown_timeout: Set the timeout seconds of output threads pool shutdown.

Including Properties:
- output.with_adjacent_edges: Whether to output the adjacent edges of the vertex.
- output.with_edge_properties: Whether to output the properties of the edge.
- output.with_vertex_properties: Whether to output the properties of the vertex.
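
To get a feel for how output.batch_size and output.batch_threads interact at the scale mentioned earlier in this thread, a rough back-of-the-envelope sketch follows. The batch size of 500 and the 4 threads are hypothetical values chosen for illustration, not defaults, and the even split of batches across threads is a simplifying assumption about how the output is scheduled.

```python
def estimate_batches(result_count: int, batch_size: int) -> int:
    """Number of output batches needed, rounding up (integer arithmetic only)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return (result_count + batch_size - 1) // batch_size


def batches_per_thread(result_count: int, batch_size: int, threads: int) -> int:
    """Batches each thread handles, assuming an even split across threads."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    total = estimate_batches(result_count, batch_size)
    return (total + threads - 1) // threads


# Hypothetical values: 10 billion results, batch size 500, 4 batch threads.
total_batches = estimate_batches(10_000_000_000, 500)
per_thread = batches_per_thread(10_000_000_000, 500, 4)
print(total_batches, per_thread)
```

Numbers like these only bound the amount of work; the actual throughput depends on the backend and network, so the settings still need to be tuned by measurement.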

By configuring these options, you can effectively manage and store the results of graph computations in HugeGraph, ensuring efficient handling of large datasets. Additionally, you can refer to the detailed configuration options provided in the HugeGraph documentation for more advanced settings and optimizations ^[1]^[2]^[3].

