apache / incubator-hugegraph-doc

HugeGraph Website and Doc
https://hugegraph.apache.org/
Apache License 2.0

doc: add hugegraph-ai doc #331

Closed simon824 closed 7 months ago

simon824 commented 7 months ago

Link: https://github.com/apache/incubator-hugegraph-ai/pull/30

simon824 commented 7 months ago

When executing RAG, downloading the NLTK stopwords seems to fail:

image

image

Did you try again several times? It might be a network issue.

liuxiaocs7 commented 7 months ago

Did you try again several times? It might be a network issue.

Yes, I have run it manually many times and the result is always the same. Log:

  File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords

  Searched in:
    - '/home/root1/nltk_data'
    - '/home/root1/software/miniconda3/envs/hg-ai/nltk_data'
    - '/home/root1/software/miniconda3/envs/hg-ai/share/nltk_data'
    - '/home/root1/software/miniconda3/envs/hg-ai/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/tmp/hugegraph_llm'
**********************************************************************

Do we need to add instructions for offline downloading in the document?
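
One way the offline route could look, as a minimal sketch: it assumes `stopwords.zip` was fetched on a machine with network access and copied over by hand. NLTK looks for `corpora/stopwords` under each directory on its search path, and the log above lists `/tmp/hugegraph_llm` among them, so unpacking the archive into `<dir>/corpora/` should be enough. The helper name `install_offline_corpus` and the paths are illustrative assumptions, not part of hugegraph-llm.

```python
import zipfile
from pathlib import Path


def install_offline_corpus(zip_path, nltk_data_dir, category="corpora"):
    """Unpack a manually downloaded NLTK archive into <nltk_data_dir>/<category>/.

    Hypothetical helper: nltk_data_dir must be one of the directories NLTK
    searches (the "Searched in" list of the LookupError message).
    """
    target = Path(nltk_data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Example call (paths are assumptions; adjust to your environment):
# install_offline_corpus("stopwords.zip", "/tmp/hugegraph_llm")
```

After unpacking, the directory should contain `corpora/stopwords/`, matching the layout NLTK reports in `Attempted to load corpora/stopwords`.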

simon824 commented 7 months ago

Which line of code threw this exception? I executed NLTKHelper().stopwords() and the stopwords.zip file was downloaded successfully. It seems that the LookupError should be caught in NLTKHelper:

            try:
                nltk.data.find("corpora/stopwords")
            except LookupError:
                nltk.download("stopwords", download_dir=nltk_data_dir)
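
A possible explanation for why the `LookupError` can still surface: as far as I know, `nltk.download` reports a failed download by returning `False` rather than raising, so a network failure inside the `except` branch passes silently and the error only shows up on the next lookup. The sketch below re-checks after the download attempt. `ensure_resource` is a hypothetical helper, and the lookup and download functions are passed in as arguments so that `nltk.data.find` and `nltk.download` can be supplied in real use.

```python
def ensure_resource(find, download, resource_path, package, download_dir):
    """Look up a resource; if it is missing, download it and verify the result.

    find and download are injected callables (for example nltk.data.find and
    nltk.download); this is an illustrative sketch, not hugegraph-llm code.
    """
    try:
        return find(resource_path)
    except LookupError:
        ok = download(package, download_dir=download_dir)
        if not ok:
            raise RuntimeError(
                f"Could not download {package!r}; place it under "
                f"{download_dir}/{resource_path} manually or configure a proxy."
            )
        # Raises LookupError again if download_dir is not on the search path.
        return find(resource_path)


# Example call (assumes nltk is installed):
# import nltk
# ensure_resource(nltk.data.find, nltk.download,
#                 "corpora/stopwords", "stopwords", "/tmp/hugegraph_llm")
```

Failing loudly at download time points the user at the network problem directly instead of at a later, less obvious lookup failure.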

vichayturen commented 7 months ago

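Solutions:

  1. Download the corpus offline and save it into the nltk_data directory.
  2. Set a proxy before downloading:
    import os
    port = 7890  # example value (assumption); use your local proxy's port
    os.environ["http_proxy"] = f"http://127.0.0.1:{port}"
    os.environ["https_proxy"] = f"http://127.0.0.1:{port}"
    import nltk
    nltk.download('stopwords')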

liuxiaocs7 commented 7 months ago

After clearing the data and re-initializing the graph, I got the result.

curl -v -X DELETE "192.168.110.215:8080/graphs/hugegraph/clear?confirm_message=I%27m+sure+to+delete+all+data"
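
For reference, the same request can be sketched in Python with only the standard library. The confirmation message is URL-encoded by `urllib.parse.urlencode`, which produces the `I%27m+sure+to+delete+all+data` form used in the curl command. The function names `build_clear_url` and `clear_graph` are hypothetical and unrelated to the `clear_graph_all_data()` added in the linked PR; the host and graph name are taken from the command above. Note that this request deletes all data in the graph.

```python
import urllib.parse
import urllib.request


def build_clear_url(host, graph="hugegraph", message="I'm sure to delete all data"):
    """Build the clear-graph URL with the confirmation message URL-encoded."""
    query = urllib.parse.urlencode({"confirm_message": message})
    return f"http://{host}/graphs/{graph}/clear?{query}"


def clear_graph(host, graph="hugegraph", timeout=30):
    """Send the DELETE request and return the HTTP status code.

    Destructive: removes all data in the target graph.
    """
    request = urllib.request.Request(build_clear_url(host, graph), method="DELETE")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (host is the one from the curl command above):
# clear_graph("192.168.110.215:8080")
```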

image


With the proxy set on the command line, the download works normally, but I ran into a new problem when executing RAG.

  1. First, initialize HugeGraph (HG)

image

  2. Run RAG

The stopwords download succeeded; the files are under /tmp/hugegraph_llm/corpora:

stopwords  stopwords.zip

image

INFO:     192.168.110.32:11406 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     192.168.110.32:11406 - "GET /queue/data?session_hash=804xxsr8w7f HTTP/1.1" 200 OK
KEYWORDS: ['Al Pacino', 'tell me about', 'Hollywood', 'film', 'Al', 'Pacino', 'tell', 'actor']
Traceback (most recent call last):
  File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/home/root1/lx/incubator-hugegraph-ai/./hugegraph-llm/src/hugegraph_llm/utils/gradio_demo.py", line 83, in graph_rag
    .run(verbose=True)
     ^^^^^^^^^^^^^^^^^
  File "/home/root1/lx/incubator-hugegraph-ai/hugegraph-llm/src/hugegraph_llm/operators/graph_rag_task.py", line 90, in run
    context = op.run(context)
              ^^^^^^^^^^^^^^^
  File "/home/root1/lx/incubator-hugegraph-ai/hugegraph-llm/src/hugegraph_llm/operators/hugegraph_op/graph_rag_query.py", line 130, in run
    raise RuntimeError("Unsupported ID format for Graph RAG.")
RuntimeError: Unsupported ID format for Graph RAG.

It seems the ID format is INT:STRING here.

Do I need to clear the data in the test graph?

liuxiaocs7 commented 7 months ago

Hi, @imbajin, any other suggestions?

simon824 commented 7 months ago

After clearing the data and re-initializing the graph, I got the result.

curl -v -X DELETE "192.168.110.215:8080/graphs/hugegraph/clear?confirm_message=I%27m+sure+to+delete+all+data"

Added clear_graph_all_data() in https://github.com/apache/incubator-hugegraph-ai/pull/30/commits/c5a5a47bbbf8e10fcc04b4404d05fbfbd12e5227