Closed simon824 closed 8 months ago
When executing RAG, the NLTK stopwords download seems to fail:
Did you try again several times? It might be a network issue.
Yes, I have run it manually many times and the result is the same. Log:
File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('stopwords')
For more information see: https://www.nltk.org/data.html
Attempted to load corpora/stopwords
Searched in:
- '/home/root1/nltk_data'
- '/home/root1/software/miniconda3/envs/hg-ai/nltk_data'
- '/home/root1/software/miniconda3/envs/hg-ai/share/nltk_data'
- '/home/root1/software/miniconda3/envs/hg-ai/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/tmp/hugegraph_llm'
**********************************************************************
Do we need to add instructions for offline downloading to the documentation?
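Offline instructions could look like the following sketch: fetch stopwords.zip once (e.g. on a machine with network access or through a proxy) and unpack it into one of the directories in the search list above. The download URL and the target directory are assumptions to verify, not confirmed project settings.

```python
# Hedged sketch: manual/offline install of the stopwords corpus.
# The URL points at the public nltk_data repository and the target dir is
# the last entry of the search list in the error above -- both assumptions.
import os
import urllib.request
import zipfile

nltk_data_dir = "/tmp/hugegraph_llm"
corpora_dir = os.path.join(nltk_data_dir, "corpora")
os.makedirs(corpora_dir, exist_ok=True)

url = ("https://raw.githubusercontent.com/nltk/nltk_data/"
       "gh-pages/packages/corpora/stopwords.zip")
zip_path = os.path.join(corpora_dir, "stopwords.zip")
urllib.request.urlretrieve(url, zip_path)   # download once, e.g. via a proxy
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(corpora_dir)              # creates corpora/stopwords/
```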
Which line of code threw this exception? I executed NLTKHelper().stopwords() and the stopwords.zip file was downloaded successfully. It seems that the LookupError should be caught in NLTKHelper:
try:
    nltk.data.find("corpora/stopwords")
except LookupError:
    nltk.download("stopwords", download_dir=nltk_data_dir)
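One thing worth checking (an assumption, not confirmed from the project code): nltk.download(..., download_dir=...) saves the data, but nltk.data.find() only searches the directories listed in nltk.data.path, so a custom download_dir must also be registered there or the LookupError persists even after a successful download:

```python
# Sketch: register the custom download dir on NLTK's search path before
# calling find()/download(); the dir name matches the last search path in
# the log above and is an assumption here.
import nltk

nltk_data_dir = "/tmp/hugegraph_llm"
if nltk_data_dir not in nltk.data.path:
    nltk.data.path.append(nltk_data_dir)

try:
    nltk.data.find("corpora/stopwords")
except LookupError:
    nltk.download("stopwords", download_dir=nltk_data_dir)
```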
Solution: set an HTTP proxy and download the stopwords into the nltk_data dir:
import os
os.environ["http_proxy"] = f"http://127.0.0.1:{port}"
os.environ["https_proxy"] = f"http://127.0.0.1:{port}"
import nltk
nltk.download('stopwords')
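NLTK also ships a built-in proxy helper, so the environment variables above can alternatively be set through its own API; the proxy address and port here are placeholders for your local proxy:

```python
# Alternative sketch using nltk.set_proxy instead of http_proxy/https_proxy
# env vars; the proxy URL below is a placeholder, not a project setting.
import nltk

nltk.set_proxy("http://127.0.0.1:7890")
nltk.download("stopwords")
```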
After cleaning the data and re-initializing the graph, I got the result.
curl -v -X DELETE 192.168.110.215:8080/graphs/hugegraph/clear?confirm_message=I%27m+sure+to+delete+all+data
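For reference, the same clear call can be issued from Python with only the standard library; the host and graph name are taken from the curl command in this thread, so adjust them for your deployment:

```python
# Sketch: DELETE .../graphs/hugegraph/clear with the required confirm
# message, equivalent to the curl command above.
import urllib.parse
import urllib.request

host = "http://192.168.110.215:8080"
confirm = urllib.parse.quote("I'm sure to delete all data")
url = f"{host}/graphs/hugegraph/clear?confirm_message={confirm}"
req = urllib.request.Request(url, method="DELETE")
# urllib.request.urlopen(req)  # uncomment to actually send the request
```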
Using the proxy on the command line, the download works normally, but I ran into a new problem when executing RAG.
The stopwords downloaded successfully under /tmp/hugegraph_llm/corpora:
stopwords  stopwords.zip
INFO: 192.168.110.32:11406 - "POST /queue/join HTTP/1.1" 200 OK
INFO: 192.168.110.32:11406 - "GET /queue/data?session_hash=804xxsr8w7f HTTP/1.1" 200 OK
KEYWORDS: ['Al Pacino', 'tell me about', 'Hollywood', 'film', 'Al', 'Pacino', 'tell', 'actor']
Traceback (most recent call last):
File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/blocks.py", line 1627, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/root1/software/miniconda3/envs/hg-ai/lib/python3.11/site-packages/gradio/utils.py", line 690, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/root1/lx/incubator-hugegraph-ai/./hugegraph-llm/src/hugegraph_llm/utils/gradio_demo.py", line 83, in graph_rag
.run(verbose=True)
^^^^^^^^^^^^^^^^^
File "/home/root1/lx/incubator-hugegraph-ai/hugegraph-llm/src/hugegraph_llm/operators/graph_rag_task.py", line 90, in run
context = op.run(context)
^^^^^^^^^^^^^^^
File "/home/root1/lx/incubator-hugegraph-ai/hugegraph-llm/src/hugegraph_llm/operators/hugegraph_op/graph_rag_query.py", line 130, in run
raise RuntimeError("Unsupported ID format for Graph RAG.")
RuntimeError: Unsupported ID format for Graph RAG.
It seems the ID format here is INT:STRING. Do we need to clean the data in the test graph?
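As a purely hypothetical illustration of why an INT:STRING ID could trip this check (the real logic lives in graph_rag_query.py and may differ), a prefix-based parser that rejects unexpected IDs might look like:

```python
# Hypothetical sketch only -- the actual check in graph_rag_query.py may
# differ. Assume vertex IDs look like "<int-prefix>:<name>"; anything
# else is rejected with the same error the traceback shows.
def parse_vertex_id(vid: str):
    prefix, sep, rest = vid.partition(":")
    if not sep or not prefix.isdigit():
        raise RuntimeError("Unsupported ID format for Graph RAG.")
    return int(prefix), rest

print(parse_vertex_id("1:alice"))  # (1, 'alice')
```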
Hi, @imbajin, any other suggestions?
Added clear_graph_all_data() in https://github.com/apache/incubator-hugegraph-ai/pull/30/commits/c5a5a47bbbf8e10fcc04b4404d05fbfbd12e5227
Link https://github.com/apache/incubator-hugegraph-ai/pull/30