Snowflake-Labs / sfquickstarts

Follow along with our tutorials to get you up and running with the Snowflake Data Cloud.
Apache License 2.0

Build a Retrieval Augmented Generation(RAG) based LLM assistant using Streamlit, OpenAI and LlamaIndex #993

Open sfc-gh-dong opened 6 months ago

sfc-gh-dong commented 6 months ago

Describe the bug: LlamaIndex is unable to read files in the hidden directory .content, so running python build_index.py fails with this error:

    Building vector index...
    Traceback (most recent call last):
      File "/Users/dong/sfguide-blog-ai-assistant/build_index.py", line 43, in <module>
        main()
      File "/Users/dong/sfguide-blog-ai-assistant/build_index.py", line 39, in main
        build_index(data_dir, knowledge_base_dir)
      File "/Users/dong/sfguide-blog-ai-assistant/build_index.py", line 23, in build_index
        documents = SimpleDirectoryReader(input_dir=d_dir, required_exts=required_exts, recursive=True).load_data()
      File "/Users/dong/sfguide-blog-ai-assistant/ai-assistant/lib/python3.9/site-packages/llama_index/readers/file/base.py", line 143, in __init__
        self.input_files = self._add_files(self.input_dir)
      File "/Users/dong/sfguide-blog-ai-assistant/ai-assistant/lib/python3.9/site-packages/llama_index/readers/file/base.py", line 204, in _add_files
        raise ValueError(f"No files found in {input_dir}.")
    ValueError: No files found in /Users/dong/sfguide-blog-ai-assistant/.content/blogs.

Simply copying the files in .content/blogs to another (non-hidden) directory works.
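
The copy workaround could be scripted roughly as follows. This is a minimal sketch using only the Python standard library; the helper name `copy_to_visible_dir` and the destination path in the usage comment are hypothetical and not part of the quickstart.

```python
import shutil
from pathlib import Path


def copy_to_visible_dir(src: str, dst: str) -> Path:
    """Copy a (hidden) source tree into a non-hidden destination directory."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"Source directory not found: {src_path}")
    # dirs_exist_ok (Python 3.8+) allows the copy to be re-run over an existing tree.
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    return dst_path


# Hypothetical usage: copy the blogs, then point data_dir in build_index.py at the copy.
# copy_to_visible_dir(".content/blogs", "content_visible/blogs")
```

After copying, build_index.py would need its data directory changed to the new location, so this trades a code change in the reader call for a change in the path.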

URL of where you see the bug https://quickstarts.snowflake.com/guide/build_rag_based_blog_ai_assistant_using_streamlit_openai_and_llamaindex/index.html?index=..%2F..index#3

RobertDupuy commented 5 months ago

Ignore my earlier comment. On rereading your post, I see you already know the issue is the hidden directory. Passing exclude_hidden=False to SimpleDirectoryReader should make it pick up the files:

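    from llama_index import SimpleDirectoryReader

    # exclude_hidden defaults to True, which skips files under .content
    documents = SimpleDirectoryReader(
        input_dir=data_dir,
        exclude_hidden=False,
    ).load_data()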
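
To illustrate why the flag matters, here is a simplified sketch of a recursive file lister with an `exclude_hidden` switch. It is not LlamaIndex's actual implementation; the function `list_files` and its rule (skip any file whose path contains a dot-prefixed component) are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def list_files(input_dir: str, exclude_hidden: bool = True) -> List[Path]:
    """Recursively list files, optionally skipping anything on a dot-prefixed path."""
    found = []
    for path in sorted(Path(input_dir).rglob("*")):
        if not path.is_file():
            continue
        # Assumed rule: a file counts as hidden if any path component starts with "."
        hidden = any(
            part.startswith(".") and part not in (".", "..")
            for part in path.parts
        )
        if exclude_hidden and hidden:
            continue
        found.append(path)
    return found
```

Under this rule, every file below .content/blogs is filtered out when `exclude_hidden` is left at its default, which would leave the reader with an empty file list like the one behind the "No files found" error reported above.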