aorwall / moatless-tools

Getting `0/0` for "Generating Embeddings" step #33

Closed (john-b-yang closed 2 months ago)

john-b-yang commented 2 months ago

Thanks for all the really inspiring work on SWE-bench + programming agents 😄

I had a quick question. I'm trying to run the 00_index_and_run.ipynb notebook on the flask repository.

I've done the following steps:

  1. Cloned pallets/flask locally
  2. Set my OPENAI_API_KEY in a .env file located within notebooks/
  3. Ran the code below from the notebook:
    
    import tree_sitter_python as tspython
    from tree_sitter import Language, Parser
    from moatless.index import CodeIndex, IndexSettings
    from moatless import FileRepository, Workspace

    # An OPENAI_API_KEY is required to use the OpenAI Models
    model = "gpt-4o-2024-05-13"
    index_settings = IndexSettings(
        embed_model="text-embedding-3-small"
    )

    repo_dir = "/absolute/path/to/flask"
    file_repo = FileRepository(repo_path=repo_dir)

    code_index = CodeIndex(file_repo=file_repo, settings=index_settings)
    nodes, tokens = code_index.run_ingestion()
    print(f"Indexed {nodes} nodes and {tokens} tokens")



When I run these steps, I'm getting:
![Screenshot 2024-09-12 at 7 39 48 PM](https://github.com/user-attachments/assets/3751a3f0-e2e8-4105-a94f-523834e3b380)

It looks like no embeddings were generated, and I'm not quite sure where I went wrong here.

Thanks in advance!

aorwall commented 2 months ago

Weird. I tried to reproduce this but was able to generate embeddings. Can you enable logging with logging.basicConfig(level=logging.INFO) and provide the logs?
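
For reference, a minimal sketch of turning that logging on in a notebook, assuming it runs in a cell before the CodeIndex is created. The `force=True` argument is an addition not mentioned in the thread; it is there because a notebook kernel may already have log handlers installed, in which case a plain `basicConfig` call would do nothing.

```python
import logging

# force=True replaces any handlers the notebook kernel may already have set up,
# so the INFO level actually takes effect (available since Python 3.8).
logging.basicConfig(level=logging.INFO, force=True)

# The logger name is taken from the log lines quoted in this thread.
index_logger = logging.getLogger("moatless.index.code_index")
index_logger.info("logging is enabled; now re-run code_index.run_ingestion()")
```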

john-b-yang commented 2 months ago

Oh cool ok that produced some warnings:

INFO:moatless.index.code_index:Initiated CodeIndex None with:
 * 0 classes
 * 0 functions
 * 0 vectors

INFO:moatless.index.code_index:Read 82 documents

WARNING:llama_index.core.node_parser.node_utils:Failed to use epic splitter to split docs/conf.py. Fallback to treesitter_split(). Error: too many values to unpack (expected 2)
WARNING:llama_index.core.node_parser.node_utils:Failed to use epic splitter to split examples/celery/make_celery.py. Fallback to treesitter_split(). Error: too many values to unpack (expected 2)

(And then many more repetitions of this error message)

Perhaps I didn't install something correctly?

john-b-yang commented 2 months ago

Ah ok so the error is coming from here. Will play around with it a bit more.

Update: I realized the main problem was just that I was developing on a Mac, haha. I switched to a Linux machine and it's all good!

I think this line was throwing the error. captures is a dictionary, not a list of tuples(?). I tried changing it to captures.items() but was still unable to produce the result and I didn't look further.
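
A rough sketch of why that unpacking could fail, assuming (as guessed above) that newer tree-sitter releases return a dict mapping capture names to lists of nodes, while older ones returned a list of (node, name) tuples. Plain strings stand in for real nodes here, and `normalize_captures` is a hypothetical helper, not part of moatless or tree-sitter. It may also explain why `captures.items()` alone was not enough: under this assumption it yields (name, list of nodes) pairs rather than (node, name) pairs.

```python
def normalize_captures(captures):
    """Return a flat list of (node, capture_name) pairs from either shape."""
    if isinstance(captures, dict):
        # Assumed newer shape: {capture_name: [node, node, ...]}
        return [(node, name) for name, nodes in captures.items() for node in nodes]
    # Assumed older shape: [(node, capture_name), ...]
    return list(captures)


# Stand-in data: strings take the place of tree-sitter nodes.
old_style = [("node_a", "function.name"), ("node_b", "class.name")]
new_style = {"function.name": ["node_a"], "class.name": ["node_b"]}

# Iterating a dict yields its keys, so unpacking each key string into two
# variables raises a ValueError.
try:
    for node, tag in new_style:
        pass
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

print(unpack_error)
print(normalize_captures(old_style) == normalize_captures(new_style))
```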

It might've been because I was using tree-sitter-python==0.23.2 (0.21.0, which this repo requires, is not supported on ARM, as discussed here).
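
A quick way to check which versions are actually installed, as a sketch using only the standard library. The package names are the PyPI distribution names; `installed_version` is a hypothetical helper for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("tree-sitter", "tree-sitter-python"):
    print(name, installed_version(name) or "not installed")
```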

aorwall commented 1 month ago

Aha, I got the same error when I tried to upgrade tree-sitter.