drforse / not-llama-fs

MIT License
37 stars 3 forks

KeyError: 'files' #3

Open sullemanhossam opened 4 weeks ago

sullemanhossam commented 4 weeks ago

Hi there,

Great work on the initiative and improvements on this project. It's perfect for what I need it for. However, I'm encountering an error, which I think might be because I'm using macOS. Please get back to me with your thoughts on this issue.


Preparing ../../res/httpswwwbaincom/RETRIEVED/2022 Carbon Credit Disclosure.pdf
Detected mime type: application/pdf
Preparing ../../res/httpswwwbaincom/RETRIEVED/Bain Brief Why The 5g Pessimists Are Wrong.pdf
Detected mime type: application/pdf
Preparing ../../res/httpswwwbaincom/RETRIEVED/Modello 231 External Eng 2023.pdf
Detected mime type: application/pdf
Preparing ../../res/httpswwwbaincom/RETRIEVED/Bain Brief Finding The Sustainable Advantage In Chemicals.pdf
Detected mime type: application/pdf
Preparing ../../res/httpswwwbaincom/RETRIEVED/Relatorio De Igualdade.pdf
Detected mime type: application/pdf
Producing
[]
{}
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/dutch22/Documents/GitHub/Spider/Spider PDF /out/not-llama-fs/app/__main__.py", line 46, in <module>
    main()
  File "/Users/dutch22/Documents/GitHub/Spider/Spider PDF /out/not-llama-fs/app/__main__.py", line 40, in main
    demo(args.path, args.producer, args.text_model, args.image_model, args.apikey)
  File "/Users/dutch22/Documents/GitHub/Spider/Spider PDF /out/not-llama-fs/app/__init__.py", line 54, in demo
    tree = producer.produce()
           ^^^^^^^^^^^^^^^^^^
  File "/Users/dutch22/Documents/GitHub/Spider/Spider PDF /out/not-llama-fs/not_llama_fs/producers/ollama_producer.py", line 119, in produce
    for n, file in enumerate(llama_response_json["files"]):
                             ~~~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'files'
dutch22@Hossams-Laptop not-llama-fs %
sullemanhossam commented 4 weeks ago

I've tried using a web AI, in case the issue was with my system (llama3 did make my MacBook sweat 😆), but no luck unfortunately.


Preparing ../../res/httpswwwbaincom/RETRIEVED/Modello 231 External Eng 2023.pdf
Detected mime type: application/pdf
Preparing ../../res/httpswwwbaincom/RETRIEVED/Bain Brief Finding The Sustainable Advantage In Chemicals.pdf
Detected mime type: application/pdf
Preparing ../../res/httpswwwbaincom/RETRIEVED/Relatorio De Igualdade.pdf
Detected mime type: application/pdf
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/dutch22/Documents/GitHub/Spider/Spider PDF /out/not-llama-fs/app/__main__.py", line 46, in <module>
    main()
  File "/Users/dutch22/Documents/GitHub/Spider/Spider PDF /out/not-llama-fs/app/__main__.py", line 40, in main
    demo(args.path, args.producer, args.text_model, args.image_model, args.apikey)
  File "/Users/dutch22/Documents/GitHub/Spider/Spider PDF /out/not-llama-fs/app/__init__.py", line 54, in demo
    tree = producer.produce()
           ^^^^^^^^^^^^^^^^^^
  File "/Users/dutch22/Documents/GitHub/Spider/Spider PDF /out/not-llama-fs/not_llama_fs/producers/groq_producer.py", line 116, in produce
    return TreeObject.from_json(groq_response_json)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dutch22/Documents/GitHub/Spider/Spider PDF /out/not-llama-fs/not_llama_fs/fs/tree.py", line 30, in from_json
    for file in data["files"]:
                ~~~~^^^^^^^^^
KeyError: 'files'
dutch22@Hossams-Laptop not-llama-fs %
drforse commented 4 weeks ago

It seems that you only have PDF files to sort, which are not currently supported, and I didn't add error handling for the case where the directory contains no supported files. I will probably add PDF support pretty soon, but right now only text and image files are supported.
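To illustrate the missing error handling described above, here is a minimal sketch of a guard that could run before the `"files"` key is read. It is not the project's code: `extract_files` and `NoSupportedFilesError` are hypothetical names, and the sketch only assumes that the model response is a JSON object expected to carry a `"files"` list, as the tracebacks suggest.

```python
import json


class NoSupportedFilesError(Exception):
    """Hypothetical error: the response holds no file entries to arrange."""


def extract_files(response_text: str) -> list:
    """Return the "files" list from a JSON response, or raise a clear error.

    Assumption: the response is a JSON object with a "files" list, as implied
    by the KeyError in the tracebacks. Everything else here is illustrative.
    """
    data = json.loads(response_text)
    # .get() avoids the bare KeyError when the key is absent (e.g. "{}").
    files = data.get("files") if isinstance(data, dict) else None
    if not files:
        raise NoSupportedFilesError(
            "The response has no 'files' entries; the directory may contain "
            "only unsupported file types."
        )
    return files
```

With a guard like this, an empty response such as `{}` would produce a readable message instead of `KeyError: 'files'`.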

Also, just a reminder: I don't believe that, even after adding support for more file types, this project in its current state can actually create a sensible and convenient file structure. Maybe with ChatGPT: I tried it with ChatGPT and was impressed with the results, but they are still not perfect. Maybe in a few months, if I don't throw this project away, it might become something actually sensible, but not right now; right now I would only use it for fun.

ChatGPT: 1000024584.jpg

sullemanhossam commented 4 weeks ago

Thanks for your response. Understood. I do think this is invaluable, and I see both the limitations and the benefits of the system. I'm wary that the program may not be able to sort files by topic effectively, for example separating reports from accounts, rather than just categorizing them all as documents. I think that if I add an interim step of converting the documents to txt, getting the sorting verdict on those, and then applying it to the PDF counterparts, that could be a quick workaround. I'll look into reading your code and making those adjustments. Wish me luck, and I wish you the same with the PDF implementation. PS: I will also try my GPT key, provided that it doesn't cost me a fortune, but I'm sure it will be fine.
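The last step of that workaround (applying the verdict for each txt file to its PDF counterpart) could look roughly like the sketch below. It is only an illustration under assumptions: `map_txt_plan_to_pdf` is a hypothetical helper, the plan is assumed to be a simple source-to-destination mapping, and each PDF is assumed to sit next to a txt file with the same base name. The PDF-to-txt conversion itself is not shown.

```python
from pathlib import PurePosixPath


def map_txt_plan_to_pdf(txt_plan: dict[str, str]) -> dict[str, str]:
    """Translate {txt source: txt destination} into the same plan for PDFs.

    Assumption: every .txt file was derived from a .pdf with the same base
    name in the same directory, so only the suffix needs to change.
    """
    pdf_plan = {}
    for src, dst in txt_plan.items():
        pdf_src = PurePosixPath(src).with_suffix(".pdf")
        pdf_dst = PurePosixPath(dst).with_suffix(".pdf")
        pdf_plan[str(pdf_src)] = str(pdf_dst)
    return pdf_plan
```

The resulting mapping could then be used to move the original PDFs instead of the intermediate txt files.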

tsuiusi commented 4 weeks ago

i submitted a pr that sorts pdfs too, might be a bit janky but it works

sullemanhossam commented 4 weeks ago

viva russia we should keep in touch ⚔️

sullemanhossam commented 4 weeks ago

i submitted a pr that sorts pdfs too, might be a bit janky but it works

/Users/dutch22/Documents/GitHub/Spider/Spider PDF /post/httpswwwbaincom/RETRIEVED/Bain Brief Understanding Removing Barriers To Black Entrepreneurship In Canada.pdf
/Users/dutch22/Documents/GitHub/Spider/Spider PDF /post/httpswwwbaincom/RETRIEVED/Bain Brief The Measurement Advantage.pdf
/Users/dutch22/Documents/GitHub/Spider/Spider PDF /post/httpswwwbaincom/RETRIEVED/2022 Carbon Credit Disclosure.pdf
/Users/dutch22/Documents/GitHub/Spider/Spider PDF /post/httpswwwbaincom/RETRIEVED/Bain Brief Why The 5g Pessimists Are Wrong.pdf
/Users/dutch22/Documents/GitHub/Spider/Spider PDF /post/httpswwwbaincom/RETRIEVED/Modello 231 External Eng 2023.pdf
/Users/dutch22/Documents/GitHub/Spider/Spider PDF /post/httpswwwbaincom/RETRIEVED/Bain Brief Finding The Sustainable Advantage In Chemicals.pdf
/Users/dutch22/Documents/GitHub/Spider/Spider PDF /post/httpswwwbaincom/RETRIEVED/Relatorio De Igualdade.pdf
Ignoring wrong pointing object 38 0 (offset 0)
Ignoring wrong pointing object 39 0 (offset 0)
Ignoring wrong pointing object 40 0 (offset 0)
Ignoring wrong pointing object 41 0 (offset 0)
Ignoring wrong pointing object 42 0 (offset 0)
Ignoring wrong pointing object 43 0 (offset 0)
Ignoring wrong pointing object 44 0 (offset 0)
Ignoring wrong pointing object 46 0 (offset 0)
Ignoring wrong pointing object 47 0 (offset 0)
Ignoring wrong pointing object 48 0 (offset 0)
Ignoring wrong pointing object 49 0 (offset 0)
Ignoring wrong pointing object 50 0 (offset 0)
Ignoring wrong pointing object 51 0 (offset 0)
Ignoring wrong pointing object 53 0 (offset 0)

Hi there, any idea what this may be?

tsuiusi commented 4 weeks ago

probably an error with reading the subdirectories (the PR code assumes everything is in one flat directory). i just found the bug, will update.
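If the flat-directory assumption is indeed the cause, one possible way to pick up files in nested folders is a recursive walk, sketched below with the standard library. This is not the PR's code; `collect_pdfs` is a hypothetical helper shown only to illustrate the idea.

```python
from pathlib import Path


def collect_pdfs(root: str) -> list[Path]:
    """Return every PDF under root, including nested subdirectories, sorted.

    rglob("*") walks the whole tree, so files in subfolders are found too;
    the suffix check is case-insensitive to catch names like "b.PDF".
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```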