Leask / halbot

Just another `ChatGPT` / `Gemini` / `Mistral (by ollama)` Telegram bob, which is simple design, easy to use, extendable and fun.
https://leaskh.com/post/711636926789271552/halbot
MIT License
101 stars 16 forks source link

Parsing not working on local files #28

Closed ghost closed 1 year ago

ghost commented 1 year ago

Hey @Leask I'm really excited by the capabilities of your tool! However I can't get the parser work on local files (at least .pdf).

Here is the debugging log:

[BOT 2023-04-28T15:28:51.268Z] Command: {"cmd":"clear","args":""} [HAL] Prompt: Hello! [BOT 2023-04-28T15:29:01.698Z] Event: 744327888 {"update_id":744327888,"message":{"message_id":114,"from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"chat":{"id":219402848,"first_name":"voxit","username":"voxit","type":"private"},"date":1682695741,"forward_from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"forward_date":1682695692,"document":{"file_name":"1706.03762.pdf","mime_type":"application/pdf","thumbnail":{"file_id":"AAMCBAADGQEAA3JkS-Y9ylP83AQeRdXz_AABGKecQaMAAvcRAAL_BVlSKV0TcCYd_3cBAAdtAAMvBA","file_unique_id":"AQAD9xEAAv8FWVJy","file_size":14184,"width":247,"height":320},"thumb":{"file_id":"AAMCBAADGQEAA3JkS-Y9ylP83AQeRdXz_AABGKecQaMAAvcRAAL_BVlSKV0TcCYd_3cBAAdtAAMvBA","file_unique_id":"AQAD9xEAAv8FWVJy","file_size":14184,"width":247,"height":320},"file_id":"BQACAgQAAxkBAANyZEvmPcpT_NwEHkXV8_wAARinnEGjAAL3EQAC_wVZUildE3AmHf93LwQ","file_unique_id":"AgAD9xEAAv8FWVI","file_size":2201700},"caption":"Summarize this paper"}} [BOT 2023-04-28T15:29:01.773Z] INFO: No suitable response. [BOT 2023-04-28T15:41:57.029Z] Event: 744327889 {"update_id":744327889,"message":{"message_id":116,"from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"chat":{"id":219402848,"first_name":"voxit","username":"voxit","type":"private"},"date":1682696516,"text":"/clear","entities":[{"offset":0,"length":6,"type":"bot_command"}]}} [BOT 2023-04-28T15:41:57.030Z] Command: {"cmd":"clear","args":""} [HAL] Prompt: Hello! [BOT 2023-04-28T15:42:52.235Z] Event: 744327890 {"update_id":744327890,"message":{"message_id":118,"from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"chat":{"id":219402848,"first_name":"voxit","username":"voxit","type":"private"},"date":1682696572,"text":"Give a detailled summary of this paper: https://www.researchgate.net/profile/Kamran-Sadigli/publication/354872173_Research_Paper/links/61555937eabde032acb7df8b/Research-Paper.pdf","entities":[{"offset":40,"length":138,"type":"url"}]}} [HAL] Prompt: Give a detailled summary of this paper: https://www.researchgate.net/profile/Kamran-Sadigli/publication/354872173_Research_Paper/links/61555937eabde032acb7df8b/Research-Paper.pdf

I also attached a screenshot just in case.

Am I missing something? Also, what are the files size/char limit being handled by your parser?

Thanks by advance for any answer and a huge thanks for all your work! Screenshot_20230428-175414

Leask commented 1 year ago

Thanks, but you need a Google API token to enable the OCR support for pdf.

On Fri, Apr 28, 2023 at 12:03 PM voxitme @.***> wrote:

Hey @Leask https://github.com/Leask I'm really excited by the capabilities of your tool! However I can't get the parser work on local files (at least .pdf).

Here is the debugging log:

[BOT 2023-04-28T15:28:51.268Z] Command: {"cmd":"clear","args":""} [HAL] Prompt: Hello! [BOT 2023-04-28T15:29:01.698Z] Event: 744327888 {"update_id":744327888,"message":{"message_id":114,"from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"chat":{"id":219402848,"first_name":"voxit","username":"voxit","type":"private"},"date":1682695741,"forward_from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"forward_date":1682695692,"document":{"file_name":"1706.03762.pdf","mime_type":"application/pdf","thumbnail":{"file_id":"AAMCBAADGQEAA3JkS-Y9ylP83AQeRdXz_AABGKecQaMAAvcRAAL_BVlSKV0TcCYd_3cBAAdtAAMvBA","file_unique_id":"AQAD9xEAAv8FWVJy","file_size":14184,"width":247,"height":320},"thumb":{"file_id":"AAMCBAADGQEAA3JkS-Y9ylP83AQeRdXz_AABGKecQaMAAvcRAAL_BVlSKV0TcCYd_3cBAAdtAAMvBA","file_unique_id":"AQAD9xEAAv8FWVJy","file_size":14184,"width":247,"height":320},"file_id":"BQACAgQAAxkBAANyZEvmPcpT_NwEHkXV8_wAARinnEGjAAL3EQAC_wVZUildE3AmHf93LwQ","file_unique_id":"AgAD9xEAAv8FWVI","file_size":2201700},"caption":"Summarize this paper"}} [BOT 2023-04-28T15:29:01.773Z] INFO: No suitable response. [BOT 2023-04-28T15:41:57.029Z] Event: 744327889 {"update_id":744327889,"message":{"message_id":116,"from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"chat":{"id":219402848,"first_name":"voxit","username":"voxit","type":"private"},"date":1682696516,"text":"/clear","entities":[{"offset":0,"length":6,"type":"bot_command"}]}} [BOT 2023-04-28T15:41:57.030Z] Command: {"cmd":"clear","args":""} [HAL] Prompt: Hello! [BOT 2023-04-28T15:42:52.235Z] Event: 744327890 {"update_id":744327890,"message":{"message_id":118,"from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"chat":{"id":219402848,"first_name":"voxit","username":"voxit","type":"private"},"date":1682696572,"text":"Give a detailled summary of this paper: https://www.researchgate.net/profile/Kamran-Sadigli/publication/354872173_Research_Paper/links/61555937eabde032acb7df8b/Research-Paper.pdf","entities":[{"offset":40,"length":138,"type":"url"}]}} [HAL] Prompt: Give a detailled summary of this paper: https://www.researchgate.net/profile/Kamran-Sadigli/publication/354872173_Research_Paper/links/61555937eabde032acb7df8b/Research-Paper.pdf

I also attached a screenshot just in case.

Am I missing something? Also, what are the files size/char limit being handled by your parser?

Thanks by advance for any answer and a huge thanks for all your work! [image: Screenshot_20230428-175414] https://user-images.githubusercontent.com/36687040/235197406-b9b622b3-12d4-45d8-8872-a7655d175e67.png

— Reply to this email directly, view it on GitHub https://github.com/Leask/halbot/issues/28, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABY4PSKCKZH5UFRD5CKTC3XDPS43ANCNFSM6AAAAAAXPNQLUA . You are receiving this because you were mentioned.Message ID: @.***>

--

Sincerely,

Sixia "Leask" Huang https://leaskh.com

ghost commented 1 year ago

Oh thanks! I thought the OCR was only applied to image files. But that make sense to apply it to PDFs too ! Sorry for the confusion!

You can obviously close this one.

PS: just a last question, where can we find more info on how your text parser is handling files larger than the max input token size?-------- Original Message -------- On Apr 28, 2023, 18:43, Leask wrote:

Thanks, but you need a Google API token to enable the OCR support for pdf.

On Fri, Apr 28, 2023 at 12:03 PM voxitme @.***> wrote:

Hey @Leask https://github.com/Leask I'm really excited by the capabilities of your tool! However I can't get the parser work on local files (at least .pdf).

Here is the debugging log:

[BOT 2023-04-28T15:28:51.268Z] Command: {"cmd":"clear","args":""} [HAL] Prompt: Hello! [BOT 2023-04-28T15:29:01.698Z] Event: 744327888 {"update_id":744327888,"message":{"message_id":114,"from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"chat":{"id":219402848,"first_name":"voxit","username":"voxit","type":"private"},"date":1682695741,"forward_from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"forward_date":1682695692,"document":{"file_name":"1706.03762.pdf","mime_type":"application/pdf","thumbnail":{"file_id":"AAMCBAADGQEAA3JkS-Y9ylP83AQeRdXz_AABGKecQaMAAvcRAAL_BVlSKV0TcCYd_3cBAAdtAAMvBA","file_unique_id":"AQAD9xEAAv8FWVJy","file_size":14184,"width":247,"height":320},"thumb":{"file_id":"AAMCBAADGQEAA3JkS-Y9ylP83AQeRdXz_AABGKecQaMAAvcRAAL_BVlSKV0TcCYd_3cBAAdtAAMvBA","file_unique_id":"AQAD9xEAAv8FWVJy","file_size":14184,"width":247,"height":320},"file_id":"BQACAgQAAxkBAANyZEvmPcpT_NwEHkXV8_wAARinnEGjAAL3EQAC_wVZUildE3AmHf93LwQ","file_unique_id":"AgAD9xEAAv8FWVI","file_size":2201700},"caption":"Summarize this paper"}} [BOT 2023-04-28T15:29:01.773Z] INFO: No suitable response. [BOT 2023-04-28T15:41:57.029Z] Event: 744327889 {"update_id":744327889,"message":{"message_id":116,"from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"chat":{"id":219402848,"first_name":"voxit","username":"voxit","type":"private"},"date":1682696516,"text":"/clear","entities":[{"offset":0,"length":6,"type":"bot_command"}]}} [BOT 2023-04-28T15:41:57.030Z] Command: {"cmd":"clear","args":""} [HAL] Prompt: Hello! [BOT 2023-04-28T15:42:52.235Z] Event: 744327890 {"update_id":744327890,"message":{"message_id":118,"from":{"id":219402848,"is_bot":false,"first_name":"voxit","username":"voxit","language_code":"en"},"chat":{"id":219402848,"first_name":"voxit","username":"voxit","type":"private"},"date":1682696572,"text":"Give a detailled summary of this paper: https://www.researchgate.net/profile/Kamran-Sadigli/publication/354872173_Research_Paper/links/61555937eabde032acb7df8b/Research-Paper.pdf","entities":[{"offset":40,"length":138,"type":"url"}]}} [HAL] Prompt: Give a detailled summary of this paper: https://www.researchgate.net/profile/Kamran-Sadigli/publication/354872173_Research_Paper/links/61555937eabde032acb7df8b/Research-Paper.pdf

I also attached a screenshot just in case.

Am I missing something? Also, what are the files size/char limit being handled by your parser?

Thanks by advance for any answer and a huge thanks for all your work! [image: Screenshot_20230428-175414] https://user-images.githubusercontent.com/36687040/235197406-b9b622b3-12d4-45d8-8872-a7655d175e67.png

— Reply to this email directly, view it on GitHub https://github.com/Leask/halbot/issues/28, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABY4PSKCKZH5UFRD5CKTC3XDPS43ANCNFSM6AAAAAAXPNQLUA . You are receiving this because you were mentioned.Message ID: @.***>

--

Sincerely,

Sixia "Leask" Huang https://leaskh.com

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

Leask commented 1 year ago

OK, never mind. Feel free to open any issues again on this project. Enjoy! 😄

Leask commented 1 year ago

Sorry! I missed the last line of the message. In the chat mode, I will keep the system prompt and the last conversations that can fit the limit. I can only trim the string to match the limit if you submit a large document or an image containing lots of text. In the future, I will try a new way to distill information from the input.

ghost commented 1 year ago

Adding new functionnalities to your utilitas framework or trying langchain or faiss?

Leask commented 1 year ago

Still considering this, and the gpt-4 in coming, everything is changing fast. 🤣