Update-Jul-7-2024: I am working on updating this; expect updates within the next few weeks. I want to integrate marker as an additional OCR option and incorporate some of the feedback from users, stay tuned!
Update-Feb-19-2024: scriptALTHeadless.py is in beta (script.py will still work). If you do not want to use the command window to open Chrome, do not need to log into any accounts, and have no need to watch the AI use the web, you can use the scriptALTHeadless.py file. Replace the script.py file with the scriptALTHeadless.py file (rename scriptALTHeadless.py to script.py). This is the less ideal option, but many people just want something that works without needing to do anything extra.
Note: you do not need to have your browser open side by side with the textgen UI (as the videos below show); you can minimize the search window if you like.
(I will be adding full CPU mode soon)
https://github.com/facebookresearch/nougat
A web search extension for Oobabooga's text-generation-webui https://github.com/oobabooga/text-generation-webui
Example with OCR Everything On
This only works with Google Chrome running in debugging mode.
Step 1: Close all instances of Chrome
Step 2: Enter this into a separate Windows command prompt (change the chrome.exe location if Chrome is installed somewhere other than in the example); --incognito can be added at the end to open in that mode:
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
Enable Google Search: Toggle this to activate or deactivate the web search feature.
Web Search Fetch Length (characters): Specify the maximum length of content to fetch (in characters). This helps in limiting the response for large web pages.
The workflow follows this sequence of trigger statements, which must start the user input:
search
additional links
please expand
go to
Search: Type 'search' followed by your query to search the web.
Example: search recent submarine implosion
Example: search current time in Austin Texas ** do not reply with anything extra, simply the date and time please
Everything after ** will not be entered into the search but will be contextualized when the AI analyzes the web page.
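In other words, the input is effectively split on the ** marker: the left half becomes the Google query and the right half is only shown to the model alongside the page text. A minimal sketch of that split (illustrative only, not the exact code in script.py):

```python
def split_search_input(user_input: str):
    """Split a 'search ...' message into (google_query, extra_context)."""
    body = user_input[len("search"):].strip()
    if "**" in body:
        query, extra = body.split("**", 1)
        return query.strip(), extra.strip()
    return body, ""

query, extra = split_search_input(
    "search current time in Austin Texas ** do not reply with anything extra"
)
print(query)  # current time in Austin Texas
print(extra)  # do not reply with anything extra
```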
Additional Links: Type 'additional links' to get more related links.
Example: additional links provide me with one wikipedia link that has more information
Links the AI collects can be seen in the website_links.txt file in the web_search extension folder.
Example: additional links provide me with one link that has more information
Be careful: the AI will visit each link it lists and gather data from each one when you invoke the please expand input.
Please Expand: Use this command to expand on the fetched content.
Example: please expand search the page for more information and then generate a 5 paragraph report for me
Go To: Use this command to go to a specific website.
Example: go to https://forecast.weather.gov/MapClick.php?lat=38.579440000000034&lon=-121.49084999999997 and give me a 5 day forecast using only cute emojis in a table format
If providing multiple URLs, put spaces (not commas) between each URL.
If you "go to" a web page the is a pdf or your AI picks a web page that is a pdf, this will automatically use OCR to scan the downloaded pdf.
Example: go to https://physics.uwo.ca/~mhoude2/courses/PDF%20files/physics1501/Ch5-Gravitation.pdf and pick out a cool equation to explain to me
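Since multiple URLs are separated by spaces and .pdf links get routed to OCR, the go to handling can be pictured roughly like this (a sketch with illustrative helper logic, not the extension's actual implementation):

```python
def parse_go_to(user_input: str):
    """Split a 'go to ...' message into URLs and trailing context."""
    tokens = user_input[len("go to"):].strip().split()
    urls = [t for t in tokens if t.startswith(("http://", "https://"))]
    # Everything after the last URL is treated as extra context for the model.
    last = max((i for i, t in enumerate(tokens) if t in urls), default=-1)
    context = " ".join(tokens[last + 1:])
    return urls, context

urls, context = parse_go_to(
    "go to https://example.com/report.pdf https://example.com/page and summarize both"
)
for url in urls:
    if url.lower().endswith(".pdf"):
        print(url, "-> download the PDF and scan it with OCR")
    else:
        print(url, "-> print the page to PDF and extract its text/links")
```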
You can select between two OCR models, a small and a large model. The small model will use about 13GB, with the large model using about 2.5GB+ more. You can run the OCR models in a CPU mode, but 5GB of VRAM will still be used. The OCR model nougat (https://github.com/facebookresearch/nougat) does not have a built-in means of forcing the model to use CPU mode; however, if you replace the device.py file with the one from my repo, you can make it happen (https://github.com/RandomInternetPreson/LucidWebSearch/blob/main/README.md#insructions-for-getting-cpu-mode-low-vram-to-work).
Only applicable if you are running in CPU mode; this can speed up inference. Enable it if CPU mode is too slow for you.
Enabling this will execute OCR on both regular web pages and .pdf web pages. This is very useful when doing research where scientific and mathematical symbols are displayed on a web page.
Example: go to https://en.wikipedia.org/wiki/Quantum_mechanics and write out the equation for the time evolution of a quantum state described by the Schrödinger equation that you read in the page, and explain each variable. When you are writing out the equations, can you give me markdown outputs and make sure there is only a single dollar sign at the beginning and end of the equation so I can render it with markdown?
It is recommended that you run your textgen-webui interface in a browser other than Chrome. In addition, have both browsers open and in view so you can monitor the sites the AI is visiting.
Use with Superbooga (v1 or v2) to help retain multiple search sessions if you are doing a lot of research.
Please avoid searching for sensitive or personal information.
Your Google Chrome browser will be used to execute searches; be mindful of the sites you give the AI access to under your various logged-in websites. Monitor the websites your AI is visiting through the chat and the open instance of Google Chrome.
Some websites may not be accessible or properly formatted for data extraction. The maximum character limit for website digestion is set by the user; if information you need is at the very end of a long web page, try different sites or change your search criteria.
If you encounter issues, ensure you closed all previous Chrome instances before loading Chrome in debugging mode.
For feedback or support, please raise an issue on the GitHub repository: https://github.com/RandomInternetPreson/LucidWebSearch
Step 1: Go here in your textgen install folder: text-generation-webui-main\installer_files\env\Lib\site-packages\nougat\utils
Step 2: Make a backup of device.py
Step 3: Replace device.py with https://github.com/RandomInternetPreson/LucidWebSearch/blob/main/device_LowVramCPUMODE.py (rename device_LowVramCPUMODE.py to device.py).
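For reference, the replacement file essentially keeps the OCR model on the CPU instead of moving it to CUDA. Conceptually that boils down to something like the sketch below (illustrative PyTorch only; the real function names and extra handling in nougat's device.py may differ, so use the device_LowVramCPUMODE.py file from the repo rather than writing your own):

```python
import torch

def move_to_device(model):
    # Force the model onto the CPU even when CUDA is available,
    # so OCR runs with minimal VRAM usage (at the cost of speed).
    return model.to(torch.device("cpu"))
```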
When you enter a phrase beginning with search, the text after the word search will be entered into the Google Chrome browser as a Google search.
The resulting web page is printed as temp_webpage.pdf.
This printed web page is scanned, and the text and links are separated out into website_links.txt and website_text.txt files.
The website_text.txt file is sent to the AI with your request and incorporated into its response.
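The scan-and-separate step can be pictured roughly like the sketch below, which uses PyMuPDF to read the printed page (an illustration only; the extension's actual PDF handling may differ, and max_chars stands in for the Web Search Fetch Length setting):

```python
import fitz  # PyMuPDF: pip install pymupdf

def digest_printed_page(pdf_path="temp_webpage.pdf", max_chars=10000):
    """Pull plain text and hyperlinks out of the printed web page."""
    text_parts, links = [], []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            text_parts.append(page.get_text())
            links.extend(l["uri"] for l in page.get_links() if "uri" in l)
    text = "".join(text_parts)[:max_chars]  # respect the fetch-length limit

    with open("website_text.txt", "w", encoding="utf-8") as f:
        f.write(text)
    with open("website_links.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(links))
    return text, links
```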
If the user is satisfied with the response, no additional input is needed; however, if the user wants to dive deeper into a subject, they can ask the AI to retrieve a number of URLs from the initial Google search.
When you enter a phrase beginning with additional links, along with some context like the number of links, the AI is fed the website_links.txt file and then chooses which link(s) to reply back with based on the user's criteria.
The link(s) are parsed out of the AI's response, and each link in the response is visited and digested when you enter a phrase beginning with please expand.
The AI will go to each site and contextualize the user's requests when providing an output.
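Parsing the link(s) out of the AI's response can be as simple as scanning the reply for URLs, roughly like this (an illustrative sketch, not the extension's exact parsing code):

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def extract_links(ai_reply: str):
    """Pull every http(s) URL out of the model's 'additional links' reply."""
    # Strip trailing punctuation the model may have attached to a URL.
    return [u.rstrip(".,)]\"'") for u in URL_PATTERN.findall(ai_reply)]

reply = "Here is one Wikipedia link: https://en.wikipedia.org/wiki/Submarine."
print(extract_links(reply))  # ['https://en.wikipedia.org/wiki/Submarine']
```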
At any point the user can invoke the go to request; the user follows the request with a URL and some context, then the LLM goes to the site(s) to review the contents and responds while trying to accommodate the user's request.
*OCR Update
The LLM does an initial Google search and collects the links; it can then pick from those links or let the user guide it on which and how many links to choose. The set of interactions is represented like this:
"search" > "additional links" > "please expand"
Upon please expand, each web page will either be:
1. Printed to a PDF, with the contents of the PDF collected into a text file that is eventually sent back to the LLM; in this case hyperlinks and text from the PDF are collected and stored in two different txt files.
2. If the hyperlink is a .pdf file (ends in .pdf), it will undergo 1 of 2 processes:
2a. Sent to the same python code that scans the printed .pdf file, parsing text and hyperlinks the same way.
2b. Sent to the OCR model, where a .mmd file is generated and those data are sent to the LLM (no hyperlinks are parsed).
3. Printed and sent to the OCR model (hyperlinks are parsed). This is useful if you need to scan web pages with mathematical and scientific symbols.
The same processes are executed on the "go to" command.
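Putting the branches above together, the per-link routing can be summarized with a small sketch (the option flags and return strings here are just illustrative labels for the cases described in the list, not actual extension setting names):

```python
def describe_link_handling(url: str, ocr_everything: bool, ocr_pdfs: bool) -> str:
    """Map a link to the processing path described in the list above."""
    if url.lower().endswith(".pdf"):
        if ocr_pdfs:
            return "2b. download -> nougat OCR -> .mmd sent to the LLM (no hyperlinks parsed)"
        return "2a. download -> same PDF parser -> text + hyperlinks"
    if ocr_everything:
        return "3. print to PDF -> nougat OCR (hyperlinks still parsed)"
    return "1. print to PDF -> text and hyperlinks into website_text.txt / website_links.txt"

for link in ["https://example.com/article", "https://example.com/paper.pdf"]:
    print(link, "->", describe_link_handling(link, ocr_everything=False, ocr_pdfs=True))
```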
I don't know much Python; I started my project by editing code from here: https://github.com/simbake/web_search, specifically the code for the input_modifier and UI checkbox.
This is a video of how to install a different extension, but it is the same set of steps; just don't do "pip install TTS --no-dependencies".