Sebby37 / Dead-Internet

Y'all thought the dead internet theory wasn't real, but HERE IT IS
Guide generation. #6

Open lastrosade opened 5 months ago

lastrosade commented 5 months ago

When generating the URLs, generate a website description and use that description to guide the generation of the web page. Consider using GBNF for this.
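A rough sketch of the two-step idea, without GBNF: first ask the model for a short description of the site, then feed that description into the page prompt. The function name `generate_guided_page`, the `llm` callable, and the prompt wording are all assumptions for illustration, not code from this repo.

```python
from typing import Callable


def generate_guided_page(url: str, llm: Callable[[str], str]) -> tuple[str, str]:
    """Generate a description first, then a page guided by that description.

    `llm` is a hypothetical callable that takes a prompt and returns text;
    swap in whatever client the project actually uses.
    """
    # Step 1: a short description of the fictional site.
    description = llm(
        f"In one sentence, describe the fictional website at '{url}'."
    ).strip()

    # Step 2: the page itself, steered by the description from step 1.
    page = llm(
        f"Generate the HTML for the fictional website '{url}'. "
        f"The site is described as: {description}"
    )
    return description, page
```

The description could also be stored next to the URL so that later visits to the same link stay consistent.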

Zetaphor commented 5 months ago

In case anyone else, like me, isn't familiar with that acronym: https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md

Kreevoz commented 5 months ago

I experimented with this by allowing query parameters on all URLs and then telling llama3 to append two of them. It works very well.

Put this into the prompt for page generation (as well as for search results, or make it a system prompt). Of course you will also have to alter the URL parsing a little, so that the parameters are stripped out of the URL but kept for further use.

Generate a webpage from the fictional site of '{url}' at the resource path of '{path}' with parameters: '{params}'. Make sure all links generated either link to an external website, or if they link to another resource on the current website, they have the current url prepended ({url}) to them. Append the parameter '?&description=(short summary of the linked webpage here that describes the content or purpose)' to all generated URLs. Also append the parameter '&previous-webpage=(short summary of the current website that the link appears on)' as final parameter. These parameters help you to figure out what to generate, so you must generate them on each link. If there are other parameters needed, make sure to combine them. Here is an example of a finished link: '<a href=\"http://www.flower-website.com/?parameter1=cart&description=Shopping cart of the town's best flower shop website&previous-webpage=Merchant directory, flower shop subpage\">Link title here</a>' Update the previous-webpage parameter to match the currently generated webpage.

That way you also get pages generated that roughly match what the fake search engine spits out, and they are thematically grouped.

This is probably only a band-aid and could be implemented more elegantly, but it is easy enough to do like this.
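For the URL parsing part, something along these lines might work. It is a minimal sketch using only the standard library; the helper name `split_generation_hints` is made up, and the project's real parsing may look quite different (inside a Flask view, `request.args` would give you the same values).

```python
from urllib.parse import urlsplit, parse_qs


def split_generation_hints(raw_url: str):
    """Separate the two hint parameters from the rest of a generated URL.

    Hypothetical helper: returns (host, path, other_params, description,
    previous_webpage) so each piece can be dropped into the prompt.
    """
    parts = urlsplit(raw_url)
    query = parse_qs(parts.query)  # blank fields such as a leading '&' are ignored

    # Pull out the two hint parameters, defaulting to an empty string.
    description = query.pop("description", [""])[0]
    previous = query.pop("previous-webpage", [""])[0]

    # Whatever is left are the ordinary parameters of the fictional page.
    other_params = {key: values[0] for key, values in query.items()}
    return parts.netloc, parts.path, other_params, description, previous
```

The returned host, path, and remaining parameters can then fill `{url}`, `{path}`, and `{params}` in the prompt above.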

scalar27 commented 5 months ago

Great idea. I added it and it helps a lot with coherence. However, I now see even more of the problem where the generated links on the following pages do not have 127.0.0.1:5000 prepended to them. Is there a way to ensure all links get that?

Kreevoz commented 5 months ago

@scalar27 🤔 I just serve the thing on localhost on port 80, so no port is required and all the links work without any hassle. You can tell Flask which port to use in main.py, like so:

if __name__ == "__main__":
    app.run(host='127.0.0.1', port=80, debug=False)
    print(engine.export_internet())

Alternatively, I suppose you could add the host and port to all hrefs by modifying the _format_page function in ReaperEngine.py.
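A rough idea of what that rewrite could look like, as a standalone regex-based helper. The name `prefix_hrefs`, the `LOCAL_BASE` value, and the rule of folding the fictional host into the local path are my assumptions; I have not checked how _format_page actually handles links, so treat this as a sketch to adapt.

```python
import re

# Assumed address of the local Flask server; change to match your setup.
LOCAL_BASE = "http://127.0.0.1:5000"

# Matches href="..." or href='...' and captures the quote and the target.
_HREF_RE = re.compile(r'href=(["\'])(.*?)\1', re.IGNORECASE)


def prefix_hrefs(html: str, base: str = LOCAL_BASE) -> str:
    """Route every link in the generated HTML through the local server."""

    def _rewrite(match: re.Match) -> str:
        quote, target = match.group(1), match.group(2)
        # Leave already-prefixed links and in-page anchors alone.
        if target.startswith(base) or target.startswith("#"):
            return match.group(0)
        # Drop the scheme so the fictional host becomes part of the local path.
        stripped = re.sub(r"^https?://", "", target).lstrip("/")
        return f"href={quote}{base}/{stripped}{quote}"

    return _HREF_RE.sub(_rewrite, html)
```

Calling it on the generated page just before it is returned should keep every link pointing at the local server, whichever port it runs on.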