dcramer / gamegame

https://gamegame.ai
Apache License 2.0
Support other source data types #11

Open dcramer opened 1 month ago

dcramer commented 1 month ago

Currently we only support PDF. Obviously we could add markdown support pretty easily, but there are two other things that would be awesome:

1) Images - if nothing else to be able to embed them. Technically we could OCR them but that'll be hit and miss.
2) Spreadsheets - not really sure what to do here, but for example Terraforming Mars has no information on any of its cards, and the cards are actually key to the game. The community has created a spreadsheet that we could use as a model.

dcramer commented 1 month ago

One thought here that we can curate is having it be aware of BGG's API. I don't know that we wanna crawl the whole site and index it (seems hard), but we could at the very least tell it to index certain threads.

https://boardgamegeek.com/wiki/page/BGG_XML_API2
https://github.com/WanielDeiss/rx-bgg-api
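
A rough sketch of what "index certain threads" could look like, assuming the XML API2 `thread` endpoint returns a `<thread>` element wrapping `<article>` children with `<subject>` and `<body>`. The sample XML below is hand-written to match that assumed shape, not captured from BGG, and `thread_url` / `parse_thread` are hypothetical helper names. Fetching is left out on purpose, since the API may need registration or rate limiting.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed base path for BGG's XML API2.
API_ROOT = "https://boardgamegeek.com/xmlapi2"


def thread_url(thread_id: int) -> str:
    """Build the URL for a single curated thread (hypothetical helper)."""
    return f"{API_ROOT}/thread?{urlencode({'id': thread_id})}"


# Hand-written sample in the shape the thread endpoint is assumed to return.
SAMPLE_THREAD_XML = """<thread id="123" numarticles="2">
  <subject>Card timing question</subject>
  <articles>
    <article id="1" username="alice" postdate="2024-01-01T10:00:00-06:00">
      <subject>Card timing question</subject>
      <body>When does the effect trigger?</body>
    </article>
    <article id="2" username="bob" postdate="2024-01-01T11:00:00-06:00">
      <subject>Re: Card timing question</subject>
      <body>Immediately after the card is played.</body>
    </article>
  </articles>
</thread>"""


def parse_thread(xml_text: str) -> dict:
    """Flatten one thread into a dict of plain-text articles ready to index."""
    root = ET.fromstring(xml_text)
    articles = [
        {
            "id": int(a.get("id")),
            "username": a.get("username"),
            "body": (a.findtext("body") or "").strip(),
        }
        for a in root.iter("article")
    ]
    return {
        "id": int(root.get("id")),
        # findtext on the root only matches the thread-level <subject>.
        "subject": (root.findtext("subject") or "").strip(),
        "articles": articles,
    }
```

With a curated list of thread IDs per game, each parsed article body could go through the same chunking path the PDF text already uses.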

dcramer commented 1 month ago

Open question: If we automatically indexed BGG, what would the strategy be?

Maybe a cutoff on threads based on number of replies? Only pinned? (pinned seems risky)

https://boardgamegeek.com/boardgame/167791/terraforming-mars/forums/66?pageid=1&sort=hot
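
To make the reply-count idea concrete, here is a minimal sketch. It assumes the XML API2 `forum` endpoint lists threads with a `numarticles` attribute; the sample XML, the thread IDs, and the threshold of 10 are all made up for illustration, and `threads_worth_indexing` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape the forum endpoint is assumed to return.
SAMPLE_FORUM_XML = """<forum id="66" title="Example forum" numthreads="3">
  <threads>
    <thread id="101" subject="Quick question" author="alice" numarticles="2" />
    <thread id="102" subject="Long rules debate" author="bob" numarticles="41" />
    <thread id="103" subject="FAQ candidates" author="carol" numarticles="15" />
  </threads>
</forum>"""


def threads_worth_indexing(forum_xml: str, min_replies: int = 10) -> list:
    """Return (thread_id, subject, replies) for threads at or above the cutoff."""
    root = ET.fromstring(forum_xml)
    picked = []
    for t in root.iter("thread"):
        # Assumption: numarticles counts the opening post, so replies = total - 1.
        replies = int(t.get("numarticles", "0")) - 1
        if replies >= min_replies:
            picked.append((int(t.get("id")), t.get("subject"), replies))
    # Busiest threads first, so a crawl budget goes to the most active ones.
    picked.sort(key=lambda item: item[2], reverse=True)
    return picked
```

On the sample above this keeps threads 102 and 103 and drops the two-post thread. Reply count is only a proxy for usefulness, so the cutoff would probably need tuning per game.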

There's still an issue in some cases that the material we need ends up being an external doc. Also it's possible the content is not that heavy to index, though we'd have to store it in Postgres + run tsvector+embeddings on the whole thing.
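
For the Postgres side, something like the following might work. This is a sketch only: it assumes the pgvector extension is installed, and the table name, column names, and the 1536-dimension embedding size are placeholders, not what the project uses today.

```python
# Hypothetical schema: one row per BGG article, with a generated tsvector
# column for full-text search and a pgvector column for embeddings.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS bgg_article (
    id         bigint PRIMARY KEY,
    thread_id  bigint NOT NULL,
    body       text   NOT NULL,
    body_tsv   tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
    embedding  vector(1536)  -- dimension is an assumption; match the embedding model
);

CREATE INDEX IF NOT EXISTS bgg_article_tsv_idx
    ON bgg_article USING gin (body_tsv);
"""

# body_tsv is generated by Postgres, so it is not part of the insert.
INSERT_SQL = """
INSERT INTO bgg_article (id, thread_id, body, embedding)
VALUES (%s, %s, %s, %s::vector)
"""


def to_vector_literal(embedding: list) -> str:
    """Format floats as pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def insert_params(article_id: int, thread_id: int, body: str, embedding: list) -> tuple:
    """Build the parameter tuple to pass alongside INSERT_SQL."""
    return (article_id, thread_id, body, to_vector_literal(embedding))
```

Keeping the tsvector as a generated column means it can never drift from `body`, at the cost of some extra storage per row.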