PrismarineJS / mineflayer

Create Minecraft bots with a powerful, stable, and high level JavaScript API.
https://prismarinejs.github.io/mineflayer/
MIT License

Automatically extract information from http://minecraft.gamepedia.com/ ? #229

Closed: rom1504 closed this issue 9 years ago

rom1504 commented 9 years ago

http://minecraft.gamepedia.com/ is a very complete reference on many aspects of Minecraft.

There are already some scripts to extract the recipes from that wiki (for example https://github.com/andrewrk/mineflayer/blob/master/bin/transform1_recipes.js, though it's currently broken). I think we could extract more, for example everything in the infobox (compare http://minecraft.gamepedia.com/Rabbit%27s_Foot with https://github.com/andrewrk/mineflayer/blob/master/lib/enums/items.json#L950).

I'm not sure this really applies here, but http://dbpedia.org/ has a very good framework for extracting information from Wikipedia infoboxes, and the infoboxes on http://minecraft.gamepedia.com/ look just like Wikipedia's, so it might be worth looking into.

@Kupferhirn has extracted the items manually from that wiki (see https://github.com/andrewrk/mineflayer/pull/227), which is great, but doing the same thing automatically would be even better.

Edit: I think DBpedia's extraction framework is probably way too big for this; writing some simple scripts would be easier.

thejoshwolfe commented 9 years ago

An alternative to scraping a wiki is to install debug statements into the game itself. That would be guaranteed to be 100% correct and complete (at least for the mechanical data like id numbers), but it relies on the Minecraft Coder Pack project being caught up to the latest version of Minecraft. I can't really find any authoritative information on MCP anymore; I wonder if that project is still alive.

rom1504 commented 9 years ago

I think the official MCP site is http://www.modcoderpack.com/website/releases. Yes, I agree there are many ways to do it. For example, @deathcap is working on upgrading Burger (https://github.com/mcdevs/Burger/pull/12).

So I think any way we can extract this information automatically is fine.

roblabla commented 9 years ago

Relying on MCP is a bad idea; sadly, the project seems very volatile. A Bukkit or Forge plugin could also extract the information and would seem more stable.

thejoshwolfe commented 9 years ago

I thought Forge was built on MCP. Maybe it used to be? If Forge works with 1.8.3, then that seems like the way to go.

What seems so attractive about a mod/plugin is that all the heavy data comes straight from Mojang. The only thing the community provides in this case is a scraping tool. The wiki is community maintained, and might be wrong. Bukkit is community maintained and might be wrong.

The downside of scraping the Minecraft binary itself is that you don't always get very good string names and descriptions. Perhaps scraping would only be appropriate for recipes and a sanity-check list of id numbers.

roblabla commented 9 years ago

Forge is built on MCP, but public builds of MCP take longer and longer to get released.

Bukkit is based on mojang's minecraft server, it can hardly be wrong. They use a similar technique as MCP, but do it themselves.

thejoshwolfe commented 9 years ago

Bukkit is based on mojang's minecraft server, it can hardly be wrong.

Bukkit currently doesn't know about Granite: https://github.com/Bukkit/Bukkit/search?utf8=%E2%9C%93&q=granite (contrast with: https://github.com/Bukkit/Bukkit/search?utf8=%E2%9C%93&q=acacia )

Bukkit, like the wiki, is supposed to be kept up to date by the community. This makes it inherently less trustworthy than the actual data in the notchian game itself, which we know must be right at all times by definition.

A Forge plugin still seems like the most reliable solution to me at this point.

roblabla commented 9 years ago

This is the wrong repo. The Bukkit repo's last commit was in August 2014. Spigot is still up to date and does know about granite, prismarine, etc.

thejoshwolfe commented 9 years ago

This is the wrong repo.

Oh ok. Where do we get the current source? Or are you proposing we write a Bukkit plugin to dump the data from the Bukkit runtime binary?

roblabla commented 9 years ago

Yes, that's what I was proposing. A Forge plugin works too, though.

The current source is closed because of the DMCA takedown.

rom1504 commented 9 years ago

I started fixing the recipe extractor, and as expected, not all of the block info is correct. For example, "Trapdoor" (https://github.com/andrewrk/mineflayer/blob/master/lib/enums/blocks.json#L1096) is now named "Wooden Trapdoor" (http://minecraft.gamepedia.com/Trapdoor#Crafting).

I think there are many other errors like this, which is why some kind of automatic extractor is needed.

I will still update the recipes, but they won't be perfect until we have an extractor for the blocks and the items (the recipe extractor depends on items.json and blocks.json being correct).

rom1504 commented 9 years ago

I'm currently extracting from the HTML of http://minecraft.gamepedia.com/Crafting#Complete_recipe_list, but that is neither very reliable nor easy. Getting the wiki source might help: I didn't find how to do that for the complete list, but it's possible for a single item (for example http://minecraft.gamepedia.com/index.php?title=Andesite&action=edit&section=3), which might be easier to parse. Using the individual pages would mean fetching all of them, which should be integrated into the script.
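A minimal sketch of fetching wiki source instead of HTML, assuming the wiki is a standard MediaWiki install: MediaWiki's `index.php?title=...&action=raw` serves a page's raw wikitext directly, which avoids scraping the `action=edit` form. The helper names `rawSourceUrl` and `fetchWikiSource` are hypothetical, and the base URL is simply the wiki address used in this thread.

```javascript
// Hypothetical helpers (not part of mineflayer): build the URL of a page's raw
// wikitext using MediaWiki's standard `action=raw` view of index.php.
const WIKI_BASE = 'http://minecraft.gamepedia.com/index.php';

function rawSourceUrl(title) {
  // MediaWiki titles use underscores in place of spaces.
  const params = new URLSearchParams({ title: title.replace(/ /g, '_'), action: 'raw' });
  return `${WIKI_BASE}?${params}`;
}

// Download the wikitext of one page (needs a runtime with a global fetch, e.g. Node 18+).
async function fetchWikiSource(title) {
  const res = await fetch(rawSourceUrl(title));
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${title}`);
  return res.text();
}
```

For example, `rawSourceUrl('Andesite')` gives `http://minecraft.gamepedia.com/index.php?title=Andesite&action=raw`; looping `fetchWikiSource` over a list of titles would cover the "get them all" step.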

The wiki source is generally much easier to parse than the HTML, and it might be possible to parse the item and block information from it as well (see the source of the infobox at http://minecraft.gamepedia.com/index.php?title=Andesite&action=edit).
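To illustrate why wikitext is easier to work with than HTML, here is a rough sketch that pulls the named `key = value` parameters out of an infobox template call. The function `parseTemplate` is hypothetical and deliberately simplified: it takes the first `{{templateName` occurrence, ignores positional parameters, and assumes the infobox template is called something like `Block` (the real template name would need checking on the wiki).

```javascript
// Hypothetical helper (not a mineflayer API): return the named "key = value"
// parameters of the first {{templateName ...}} call found in some wikitext.
// Simplified: matches the first "{{templateName" prefix and skips positional params.
function parseTemplate(wikitext, templateName) {
  const start = wikitext.indexOf('{{' + templateName);
  if (start === -1) return null;

  // Walk forward counting {{ and }} to find the matching closing braces.
  let depth = 0;
  let end = -1;
  for (let i = start; i < wikitext.length; i++) {
    if (wikitext.startsWith('{{', i)) { depth++; i++; }
    else if (wikitext.startsWith('}}', i)) {
      depth--; i++;
      if (depth === 0) { end = i - 1; break; }
    }
  }
  if (end === -1) return null; // unbalanced braces
  const body = wikitext.slice(start + 2, end);

  // Split on '|' only at the top level, so links like [[File:x.png|32px]]
  // and nested templates stay intact.
  const parts = [];
  let current = '';
  let nesting = 0;
  for (let j = 0; j < body.length; j++) {
    const pair = body.slice(j, j + 2);
    if (pair === '{{' || pair === '[[') { nesting++; current += pair; j++; }
    else if (pair === '}}' || pair === ']]') { nesting--; current += pair; j++; }
    else if (body[j] === '|' && nesting === 0) { parts.push(current); current = ''; }
    else { current += body[j]; }
  }
  parts.push(current);

  const params = {};
  for (const part of parts.slice(1)) { // parts[0] is the template name
    const eq = part.indexOf('=');
    if (eq === -1) continue; // positional parameter: ignored in this sketch
    params[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return params;
}
```

For example, `parseTemplate('{{Block | title = Andesite | tool = pickaxe }}', 'Block')` yields `{ title: 'Andesite', tool: 'pickaxe' }`, which maps naturally onto fields in items.json and blocks.json.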

Edit: apparently the complete list is generated by a script, http://minecraft.gamepedia.com/Module:Recipe_list, which might be useful.

Edit2: some recipes carry a "Pocket Edition only" or "Console Edition only" note; the script should check for that (and remove any recipes that were wrongly added).
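A small sketch of that check, assuming the extractor produces recipe objects with an optional free-text `note` field (a hypothetical shape, not minecraft-data's actual format). Recipes whose note marks them as Pocket- or Console-only are dropped.

```javascript
// Hypothetical recipe shape: { result: string, ingredients: string[], note?: string }.
// Matches notes such as "Pocket Edition only" or "Console Edition only".
const OTHER_EDITION_ONLY = /\b(Pocket|Console) Edition only\b/i;

// Keep only the recipes that are not restricted to another edition.
function withoutOtherEditionRecipes(recipes) {
  return recipes.filter((recipe) => !OTHER_EDITION_ONLY.test(recipe.note || ''));
}
```

Matching on a regular expression keeps the filter tolerant of small wording or capitalization differences in the notes.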

Kupferhirn commented 9 years ago

"trapdoor" is the unlocalized name from the notchian client. I have checked all the blocks that could have changed.

rom1504 commented 9 years ago

@Kupferhirn "name": "trapdoor" is fine; the problem is "displayName": "Trapdoor" and similar entries (I think most blocks/items with qualifiers like this have problems, at least in the displayName). I need the display names to be consistent for my script to extract the recipes.

I don't have it right now, but I'll post a list of the problematic blocks/items here tonight if that helps.
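Such a list could be produced automatically. The sketch below is a hypothetical check (not part of mineflayer) that reports the names used by the wiki recipes that have no matching `displayName` in blocks.json or items.json. It assumes each entry has a `displayName` field, and accepts the data either as an array or as an object keyed by id.

```javascript
// Hypothetical check (not part of mineflayer): list the names used by the wiki
// recipes that have no matching `displayName` in blocks.json/items.json.
// `entries` may be an array or an object keyed by id; each entry is assumed
// to have a `displayName` field.
function findUnknownDisplayNames(entries, wikiNames) {
  const list = Array.isArray(entries) ? entries : Object.values(entries);
  const known = new Set(list.map((entry) => entry.displayName));
  // Deduplicate the wiki names, then keep the ones the data files don't know.
  return [...new Set(wikiNames)].filter((name) => !known.has(name));
}
```

With an entry whose `displayName` is "Trapdoor" and a wiki recipe that uses "Wooden Trapdoor", the function returns `['Wooden Trapdoor']`, which is exactly the kind of mismatch described above.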

rom1504 commented 9 years ago

So I found out a bit more about these recipe-related scripts:

rom1504 commented 9 years ago

See http://minecraft.gamepedia.com/Talk:Crafting#Wiki_source_of_the_recipes

rom1504 commented 9 years ago

The furnace recipes are at http://minecraft.gamepedia.com/Smelting

For the brewing stand: http://minecraft.gamepedia.com/Brewing

See http://minecraft.gamepedia.com/Template:Grid#Other_templates for various grid-related pages.

rom1504 commented 9 years ago

This should somehow go into https://github.com/PrismarineJS/minecraft-data

rom1504 commented 9 years ago

I think I might just start by making a script to get the wiki source of everything on the wiki, because there is a lot of information on it, not just recipes.
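A rough sketch of the first step of such a script, enumerating every page title through the MediaWiki API (`api.php?action=query&list=allpages`) and following the continuation token until the list is exhausted. It assumes the wiki exposes `api.php` at the address below and uses the newer `continue`-style continuation; older MediaWiki versions return `query-continue` instead. `listAllTitles` is a hypothetical name, and the fetcher is passed in so the pagination logic can be tested without network access.

```javascript
// Hypothetical sketch: list every page title via MediaWiki's API
// (action=query&list=allpages), following the `continue` token until done.
// Assumes the newer `continue`-style continuation.
const API_URL = 'http://minecraft.gamepedia.com/api.php';

async function listAllTitles(fetchJson) {
  const titles = [];
  let cont = {}; // continuation parameters from the previous response
  for (;;) {
    const params = new URLSearchParams({
      action: 'query', list: 'allpages', aplimit: '500', format: 'json', ...cont,
    });
    const data = await fetchJson(`${API_URL}?${params}`);
    for (const page of data.query.allpages) titles.push(page.title);
    if (!data.continue) return titles; // no token: this was the last batch
    cont = data.continue; // e.g. { apcontinue: '...', continue: '-||' }
  }
}

// Default fetcher for runtimes with a global fetch (e.g. Node 18+).
async function defaultFetchJson(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

Each title returned by `listAllTitles(defaultFetchJson)` could then be downloaded as raw wikitext and stored locally, so the parsing scripts can be rerun without hitting the wiki again.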

pokeball99 commented 9 years ago

Or host that info in a new repo and have it draw the info from there.

rom1504 commented 9 years ago

@pokeball99 that's already being done there, but we still need to extract the Minecraft info to put into minecraft-data ;)

rom1504 commented 9 years ago

OK, https://github.com/PrismarineJS/minecraft-data/issues/8 tracks the progress of the wiki extraction. If someone wants to work on extraction from Burger, MCP, or anything else, they can open an issue on the minecraft-data repo.

Closing this issue.