AVee / cw_wiki_sync

Tool to parse data from chatwars wiki

Data duplication and out of date information #5

Closed Plagiatus closed 1 year ago

Plagiatus commented 1 year ago

This might be two different issues, but here goes.

I'm using your repository for some of my own stuff (generally great work btw), and I've come across an issue:

Data in the resources_v2.json file is duplicated and partially out of date (in fact it's an issue in both v1 and v2, but I guess v1 isn't as important anymore).

The item I noticed it with is a118 - it exists twice.

First entry

{
    "id": "a118",
    "eventItem": false,
    "name": "Indissoluble Cloak",
    "type": "Cape",
    "attack": 4,
    "defense": 11,
    "mana": 0,
    "levelRequirement": 60,
    "depositable": true,
    "auction": true,
    "enchantable": true,
    "craftable": true,
    "recipeIncomplete": false,
    "craftCommand": "/c_a118",
    "craftSkill": "Crafting (Skill)",
    "craftLevel": 6,
    "craftMana": 800,
    "recipe": [
        {
            "name": "Indissoluble Apron recipe",
            "qty": "1",
            "personalized": false
        },
        {
            "name": "Indissoluble Apron part",
            "qty": "4",
            "personalized": false
        },
        ...
    ],
    "pagename": "Indissoluble Apron",
    "lastModified": "2021-04-14T21:10:12Z",
    "revision": 16997,
    "wikiUrl": "https://chatwars-wiki.de/index.php?title=Indissoluble+Apron"
}

Second entry

{
    "id": "a118",
    "eventItem": false,
    "name": "Indissoluble Cloak",
    "type": "Cape",
    "description": "Modifiers: Heavy Armor Mastery↑ 0.50 Per enchant",
    "attack": 4,
    "defense": 11,
    "mana": 0,
    "weight": 60,
    "levelRequirement": 60,
    "depositable": true,
    "shopSellPrice": 302,
    "exchange": false,
    "auction": true,
    "freeText": "Modifiers: Heavy Armor Mastery↑ 0.50 Per enchant",
    "quest": false,
    "enchantable": true,
    "enchantAtk1": 0,
    "enchantAtk2": 2,
    "enchantAtk3": 0,
    "enchantAtk4": 2,
    "enchantDef1": 2,
    "enchantDef2": 0,
    "enchantDef3": 2,
    "enchantDef4": 0,
    "enchantMana1": 0,
    "enchantMana2": 0,
    "enchantMana3": 0,
    "enchantMana4": 0,
    "craftable": true,
    "recipeIncomplete": false,
    "craftCommand": "/craft_a118",
    "craftSkill": "Crafting (Skill)",
    "craftLevel": 6,
    "craftMana": 800,
    "recipe": [
        {
            "name": "Indissoluble Cloak Recipe",
            "qty": "1",
            "personalized": false
        },
        {
            "name": "Indissoluble Cloak Part",
            "qty": "4",
            "personalized": false
        },
        ...
    ],
    "pagename": "Indissoluble Cloak",
    "lastModified": "2022-05-05T19:05:12Z",
    "revision": 17373,
    "wikiUrl": "https://chatwars-wiki.de/index.php?title=Indissoluble+Cloak"
}

As you can see, the first entry contains outdated data - calling the item "Indissoluble Apron" in everything but the "name" field - including referring to the Apron recipes as ingredients.


A local fix for others running into this issue could be to sort the data by lastModified or revision and only use the newer entry, at least that's what let me work around the issue for now.
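
A minimal sketch of that workaround in Python, assuming the entries have already been loaded into a flat list of objects that each carry an "id" and a numeric "revision" (the exact top-level layout of resources_v2.json is an assumption here, and `dedupe_by_revision` is a hypothetical helper name, not something the repository provides):

```python
import json


def dedupe_by_revision(items):
    """Keep one entry per "id", preferring the highest "revision"."""
    newest = {}
    for item in items:
        current = newest.get(item["id"])
        # Entries without a "revision" are treated as revision 0 (assumption).
        if current is None or item.get("revision", 0) > current.get("revision", 0):
            newest[item["id"]] = item
    return list(newest.values())


# Trimmed-down versions of the two a118 entries quoted above.
sample = [
    {"id": "a118", "pagename": "Indissoluble Apron", "revision": 16997},
    {"id": "a118", "pagename": "Indissoluble Cloak", "revision": 17373},
]

deduped = dedupe_by_revision(sample)
print(json.dumps(deduped, indent=2))
```

With the real file, the list would come from `json.load` on resources_v2.json. Comparing "lastModified" instead should work the same way, since the timestamps shown are ISO 8601 strings in UTC and sort correctly as plain strings.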

AVee commented 1 year ago

All data in the file is taken directly from the wiki at https://chatwars-wiki.de/ so any errors in the data should be fixed there. I think the proper thing to do would be to remove one of the pages, but you may want to ask in the Wiki Telegram group first at https://t.me/joinchat/AaDbq05AZVnNRqU0nojCWg. You can check https://chatwars-wiki.de/index.php?title=Contribute_to_the_CW-Wiki for more details.

Plagiatus commented 1 year ago

You're saying it's taken directly, but as a matter of fact the page that the first entry links to (https://chatwars-wiki.de/index.php?title=Indissoluble_Apron) doesn't exist (anymore).

So I'm guessing there was a page there at some point, then it was removed/moved/etc but remained in the scraped data.

AVee commented 1 year ago

I guess you're right. I haven't touched this code in years now, but it caches previous results to avoid hammering the wiki server. I guess that caching doesn't pick up on deletions.

I have just regenerated the entire file from scratch, and the duplicate seems to be gone now.
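
For anyone hitting the same kind of stale entry, one way a cache like the one described above could account for deletions is to drop cached entries whose page is no longer listed by the wiki. This is only an illustrative sketch, not the repository's actual code: `prune_deleted_pages` and `live_pagenames` are hypothetical names, and it assumes each cached entry carries a "pagename" field as in the JSON quoted earlier.

```python
def prune_deleted_pages(cached, live_pagenames):
    """Drop cached entries whose wiki page no longer exists."""
    live = set(live_pagenames)
    return [entry for entry in cached if entry["pagename"] in live]


# Cached results from an earlier run (trimmed to the relevant fields).
cached = [
    {"id": "a118", "pagename": "Indissoluble Apron"},
    {"id": "a118", "pagename": "Indissoluble Cloak"},
]

# Hypothetical list of page names the wiki currently reports.
live_pagenames = ["Indissoluble Cloak"]

pruned = prune_deleted_pages(cached, live_pagenames)
print(pruned)
```

How the list of live page names is obtained is left open here. Regenerating the file from scratch, as was done above, avoids the problem without needing it.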