UB-Mannheim / RaiseWikibase

Knowledge graph construction: Fast inserts into a Wikibase instance
https://ub-mannheim.github.io/RaiseWikibase/
MIT License

Editing items with `page` argument `new=False` #15

Closed (Fab-Hop closed this issue 2 years ago)

Fab-Hop commented 2 years ago

I am trying to use the page function provided in the raiser module to edit existing items by setting the new argument to False. However, this does not seem to work at all.

My code looks like this:

import RaiseWikibase.datamodel
import RaiseWikibase.raiser

labels = {**RaiseWikibase.datamodel.label('en', 'foo_1'), **RaiseWikibase.datamodel.label('de', 'foo_de')}
snak = RaiseWikibase.datamodel.snak(datatype="wikibase-item", value="Q1", prop="P1", snaktype="value")
claims = RaiseWikibase.datamodel.claim(prop="P1", mainsnak=snak, qualifiers={}, references={})
items = [RaiseWikibase.datamodel.entity(labels=labels, aliases={}, descriptions={}, claims=claims, etype='item')]
items[0]["id"] = "Q4"  # added to identify the item which should be changed
RaiseWikibase.raiser.batch(content_model="wikibase-item", texts=items, new=False)

Setting new=True creates a correct new item, which indicates that the Wikibase data format should be fine. Since the new=False option causes problems for both items and properties, the page function might not be intended to change these types of pages. In that case, this issue can be solved by using a completely different workflow to edit existing items and properties. If so, a pointer to a description of that workflow would help me a lot. If the page function is intended to be used to edit existing items and properties, that part of the function seems to contain a few bugs.

Firstly, the local variable new_eid is used without being assigned when trying to write the fingerprint into the secondary tables. However, after fixing this, writing the fingerprint doesn't work either, because the connection.insert_secondary([...]) method throws a MySQLdb._exceptions.IntegrityError: (1062, "Duplicate entry '5-1' for key 'wbt_item_terms_term_in_lang_id [...] exception, whether the same fingerprint or one with different values is used. Additionally, using an empty fingerprint runs without an error message but corrupts the item page (most likely because the whole fingerprint is removed). In general, using page with new=False seems to leave the MediaWiki database in an inconsistent state.

I hope this error can be reproduced with the help of this description. Please let me know if you need additional information. I am happy to help with fixing this issue.

Best, Fabian

shigapov commented 2 years ago

Hi Fabian!

Thank you for the detailed issue! Indeed, my use case was only the initial data upload with RaiseWikibase, so I didn't care much about new=False. But if you are open to working a bit on this issue, I would be happy to help you too.

Before dealing with the secondary tables, the problem with rev_id needs to be solved. It was mentioned in Issue 10, "max value for rev_id in the revision table". So we need to rewrite the method get_rev_id to get the rev_id when editing an entity. It should probably depend on page_id, like the method get_old_lendata. In the revision table, rev_page is equal to page_id.
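As a rough illustration of a page-dependent lookup, here is a minimal sketch that finds the latest rev_id for a given page_id by filtering on rev_page. It is not the RaiseWikibase implementation of get_rev_id: the helper name latest_rev_id is hypothetical, and an in-memory SQLite table with only the rev_id and rev_page columns stands in for the real MySQL revision table.

```python
import sqlite3


def latest_rev_id(conn, page_id):
    """Return the highest rev_id stored for page_id, or None if the page has no revisions.

    Hypothetical helper for illustration; not part of RaiseWikibase.
    """
    row = conn.execute(
        "SELECT MAX(rev_id) FROM revision WHERE rev_page = ?", (page_id,)
    ).fetchone()
    return row[0]


# In-memory stand-in for the revision table (assumption: only two columns are modelled).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER)")
conn.executemany(
    "INSERT INTO revision (rev_id, rev_page) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1)],  # page 1 has two revisions, page 2 has one
)

print(latest_rev_id(conn, 1))
```

The point of the sketch is only that the lookup is keyed on the page being edited rather than on a global maximum over the whole table.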

Let's continue tomorrow.

shigapov commented 2 years ago

I've uncommented the debugger options in LocalSettings.php.template and edited an item. On the item page I see the message Notice: Page Item:Q1 exists but has no (visible) revisions! [Called from WikiPage::{closure} in /var/www/html/includes/page/WikiPage.php at line 648] in /var/www/html/includes/debug/MWDebug.php on line 430. So it is indeed a problem with revisions.

Fab-Hop commented 2 years ago

Thank you for confirming that this is an issue and for pointing out the best place to start looking into it. My experience with the database schema of MediaWiki is limited, but I'll take a closer look at rev_id.

shigapov commented 2 years ago

Regarding the duplicate fingerprint: let's first add IGNORE to line 412, as in line 414. Could you repeat your experiment with that?
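To show what adding IGNORE changes, here is a minimal sketch using an in-memory SQLite database, where INSERT OR IGNORE is the closest equivalent of MySQL's INSERT IGNORE. The table terms, its columns, and the helper insert_terms are hypothetical stand-ins, not the actual Wikibase secondary tables or the RaiseWikibase insert_secondary method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a unique key, standing in for a secondary table.
conn.execute(
    "CREATE TABLE terms ("
    "item_id INTEGER, term_in_lang_id INTEGER, "
    "UNIQUE (item_id, term_in_lang_id))"
)


def insert_terms(conn, rows, ignore_duplicates):
    """Insert rows, optionally skipping those that violate the unique key."""
    verb = "INSERT OR IGNORE" if ignore_duplicates else "INSERT"
    conn.executemany(
        f"{verb} INTO terms (item_id, term_in_lang_id) VALUES (?, ?)", rows
    )


# First write, as when the item is created.
insert_terms(conn, [(5, 1)], ignore_duplicates=False)

# Writing the same pair again with a plain INSERT raises an integrity error.
try:
    insert_terms(conn, [(5, 1)], ignore_duplicates=False)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With IGNORE, the duplicate is skipped and only the new pair is added.
insert_terms(conn, [(5, 1), (5, 2)], ignore_duplicates=True)
row_count = conn.execute("SELECT COUNT(*) FROM terms").fetchone()[0]
```

Note that IGNORE only skips rows that collide with an existing key; it does not update or remove rows that are already there.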

shigapov commented 2 years ago

It seems the comment_id in the revision table is deprecated (P.S.: in fact it's not, but it has caused the problem). I've removed it. It seems the current implementation of rev_id is fine for your example.

The current code works with the following example which creates one property & one item and then edits the item:

from RaiseWikibase.datamodel import label, alias, description, snak, claim, entity
from RaiseWikibase.raiser import batch

p = entity(labels=label(value='Wikidata ID'),
           aliases=alias(value=["WID", 'WikidataID']),
           descriptions=description(value="ID of an entity in Wikidata"),
           claims={},
           etype='property',
           datatype='external-id')

batch('wikibase-property', [p])

e = entity(labels=label(value='human'),
           aliases={},
           descriptions={},
           claims={},
           etype='item')

batch('wikibase-item', [e])

m = entity(labels=label(value='human'),
           aliases=alias(value=['person']),
           descriptions=description(value='a human being'),
           claims=claim(prop='P1',
                        mainsnak=snak(datatype='external-id',
                                      value='Q5',
                                      prop='P1',
                                      snaktype='value')),
           etype='item')
m["id"]="Q1"

batch('wikibase-item', [m], new=False)

Please let me know whether this works for you.

Fab-Hop commented 2 years ago

Sorry, I only found the time to test it today. It works perfectly now.

Thank you very much for the fast bug fix.