UB-Mannheim / RaiseWikibase

Knowledge graph construction: Fast inserts into a Wikibase instance
https://ub-mannheim.github.io/RaiseWikibase/
MIT License

Editing items fails when no ID provided #16

Closed. MHuberFaust closed this issue 2 years ago

MHuberFaust commented 2 years ago

Heya,

Editing an item whose JSON contains an explicit ID, as shown in #15, works just fine. Editing an item without an ID results in this:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/RaiseWikibase/RaiseWikibase/raiser.py in batch(content_model, texts, namespace, page_title, new)
    106         for ind, (text, pt) in enumerate(tqdm(zip(texts, page_title))):
--> 107             page(connection=connection, content_model=content_model,
    108                  namespace=namespace, text=text, page_title=pt, new=new)

~/RaiseWikibase/RaiseWikibase/raiser.py in page(connection, content_model, namespace, text, page_title, new)
     71     # 5. Find all IDs in different tables.
---> 72     [text_id, page_id, comment_id, content_id, rev_id] = connection.get_ids(new=new, page_title=page_title,
     73                                                                             namespace=namespace)

~/RaiseWikibase/RaiseWikibase/dbconnection.py in get_ids(self, new, page_title, namespace)
    335         if not new:
--> 336             [page_id, rev_id] = self.get_page_latest(page_title=page_title, namespace=namespace)
    337         text_id = self.get_text_id() + 1

~/RaiseWikibase/RaiseWikibase/dbconnection.py in get_page_latest(self, page_title, namespace)
    325         cur.execute(q)
--> 326         (page_id, page_latest) = cur.fetchall()[0]
    327         cur.close()

IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-6-f1420baf1598> in <module>
----> 1 batch('wikibase-item', items, new=False)

~/RaiseWikibase/RaiseWikibase/raiser.py in batch(content_model, texts, namespace, page_title, new)
    109         connection.conn.commit()
    110         connection.conn.close()
--> 111     except connection.conn.error() as error:
    112         print("Failed to update: {}".format(error))
    113         # reverting changes because of exception

TypeError: catching classes that do not inherit from BaseException is not allowed

Looking up the ID of an item by its label seems to fail. It seems that the query made in get_page_latest() does not return any result.

If you could point me in the right direction, that would be very helpful.

Best, Michael

shigapov commented 2 years ago

If the JSON representations in the items do not contain the key "id", then indeed it does not work out of the box.

You could try the method search_text_str() in dbconnection to find those QIDs.

Take a look at how I used it to find the PIDs of properties by their labels in megaWikibase.
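
As a minimal sketch of the point about the missing "id" key, the check below lists which items would fail before batch(..., new=False) is called. The helper name missing_ids is hypothetical and not part of RaiseWikibase, and it assumes each item is either a dict or a JSON string:

```python
import json


def missing_ids(items):
    """Return the indices of items whose JSON has no non-empty "id" key.

    Hypothetical helper, not part of RaiseWikibase. Assumes each item is
    either a dict or a JSON string holding the entity representation.
    """
    missing = []
    for index, item in enumerate(items):
        data = json.loads(item) if isinstance(item, str) else item
        if not data.get("id"):
            missing.append(index)
    return missing


# Illustrative data only: one item with an ID, two without.
items = [{"id": "Q1", "labels": {}}, {"labels": {}}, '{"type": "item"}']
print(missing_ids(items))  # [1, 2]
```

Items reported here need a QID added to their JSON before they can be edited.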

shigapov commented 2 years ago

If you upload N items and then want to edit all of them, you can in principle find the last QID via the method get_last_eid(content_model='wikibase-item') in dbconnection. The other N-1 QIDs are then easy to compute.
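
A rough sketch of that computation, assuming the N items received consecutive QIDs (nothing else was inserted in between) and that the value from get_last_eid() is either an integer or a string such as "Q10". The function names qids_for_last_batch and attach_ids are hypothetical, and the return format of get_last_eid() is not confirmed here:

```python
from typing import List


def qids_for_last_batch(last_eid, n: int) -> List[str]:
    """Return the QIDs of the n most recently created items, oldest first.

    Hypothetical helper. Assumes the n items got consecutive QIDs and that
    last_eid is the newest one, given as an int (10) or a string ("Q10").
    """
    last = int(str(last_eid).lstrip("Qq"))
    if n < 0 or n > last:
        raise ValueError("n must be between 0 and the last numeric ID")
    return ["Q{}".format(i) for i in range(last - n + 1, last + 1)]


def attach_ids(items: List[dict], qids: List[str]) -> List[dict]:
    """Return copies of the items with the "id" key set, in upload order."""
    if len(items) != len(qids):
        raise ValueError("items and qids must have the same length")
    return [dict(item, id=qid) for item, qid in zip(items, qids)]


# Illustrative values: the last QID is Q10 and three items were uploaded.
qids = qids_for_last_batch(10, 3)
print(qids)  # ['Q8', 'Q9', 'Q10']
print(attach_ids([{"labels": {}}] * 3, qids)[0]["id"])  # Q8
```

The items with their "id" key filled in this way can then be passed to batch('wikibase-item', items, new=False).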

MHuberFaust commented 2 years ago

Thank you, both approaches worked perfectly.