cmeng-git / atalk-android

xmpp/jabber client for android
Apache License 2.0
155 stars 56 forks source link

OMEMO session for device id -1 #193

Closed licaon-kter closed 1 year ago

licaon-kter commented 1 year ago

aTalk 2.9.5/2.9.6 F-Droid / Converse https://github.com/conversejs/converse.js/commit/0df1c1880e0e3f48f70ee9f8edf7b97581ea18b5 Fennec

Steps:

Server side DB (notice first device id number?):

ejabberd=# SELECT * FROM pubsub_node WHERE host = 'user@domain.tld' ORDER BY node;
       host       |                       node                       | parent | plugin | nodeid 
------------------+--------------------------------------------------+--------+--------+--------
 user@domain.tld | eu.siacs.conversations.axolotl.bundles:11095     |        | pep    |   1377
 user@domain.tld | eu.siacs.conversations.axolotl.bundles:3850      |        | pep    |   1378
 user@domain.tld | eu.siacs.conversations.axolotl.bundles:636623660 |        | pep    |   1379
 user@domain.tld | eu.siacs.conversations.axolotl.devicelist        |        | pep    |   1380
(4 rows)

ejabberd=# SELECT * FROM pubsub_item WHERE nodeid = '1380';
 nodeid |    itemid     |       publisher        |       creation       |     modification     |                                                     payload                                                      
--------+---------------+------------------------+----------------------+----------------------+------------------------------------------------------------------------------------------------------------------
   1380 | 67A4B19AB0F4D | user@domain.tld/user | 001658:307610:731732 | 001658:307610:731732 | <list xmlns='eu.siacs.conversations.axolotl'><device id='-1'/><device id='3850'/><device id='636623660'/></list>
(1 row)

Note that I saw this randomly testing aTalk (https://github.com/cmeng-git/atalk-android/issues/185#issuecomment-1188626882) and now I've repro this scenario on a new clean account.

PIng @jcbrand

/LE: Converse - user - OMEMO - remove other keys - fixes it

<device id='3850'/><device id='636623660'/></list>

...but on aTalk restart it's broken again

<device id='-1'/><device id='3850'/><device id='636623660'/></list>

...while the bundles list is unchanged.

cmeng-git commented 1 year ago

The error message indicates that there is an error in the OMEMO database record, where the fingerPrint table column contains no data i.e. null. Currently in OMEMO protocol implementations, there is NO defined standard method to deal with cleaning up the OMEMO PEP data that are resided on server database. Whenever the same user account is being deleted/created, the server will keep the old/obsoleted OMEMO pep records and create a new one.

All these OMEMO identities (new and obsoleted) will be send to the User whenever user log into the server. There is some mechanisms built into the smack OMEMO implementation to help remove the old OMEMO identities, but it is no fool proof. During aTalk development, it was found that the fingerPrint can accidentally be cleared to null. The OMEMO identities corruption happens on aTalk and other devices that support OMEMO e.g. conversations.

To resolve the data corruption, you can try to use the two aTalk built-in option i.e. a. "Purge unused identities" b. Regenerate OMEMO identities.

First try perform (a) to see it helps to clear the problem. If failed then perform (b).

omemo_option

Below are the all the discussions on OMEMO issues with smack development team during its implementation: https://discourse.igniterealtime.org/t/smack-omemo-4-4-0-encounters-null-object-reference-while-sending-an-omemo-message/89496/11 https://discourse.igniterealtime.org/t/smack-4-4-0-what-is-the-proper-way-for-app-to-handle-when-there-is-corrupted-omemokey-exception-in-user-own-omemo-device/89542/4 https://discourse.igniterealtime.org/t/smack-4-4-0-what-is-the-proper-way-for-app-to-handle-when-there-is-corrupted-omemokey-exception-in-user-own-omemo-device/89542 https://discourse.igniterealtime.org/t/smack-4-4-0-what-is-the-proper-way-for-app-to-handle-when-there-is-corrupted-omemokey-exception-in-user-own-omemo-device/89542 https://discourse.igniterealtime.org/t/smack-4-4-0-atalk-regenerate-omemo-identities-implementation-and-problem-faced/89565/8 https://discourse.igniterealtime.org/t/smack-4-4-0-atalk-regenerate-omemo-identities-implementation-and-problem-faced/89565 https://discourse.igniterealtime.org/t/smack-4-4-0-atalk-regenerate-omemo-identities-implementation-and-problem-faced/89565 https://discourse.igniterealtime.org/t/smack-4-4-0-smack-omemo-does-not-auto-repair-an-corrupted-identitykey-even-after-new-requested-bundle-received-from-server/89615 https://discourse.igniterealtime.org/t/smack-4-4-0-smack-omemo-does-not-auto-repair-an-corrupted-identitykey-even-after-new-requested-bundle-received-from-server/89615/5

licaon-kter commented 1 year ago

Whenever the same user account is being deleted/created, the server will keep the old/obsoleted OMEMO pep records and create a new one.

Not sure how other servers behave, but that's not true for ejabberd.

When I unregister the account, the nodes are deleted too. When I re-register (create basically) the same account there are no old nodes.

cmeng-git commented 1 year ago

To support OMEMO messaging, each newly registered android OMEMO device creates and saves its omemo data into ejabberd pubsub_node table on the server. The table contains the bundles and devicelist; the bundle row is identified with a bareJid and a deviceId. From the enclosed table below e.g. parrot@atalk.sytes.net, actually has only 2 active devices and the rest are inactive (stray left behind).

image

Each record of the omemoDevice also contains a nodeId that links to its other info in table pusub_item e.g. prekeys records etc.

When parrot@atalk.sytes.net account on an android device is removed, if ejabberd proceeds to clean up the omemo records for this account; it needs to remove all the active/inactive devices and its prekey tables based on bareJid only. This causes problem for the second android device registered with the same parrot@atalk.sytes.net, and the OMEMO messaging will fail. Therefore all xmpp servier will never attempt to perform this action on its own when an account is removed.

In aTalk implementation, it actually provides a routine to clean up the server omemo data for the affected omemoDevice. The function is activated when user perform:

    /**
     * Clean up omemo bundle and devicelist on the server for the specified omemoDevice:
     *
     * @param connection XMPPConnection
     * @param userJid Account userJid
     * @param omemoDevice of which the bundle and devicelist items are to be removed from server
     */
    private void purgeBundleDeviceList(XMPPConnection connection, BareJid userJid, OmemoDevice omemoDevice)
    {
        Timber.d("Purge server bundle and deviceList for old omemo device: %s", omemoDevice);
        PubSubManager pubsubManager = PubSubManager.getInstanceFor(connection, userJid);
        PepManager pepManager = PepManager.getInstanceFor(connection);

        // First refresh omemo devicelist on the server i.e. removed old id
        Set<Integer> deviceIds = Collections.emptySet();
        try {
            String nodeName = OmemoConstants.PEP_NODE_DEVICE_LIST;
            LeafNode leafNode = pubsubManager.getLeafNode(nodeName);
            if (leafNode != null) {
                List<PayloadItem<OmemoDeviceListElement>> items = leafNode.getItems();
                if (!items.isEmpty()) {
                    // These will completely remove the deviceList - mny not be good
                    // leafNode.deleteAllItems();
                    // pubsubManager.deleteNode(nodeName);

                    OmemoDeviceListElement publishedList = items.get(items.size() - 1).getPayload();
                    deviceIds = publishedList.copyDeviceIds();  // need a copy of the unmodifiable list
                    deviceIds.remove(omemoDevice.getDeviceId());
                }
            }
            OmemoDeviceListElement deviceList = new OmemoDeviceListElement_VAxolotl(deviceIds);
            pepManager.publish(nodeName, new PayloadItem<>(deviceList));

        } catch (SmackException | InterruptedException | XMPPException.XMPPErrorException e) {
            aTalkApp.showToastMessage(R.string.omemo_purge_inactive_device_error, omemoDevice);
        }

        // Only then purge omemo preKeys table/bundles on server
        try {
            String nodeName = omemoDevice.getBundleNodeName();
            LeafNode leafNode = pubsubManager.getLeafNode(nodeName);
            if (leafNode != null) {
                leafNode.deleteAllItems();
                pubsubManager.deleteNode(nodeName);
            }
        } catch (SmackException | InterruptedException | XMPPException.XMPPErrorException e) {
            aTalkApp.showToastMessage(R.string.omemo_purge_inactive_device_error, omemoDevice);
        }
    }
licaon-kter commented 1 year ago

Ok, not sure how that relates here, but my steps don't involve any "account removal" in the client. Clients were clean up via "clean app data" for Android and browser "delete site data".

I've mentioned that to make it clear that any nodes that exist are there from the used apps only. I've made a new account to specifically repro the issue.

Can you PM me via xmpp to give you a new account for you to test with?

cmeng-git commented 1 year ago

Just realised that all my comments after 22 July date, following your last comment, do not appear in this discussion thread. Do not understand why this happen.

In short, I need the actual debug log when the above problem occur to fully understand the cause of the problem. With which, I am unable to proceed further.

licaon-kter commented 1 year ago

Debug log of?

cmeng-git commented 1 year ago

I need the aTalk debug log that contains the process flow when the problem occur. Please refer to the following info to generate and send the aTalk debug log.

How can I give feedback to the developer for any aTalk problem encountered?

cmeng-git commented 1 year ago

Close due to no feedback for further actions.