dogsheep / twitter-to-sqlite

Save data from Twitter to a SQLite database
Apache License 2.0
402 stars 21 forks source link

Archive import appears to be broken on recent exports #54

Open jacobian opened 3 years ago

jacobian commented 3 years ago

I requested a Twitter export yesterday, and unfortunately they seem to have changed it such that twitter-to-sqlite import can't handle it anymore 😢

So far I've ran into two issues. The first was easy to work around, but the second will take more investigation. If I can find the time I'll keep working on it and update this issue accordingly.

The issues (so far):

1. Data seems to have moved to a data/ subdirectory

Running twitter-to-sqlite import on the raw zip file reports a bunch of "not yet implemented" errors, and then exits without actually importing anything:

❯ twitter-to-sqlite import tarchive.db twitter.zip
...
data/manifest: not yet implemented
data/account-creation-ip: not yet implemented
data/account-suspension: not yet implemented
... (dozens of more lines like this, including critical stuff like data/tweets) ...

(tarchive.db now exists, but is empty)

Workaround: unpack the zip file, and run twitter-to-sqlite import tarchive.db path/to/archive/data

That gets further, but:

2. Some schema(s?) have changed

At least, the blocks schema seems different now:

❯ twitter-to-sqlite import tarchive.db archive/data
direct-messages-group: not yet implemented
branch-links: not yet implemented
periscope-expired-broadcasts: not yet implemented
direct-messages: not yet implemented
mute: not yet implemented
Traceback (most recent call last):
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/bin/twitter-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/cli.py", line 772, in import_
    archive.import_from_file(db, filepath.name, open(filepath, "rb").read())
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/archive.py", line 215, in import_from_file
    to_insert = transformer(data)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/archive.py", line 115, in lists_member
    return {"lists-member": _list_from_common(data)}
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/archive.py", line 200, in _list_from_common
    for url in block["userListInfo"]["urls"]:
KeyError: 'urls'

That's as far as I got before I needed to work on something else. I'll report back if I get further!

jacobian commented 3 years ago

Correction: the failure is on lists-member.js (I was thrown by the block variable name, but that's just a coincidence)

jacobian commented 3 years ago

I was able to fix this, at least enough to get my archive to import. Not sure if there's more work to be done here or not.

henry501 commented 3 years ago

My import got much further with the applied fixes than 0.21.3, but not 100%. I do appear to have all of the tweets imported at least. Not sure when I'll have a chance to look further to try to fix or see what didn't make it into the import.

Here's my output:

direct-messages-group: not yet implemented
branch-links: not yet implemented
periscope-expired-broadcasts: not yet implemented
direct-messages: not yet implemented
mute: not yet implemented
periscope-comments-made-by-user: not yet implemented
periscope-ban-information: not yet implemented
periscope-profile-description: not yet implemented
screen-name-change: not yet implemented
manifest: not yet implemented
fleet: not yet implemented
user-link-clicks: not yet implemented
periscope-broadcast-metadata: not yet implemented
contact: not yet implemented
fleet-mute: not yet implemented
device-token: not yet implemented
protected-history: not yet implemented
direct-message-mute: not yet implemented
Traceback (most recent call last):
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/bin/twitter-to-sqlite", line 33, in <module>
    sys.exit(load_entry_point('twitter-to-sqlite==0.21.3', 'console_scripts', 'twitter-to-sqlite')())
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/cli.py", line 772, in import_
    archive.import_from_file(db, filepath.name, open(filepath, "rb").read())
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/archive.py", line 233, in import_from_file
    to_insert = transformer(data)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/archive.py", line 21, in callback
    return {filename: [fn(item) for item in data]}
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/archive.py", line 21, in <listcomp>
    return {filename: [fn(item) for item in data]}
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/archive.py", line 88, in ageinfo
    return item["ageMeta"]["ageInfo"]
KeyError: 'ageInfo'
danp commented 3 years ago

Similar trouble with ageinfo using 0.22. Here's what my ageinfo.js file looks like:

window.YTD.ageinfo.part0 = [
  {
    "ageMeta" : { }
  }
]

Commenting out the registration for ageinfo in archive.py gets my archive to import.

swyxio commented 1 year ago

as of 2023 it appears that tweets: not yet implemented is happening.. pretty important for a twitter export functionality!

this seems an impossible task with twitter liable to switch this around every other day