csensemakers / desci-sense


URL extraction fails on this tweet #31

Open ronentk opened 10 months ago

ronentk commented 10 months ago

Sub-issue from #24

https://twitter.com/victorveitch/status/1722300572554969090

The tweet contains an arXiv link: [image]

Scraping the tweet fails to extract it:

{'conversationID': '1722300572554969090',
 'date': 'Wed Nov 08 17:10:29 +0000 2023',
 'date_epoch': 1699463429,
 'hashtags': [],
 'likes': 1465,
 'mediaURLs': ['https://pbs.twimg.com/media/F-bW6ZnXsAABCu9.png',
  'https://pbs.twimg.com/media/F-bXIlSXgAEl61P.jpg'],
 'media_extended': [{'altText': None,
   'size': {'height': 671, 'width': 1645},
   'thumbnail_url': 'https://pbs.twimg.com/media/F-bW6ZnXsAABCu9.png',
   'type': 'image',
   'url': 'https://pbs.twimg.com/media/F-bW6ZnXsAABCu9.png'},
  {'altText': None,
   'size': {'height': 676, 'width': 1092},
   'thumbnail_url': 'https://pbs.twimg.com/media/F-bXIlSXgAEl61P.jpg',
   'type': 'image',
   'url': 'https://pbs.twimg.com/media/F-bXIlSXgAEl61P.jpg'}],
 'possibly_sensitive': False,
 'qrtURL': None,
 'replies': 18,
 'retweets': 224,
 'text': 'There\'s an idea that LLMs encode high-level concepts linearly in representation space, and that these can be understood using geometric operations (e.g., cosine similarity)   \n\nBut: What does "linear" even mean? And, why would (Euclidean) geometry encode meaning?… https://t.co/8WXQVI4WBb',
 'tweetID': '1722300572554969090',
 'tweetURL': 'https://twitter.com/victorveitch/status/1722300572554969090',
 'user_name': 'Victor Veitch',
 'user_screen_name': 'victorveitch'}
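
To make the failure easier to reproduce, here is a minimal sketch that runs a plain regex URL extractor over the tail of the `text` field above. The helper names (`extract_urls`, `has_arxiv_link`) and the regex are illustrative assumptions, not functions from desci-sense. It suggests that the only URL present in the truncated `text` is a `t.co` short link, so an extractor that looks only at this field may have no arXiv URL to find unless the short link is resolved or the link is read from some other field.

```python
import re

# Hypothetical helpers for illustration only; not part of desci-sense.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>]+")


def extract_urls(text):
    """Return all http(s) URLs found in text, with trailing punctuation stripped."""
    return [u.rstrip(".,;:!?)…") for u in URL_PATTERN.findall(text)]


def has_arxiv_link(urls):
    """True if any URL points at arxiv.org (simple substring check)."""
    return any("arxiv.org" in u.lower() for u in urls)


# Tail of the 'text' field from the scraper output above.
scraped = {
    "text": "And, why would (Euclidean) geometry encode meaning?… "
            "https://t.co/8WXQVI4WBb",
}

urls = extract_urls(scraped["text"])
print(urls)                  # the only URL found is the t.co short link
print(has_arxiv_link(urls))  # no arxiv.org URL appears in the raw text
```

If this matches what the scraper sees, a possible fix would be to expand `t.co` links before classifying them; whether that particular short link leads to the arXiv paper is not something the output above confirms.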