Yang-Nankai / extspider

System for automated collection of browser extensions and malware detection.

Preparing for Reverse-Engineering of Spider #1

Open Yang-Nankai opened 5 months ago

Yang-Nankai commented 5 months ago

Before crawling extension data, we first need to reverse-engineer the communication between the browser and Google's servers. The relevant communication happens in two places: the Category Dashboard and the Extension Details Page.

## Reverse-Engineering the Category Dashboard

### How to get the extension information list

When the user scrolls to the bottom of the page, extensions are no longer loaded automatically; instead, the user has to click the "Load more" button. Analysing the POST request it sends shows that it is similar to the original one:

```
[[["zTyKYc","[[null,[[3,\"productivity/communication\",null,null,2,[32,\"QVNMeGY3RGhKMEVtb2F3em52akhnQlBWMkIyQjJvS3h5djZ6U1dqakUtNmhxdE1CdmVvRWhPR3U2bnRWN2FjMGp5Y2NlRkYxZDBTdWtmV0JYS1RQRjI5dDluY3h1R2dwMERpYy10Y0hvT3BISEZ4aWpyQnh5RXVHVFJENzB6NjFYMmxTeGdVRHFzRnNObkdaWE9nNzZjWTh0YzRoek5wYU0zTkFlZGhaalBDSklEUkdNb1VxY0hZQllKSjVFT2l3M1hCVUhad3VHSTB5cUJITkV5akwxUVplek16eGRXd2xxZUNBUWJuU2ZvY0ZJdkVMY2UzLUc5YVBIRnp1ckRzYzVSMng4SDRk\"]]]]]",null,"generic"]]]
```

The `f.req` parameter is what we need to focus on:

- `zTyKYc` looks like an identifier for the function being called, because I have seen similar strings in many endpoints.
- `productivity/communication` is the **category**. The categories differ from the old store's: there are three top-level categories (Productivity, Lifestyle, Make Chrome Yours), and each of them contains many subcategories.
- `32` is the number of extensions to return. Unlike the old version, the new version does not seem to return more than 150 extensions per request (the old version allowed up to 500).
- The long string starting with `QVNMeGY3RGhK` is **the starting position**, explained below.

To understand the starting position, imagine all extensions in the database numbered 1, 2, 3, 4, and so on. On the first load, if the limit (the `32` above) were 10, extensions 1-10 would be loaded; after clicking **Load more**, extensions 11-20 would be loaded. In that case the string encodes the starting position 11, telling the server where to continue loading from.

Reverse-engineering showed that this string is not computed locally: the server sends it in the previous response. So when extensions 1-10 are requested for the first time, the starting position is an empty string, and the response contains a similar string that must be included in the next request (for 11-20). In our tests the string depends on the start position, so it can be reused, which makes concurrent crawling possible. It may also depend on a timestamp, because requesting the same start position twice returns different strings; fortunately, every one of them remains usable for a long time.

To explain this more clearly, here are some screenshots. The following is the first request for a certain category:

![image](https://github.com/extensionsec/xcavate/assets/100209158/f06dc7d9-d33d-48f3-8513-a6bec38164a8)

The response packet looks like this (the prefix string at the top is removed, and only one extension's information is kept):

```json
15360 // the message size
[
  [
    "wrb.fr",
    "zTyKYc", // Operator
    "[[[[null,null,null,null,null,[[[[[[\"majdfhpaihoncoakbjgbdhglocklcgno\",\"https://lh3.googleusercontent.com/Rm5hcKXvm9Prc-vyHzNGRpRVPxZAKQiiKPDNWW4Sn-MOm_-TxDOcKqNDpHUOYZBVidpnqWt22Wjwz9vtgW8nq-9Mrw\",\"Free VPN for Chrome - VPN Proxy VeePN\",4.714345920431557,11864,\"https://lh3.googleusercontent.com/ltTFtkBJPMbjBC5B7YcqmZOWCIXo_HXwkmK0rP1baQXVPFB-izgElxEOBnvTbgxHX4q-sNEwOzidy1vOnTRhli3xgA\",\"Fast, ultra secure, and easy to use VPN service to protect your privacy online. Enjoy Unlimited Traffic and Bandwidth!\",\"veepn.com\",true,null,null,[],1,null,6000000,true]]]]]]]]],null,[\"QVNMeGY3QlRaTlhOM0xlODhaSTNqYmttUUF0a2lMQVJKVVU5ZmhTSFBodGZna1BGZHBhVEsxWGxxbng2eHJRPQ\\u003d\\u003d\",null,null,44400]]",
    null,
    null,
    null,
    "generic"
  ],
  [ "di", 97 ],
  [ "af.httprm", 97, "-4476475136960508809", 6 ]
]
```

After removing irrelevant parts and formatting:

```json
[
  [
    [
      [
        null, null, null, null, null,
        [ [ [ [ [ [
          "majdfhpaihoncoakbjgbdhglocklcgno", // extension id
          "https://lh3.googleusercontent.com/Rm5hcKXvm9Prc-vyHzNGRpRVPxZAKQiiKPDNWW4Sn-MOm_-TxDOcKqNDpHUOYZBVidpnqWt22Wjwz9vtgW8nq-9Mrw", // image URL (likely the icon)
          "Free VPN for Chrome - VPN Proxy VeePN", // extension name
          4.714345920431557, // rating score
          11864, // rating count
          "https://lh3.googleusercontent.com/ltTFtkBJPMbjBC5B7YcqmZOWCIXo_HXwkmK0rP1baQXVPFB-izgElxEOBnvTbgxHX4q-sNEwOzidy1vOnTRhli3xgA",
          "Fast, ultra secure, and easy to use VPN service to protect your privacy online. Enjoy Unlimited Traffic and Bandwidth!", // description
          "veepn.com", // developer website
          true,
          null,
          null,
          [],
          1,
          null,
          6000000, // users
          true
        ] ] ] ] ] ]
      ]
    ]
  ],
  null,
  [
    "QVNMeGY3Q3lSWXVIMDlpN0c3aXlyUXN0Q3NhbzRobmZ0T2VZTkt0enNtM1R3RDJtVkhISll2U0t2TWYwOHpwRDhJQzFsSmNxOTZhYVNsbU83dFVGbENXa2xKV2RaQThiNFlWdGd5b3dWTW1URmVWM0ExZGczNkpDRW55YzdEWHBna3AyXzlpVUg2czB4UEhaWDBRQTVhMC1xZVVlcW5OMmZpNS1fVGdidGRTenRNRjVkWlpKcVpQblF2YUtiVlhmZlNmVmEwdmhXaFRNOUxyOEhFbzRILWw4SDlRWVZBZXNsZ000TVE4WlRQLWNCaWFRelJVRjQ0ajN6Ni1QeWxZYUpLN09tVllhTGJnVjBnPT0\\u003d", // the next start
    null,
    null,
    44400
  ]
]
```

The listing returns less information about each extension than before, but that is enough, because another API, the details API, fills in the rest.

Next, inspect the POST request sent after clicking **Load more** and look at its `f.req` parameter: the starting-position string is exactly the string returned in the previous response!
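The raw body shown above appears to be a series of length-prefixed JSON chunks, where the useful data sits in an envelope like `["wrb.fr", "zTyKYc", "<payload as a JSON string>", ...]`. Below is a minimal sketch that decodes the payload for one RPC id. It assumes, from the sample rather than from any documentation, that each chunk's JSON sits on a single line, that the prefix line (Google responses usually start with `)]}'`) and the length lines can simply be skipped, and that the payload is the third element of a `wrb.fr` envelope. The name `parse_batchexecute` is hypothetical.

```python
import json


def parse_batchexecute(text, rpcid):
    """Return the decoded payload of every "wrb.fr" envelope for `rpcid`.

    Assumes each JSON chunk sits on one line; the )]}' prefix line and the
    chunk-length lines do not start with "[" and are therefore skipped.
    """
    payloads = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue
        for envelope in json.loads(line):
            # Envelope layout seen in the sample: ["wrb.fr", rpcid, "<payload JSON>", ...]
            if (
                isinstance(envelope, list)
                and len(envelope) > 2
                and envelope[0] == "wrb.fr"
                and envelope[1] == rpcid
                and isinstance(envelope[2], str)
            ):
                payloads.append(json.loads(envelope[2]))  # the payload is itself JSON
    return payloads


# Synthetic response in the same shape (the chunk lengths are illustrative only).
sample = (
    ")]}'\n\n"
    "60\n"
    '[["wrb.fr","zTyKYc","[[null],null,[\\"NEXT\\",null,null,44400]]",null,null,null,"generic"],["di",97]]\n'
    "25\n"
    '[["e",4,null,null,123]]\n'
)
print(parse_batchexecute(sample, "zTyKYc"))
# [[[None], None, ['NEXT', None, None, 44400]]]
```

On a real listing response, the first decoded payload should be the nested array shown in the cleaned-up JSON above.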
![image](https://github.com/extensionsec/xcavate/assets/100209158/bd809231-73bd-47d4-afd1-e3f2f09060bd)

**This is the paging mechanism of the Chrome Web Store.**

What is more annoying than in the original version is that the arrays in the response body are deeply nested, so we have to chain many indexes to reach the useful data, as in the following code:

```python
token = parsed_next_array[2][0]  # token for the next page
infojson = parsed_next_array[0][0][0][5][0][0]  # extension entries
```

This is hard to read, so we should implement a more convenient helper function. I have written a small Python script that crawls this way, and it works.
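Here is a sketch of such a helper together with the paging loop. It assumes the two index paths from the snippet above hold for every page, that the first page uses an empty starting position (as observed above), and that a response without a token is the last page (unverified). `dig`, `build_list_freq`, `crawl_category`, and the `send` callback are hypothetical names; `send` stands for whatever code POSTs the `f.req` value to the `batchexecute` endpoint and returns the decoded `zTyKYc` payload. The `f.req` layout is copied from the captured request.

```python
import json

LIST_RPC_ID = "zTyKYc"           # RPC id seen in the captured "Load more" request
ITEMS_PATH = (0, 0, 0, 5, 0, 0)  # where the extension entries sit (from the snippet above)
TOKEN_PATH = (2, 0)              # where the next-page token sits


def dig(data, path, default=None):
    """Follow a path of indexes into nested lists; return default if any step fails."""
    for index in path:
        try:
            data = data[index]
        except (IndexError, KeyError, TypeError):
            return default
    return data


def build_list_freq(category, count, token=""):
    """Build the f.req form value for one page of a category listing."""
    inner = [[None, [[3, category, None, None, 2, [count, token]]]]]
    inner_json = json.dumps(inner, separators=(",", ":"))
    return json.dumps([[[LIST_RPC_ID, inner_json, None, "generic"]]], separators=(",", ":"))


def crawl_category(category, send, count=32, max_pages=None):
    """Yield the extension entries of one category, page by page.

    `send` (hypothetical) POSTs an f.req value and returns the decoded payload.
    """
    token = ""  # first page: empty starting position
    pages = 0
    while max_pages is None or pages < max_pages:
        payload = send(build_list_freq(category, count, token))
        yield dig(payload, ITEMS_PATH, [])
        pages += 1
        token = dig(payload, TOKEN_PATH)
        if not token:
            break  # assumption: no token means there are no more pages
```

Within one category this loop is sequential, because each token comes from the previous response; given the observation above that tokens stay valid and can be reused, saved tokens (or separate categories) could be fetched concurrently.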
## Reverse-Engineering the Extension Details Page

### How to get the extension details

The endpoint that returns extension details is shown in the figure below:

![image](https://github.com/extensionsec/xcavate/assets/100209158/83a33ef3-e24b-461d-bdd2-02335c66f4b0)

Some parameters have no effect on the request, such as `rpcids`, `source-path`, `_reqid`, `rt`, and so on. The **f.req** value is a list with four elements, but in fact the last three elements have no effect on getting the extension details. So a small script is enough to fetch them:

```python
import requests
import urllib3

# verify=False is used below, so silence the InsecureRequestWarning it triggers.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

url = (
    "https://chromewebstore.google.com/_/ChromeWebStoreConsumerFeUi/data/batchexecute?"
    "&f.sid=974938402045885243"
    "&bl=boq_chrome-webstore-consumerfe-ui_20231101.04_p0&hl=en&soc-app=1&soc-platform=1&soc-device=1&_reqid=560021"
    "&rt=c"
)
headers = {
    "Host": "chromewebstore.google.com",
    "Content-Type": "application/x-www-form-urlencoded;charset=utf-8",
}

# f.req payload: RPC id "xY2Ddd" with the extension id as the argument.
ext_req_data = '[[["xY2Ddd","[\\"knkpjhkhlfebmefnommmehegjgglnkdm\\"]",null,"1"]]]'


def get_ext_item_reps(url, req_data):
    """POST an f.req payload and return the raw response text, or None on failure."""
    post_data = {"f.req": req_data}
    try:
        response = requests.post(url, verify=False, headers=headers, data=post_data)
        response.raise_for_status()  # raise on non-2xx status codes
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return None
    return response.text


res = get_ext_item_reps(url, ext_req_data)
print(res)
```

The result is:

```json
[
  [
    "knkpjhkhlfebmefnommmehegjgglnkdm", // id
    "https://lh3.googleusercontent.com/oKryruCrQWAkoGeWAFLnYlFkNAsP7_LsC22EAA9-PbRqE_Jh1Q4OZoV4vE8CBW5p0LOCkHtnaruI9ovF7TXGE8fp", // url
    "Video Downloader professional",
    4.293918918918919, // rating score
    296, // rating count
    "https://lh3.googleusercontent.com/-AR2DrDB0h9ElGhjXxb_MW7148DtRaiypdfNq7Tho_kRFS2WwproRfAnsZwRuJXcHDqTQcnYpF1uL4cCW16VMONz9Q", // url
    "Download online videos in various formats from any websites. Video Downloader save video and watch it later.", // description
    null,
    null,
    true,
    null,
    [ "make_chrome_yours/functionality", null, 21 ], // category
    1,
    true,
    300000 // users
  ],
  null,
  null,
  null,
  null,
  [
    [ 1, "https://lh3.googleusercontent.com/rMrvcN5IC4xg3qWt83y7Vy1naWypbQnc-uW5alFpTDQcLI1caZmoVQ-5ECUMvcTp6ML-rKpmfWgvKxYuVKhvpsuu0w" ],
    [ 1, "https://lh3.googleusercontent.com/Rvz1eNGHpXsqDHHag2BJSSUitSFrf3eGLiWQFydEXtsZVn_KG9fKXAcB4lOfdk-N2cerCoboo8V-ZWoYo0h84nuXgw" ],
    [ 1, "https://lh3.googleusercontent.com/xq_oQj6qR6xfJiA8yOfhLgfh7kmNJ-f8L0iY3E0FgIyLJU0XXFubOvbcvM2oZMJh95zITi-PDMn70smUM7zB2LDtGg" ],
    [ 1, "https://lh3.googleusercontent.com/HudtZhlHWvyKagyZmx9FbkQ7BRIRXqNNYT3CSUnly90zxRMRZjYF7MditvEdKKBvc7HkJtqHJ25i5_0Ru3TsXydBNss" ],
    [ 1, "https://lh3.googleusercontent.com/rsB6RrKXeTj7JO-nGr48XOXeyixYXSyPaeOPT0ZeFa29jaT-kN2rqwdCnEMoUYTI4kG5HxJZyO5CFsNz2OQ7rk18bA" ]
  ], // pictures
  "Video Downloader professional for fast downloading online videos from any webpages. The popular Chrome extension supports many video formats.\\n\\nWatch favorite moments with friends or show useful guides to business partners even if you aren't online. Save interested video files to review them later on your PC.\\n\\nYou can try Video Downloader professional and notice the difference! Use extension for free without any registration. Save in different available formats and qualities according to your needs.\\n\\nExtension adds \\"download\\" button to video page, video list or embedded video. Click \\"download\\" button and select quality of the video you are gong to download. \\n\\nHow video downloader works?\\n- After installing the extension, go to a website that contains video resources.\\n- All available video files that are on the page will be detected by the extension. Once found, an number mark is shown on the extension icon indicating that media file is available for download.\\n- A green Download button will appear on the video itself, along with all the available video file sizes.\\n- You just have to choose the size of the video you want to download.\\n- Nothing complicated!\\n\\nVideo Downloader provide you an easier way to find out where the actual media file is located on the server. \\n\\nChrome professional video downloader and music downloader, you can download videos. Free, secure and easy to use.\\nvideo downloader - Best m3u8 downloader Chrome extension to download m3u8 or audio in Chrome quickly and easily.\\n\\nUse this professional video downloader for Chrome browser to get video and audio from websites you like.\\n\\nIt can download videos of any formats, including MP4, FLV, f4v, hlv, webm, mov, mkv, and etc.\\n\\nAdd this video downloader extension to Chrome browser in one minute. Click on the icon of this extension on the target website, you will begin to download in no time. It's easy, safe and free. You will download video and audio fast and free. Try it. You will like this extension.\\n\\nSupport recording mode: videos that are difficult to download conventionally can be downloaded by recording, which means it can download almost any video from any website.\\n\\nIt allows you to keep your favorite live streams! You can now download live streams to your hard drive with this video downloader!\\n\\nYou do not need to register an account to use this video downloader. You can download any video or audio without register. This video downloader is for free to use. \\n\\nIt allows you to keep the live show you love! You can download the Live broadcasts in your disk now with this video downloader!\\n\\nDownload using popup icon:\\nAvailable for downloading video files are shown after clicking popup icon of the extension.\\nNumber shown on popup icon of the extension corresponds to q-ty of video available for downloading on page.\\n\\nNote: \\nVideo Downloader is not a Youtube Downloader. Due to restrictions of the Google Web Store Policies and Developer Program Policies we can not download Youtube Videos. Thank you for understanding.\\n\\nWe really hope that our Video Downloader will be useful to you! We look forward to your feedback and ratings! Also write your suggestions for improving the functionality.", // detailed description
  null,
  true,
  1,
  [ "romanfrancis9881@gmail.com", null, "https://sites.google.com/view/video-loader/privacy-policy", null, null, "video-loader.app" ], // developer information
  null,
  null,
  "1.0.5", // version
  [1674205068, 681000000], // [timestamp, unknown]
  "138KiB", // file size
  ["Bahasa Indonesia","Bahasa Melayu","Deutsch","English","English (UK)","English (United States)","Filipino","Français","Kiswahili","Nederlands","Norsk","Tiếng Việt","Türkçe","català","dansk","eesti","español","español (Latinoamérica)","hrvatski","italiano","latviešu","lietuvių","magyar","polski","português (Brasil)","português (Portugal)","română","slovenský","slovenščina","suomi","svenska","čeština","Ελληνικά","Српски","български","русский","українська","עברית","فارسی\\u200e","मराठी","हिन्दी","বাংলা","ગુજરાતી","தமிழ்","తెలుగు","ಕನ್ನಡ","മലയാളം","ไทย","አማርኛ","\\u202bالعربية","中文 (简体)","中文 (繁體)","日本語","한국어"], // language
  null,
  null,
  null,
  "{\\n\\"update_url\\": \\"https://clients2.google.com/service/update2/crx\\",\\n\\n \\"version\\": \\"1.0.5\\",\\n \\"manifest_version\\": 3,\\n \\"name\\": \\"__MSG_name__\\",\\n \\"short_name\\": \\"__MSG_name__\\",\\n \\"description\\": \\"__MSG_desc__\\",\\n \\"default_locale\\": \\"en\\",\\n \\"icons\\": {\\n \\"128\\": \\"/img/128.png\\",\\n \\"64\\": \\"/img/64.png\\",\\n \\"32\\": \\"/img/32.png\\"\\n },\\n \\"action\\": {\\n \\"default_icon\\": \\"/img/128.png\\",\\n \\"default_popup\\": \\"popup.html\\"\\n },\\n \\"content_scripts\\": [\\n {\\n \\"matches\\": [\\"\\u003call_urls\\u003e\\"],\\n \\"js\\": [\\n \\"/js/lib/jquery-3.2.1.min.js\\",\\n \\"/js/lib/_config.js\\",\\n \\"/js/content.js\\",\\n \\"/js/feedback.js\\"\\n ],\\n \\"css\\": [\\n \\"/assets/btns.css\\",\\n \\"/assets/feedback.css\\"\\n ],\\n \\"all_frames\\": true\\n },\\n {\\n \\"matches\\": [\\"\\u003call_urls\\u003e\\"],\\n \\"js\\": [\\n \\"/js/content.js\\"\\n ],\\n \\"all_frames\\": true\\n }\\n ],\\n \\"background\\": {\\n \\"service_worker\\": \\"js/serviceWorker.js\\"\\n },\\n \\"permissions\\": [\\n \\"tabs\\",\\n \\"downloads\\",\\n \\"storage\\",\\n \\"webRequest\\"\\n ],\\n \\"host_permissions\\": [\\n \\"\\u003call_urls\\u003e\\"\\n ],\\n \\"web_accessible_resources\\": [\\n {\\n \\"resources\\": [\\n \\"img/*\\"\\n ],\\n \\"matches\\": [\\n \\"\\u003call_urls\\u003e\\"\\n ]\\n }\\n ]\\n}\\n", // manifest file
  true,
  [
    [ "dkbccihpiccbcheieabdbjikohfdfaje", "https://lh3.googleusercontent.com/VmcqHAJnD22xrNXwfNOYnMeBNntmGVeHkTMZlnb30SggCbcwpFi3dC35LhXE1WF7kSxcWLhDj_LifsUMptxkTgcquUA", "Video Downloader for U", 4.471698113207547, 371, "https://lh3.googleusercontent.com/pVBEjIcIWQf50fi8pxbV-2AgZlevywMeSrQOZ4Avi4vETAu2sdfMoLve2KvB97QkV6JVCYxxRRDOBzKVF7Yihwh8IA", "Video downloader extension allows users to save videos from various online platforms in just a few clicks. Save videos for free.", "videounit.net", null, null, null, [], 1, null, 3000000, true ],
    [ "mdkiofbiinbmlblcfhfjgmclhdfikkpm", "https://lh3.googleusercontent.com/eIfpy02zm5RfJ1wNqv5ue0uEYOisRSwoZFDlw-liNOgLOGTlauA12Hz46r49FxIZDwfw4_I3QpkOtolLIc9lEo-IiMo", "Video downloader Plus", 4.127035830618892, 307, "https://lh3.googleusercontent.com/Baqb6SljS1_Y0sBuyjV0s4lP5Stq5eDTwm5ksstXEePe8RxOX4CW6IfCNviN9ABfWIPPadQbpXabn5-SiQHoH3a3EQs", "We offer a free online video downloader tool to download any video from the web instantly for free. Download video in a few clicks!", null, null, null, null, [], 1, null, 500000 ]
  ], // related extensions' information
  null,
  null,
  null,
  null,
  "88",
  null,
  1,
  null,
  null,
  null,
  "https://sites.google.com/view/video-loader/privacy-policy",
  null,
  null,
  true
]
```

I don't know the meaning of some fields, but the response clearly contains detailed extension information, including the full contents of the manifest file. **Obtaining comments works the same way.**
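To pull named fields out of this positional array, a sketch might look like the following. The index paths are counted by hand from the single sample above and only cover fields whose meaning the comments identify; they are an assumption, not a documented schema. `build_details_freq`, `parse_details`, `dig`, and `DETAIL_PATHS` are hypothetical names, and the input is assumed to be the decoded top-level array exactly as printed above.

```python
import json

DETAILS_RPC_ID = "xY2Ddd"  # RPC id used in the script above

# Index paths counted by hand from the sample response above.
# Assumption: the layout is the same for every extension.
DETAIL_PATHS = {
    "id": (0, 0),
    "name": (0, 2),
    "rating": (0, 3),
    "rating_count": (0, 4),
    "category": (0, 11, 0),
    "users": (0, 14),
    "description": (6,),
    "developer_email": (10, 0),
    "version": (13,),
    "updated": (14, 0),
    "size": (15,),
    "manifest": (20,),
}


def build_details_freq(ext_id):
    """Build the f.req form value for one extension, mirroring ext_req_data above."""
    return json.dumps([[[DETAILS_RPC_ID, json.dumps([ext_id]), None, "1"]]], separators=(",", ":"))


def dig(data, path, default=None):
    """Follow a path of indexes into nested lists; return default if any step fails."""
    for index in path:
        try:
            data = data[index]
        except (IndexError, KeyError, TypeError):
            return default
    return data


def parse_details(details):
    """Turn the positional details array into a dict; missing fields become None."""
    info = {name: dig(details, path) for name, path in DETAIL_PATHS.items()}
    if isinstance(info["manifest"], str):
        try:
            info["manifest"] = json.loads(info["manifest"])  # manifest is embedded as a JSON string
        except ValueError:
            pass  # some manifests may not be strict JSON (e.g. comments); keep the raw text
    return info
```

For the sample above, `parse_details(details)["version"]` should give `"1.0.5"`, and `build_details_freq("knkpjhkhlfebmefnommmehegjgglnkdm")` produces the same string as `ext_req_data` in the script.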
Yang-Nankai commented 5 months ago

TODO

yalogica commented 4 months ago

Some words about the extension details.

```json
...
    null,
    null,
    "1.0.5",  // version
    [
        1674205068,   // the update date as a Unix timestamp (you can check it here: https://www.epochconverter.com/)
        681000000     // ???
    ],
    "138KiB",  // file size
...
```
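A small sketch for turning this pair into a date. It assumes the second number is the nanoseconds part of the same instant, as in the seconds/nanos pair of protobuf's `Timestamp`; that interpretation is a guess, and the function name `to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_datetime(pair):
    """Convert a [seconds, nanos] pair to a timezone-aware UTC datetime (nanos assumed)."""
    seconds, nanos = pair
    # datetime only has microsecond precision, so the last three digits of nanos are dropped.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=nanos // 1000)


print(to_datetime([1674205068, 681000000]).isoformat())
# 2023-01-20T08:57:48.681000+00:00
```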