A few remarks on the implementation of `GM_xmlhttpRequest`

WhiteSevs commented 1 year ago

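1. I would prefer that GM_xmlhttpRequest not be implemented with fetch. A same-origin request that needs a custom User-Agent header effectively becomes a cross-origin one, and with fetch the custom User-Agent does not take effect.
2. For better compatibility: when method is GET and details contains a data key, change method to POST. Otherwise line 381 of https://github.com/JingMatrix/ChromeXt/blob/master/app/src/main/assets/GM.js throws an error:
```
request = new Request(details.url, {
    cache: "force-cache",
    body: details.data, // throws an error
    ...details,
});
```
3. GM_cookie is not implemented, so I hope this API can be removed from the script environment. In TamperMonkey, GM_cookie is of type object.
4. Would you consider injecting into iframes? A WebView can hook shouldInterceptRequest and, through its WebResourceRequest parameter, modify the returned HTML to add the JS.
5. GM_xmlhttpRequest does not seem to add cookies to requests automatically. For example, I send a login request to an API and the response headers contain cookie information, but later requests to the same domain do not include those cookies.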
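
The second point could be handled by normalizing `details` before the `Request` is built. The following is only a minimal sketch, not ChromeXt's actual code: `buildRequestInit` is a hypothetical helper, and it assumes `details` carries `method`, `headers`, and `data` as in the snippet above.

```javascript
// Hypothetical helper: derive a fetch/Request init object from GM-style details.
function buildRequestInit(details) {
  const method = (details.method || "GET").toUpperCase();
  const hasBody = details.data !== undefined && details.data !== null;
  // GET and HEAD requests may not carry a body, so promote them to POST.
  const effectiveMethod =
    hasBody && (method === "GET" || method === "HEAD") ? "POST" : method;
  const init = { method: effectiveMethod, headers: details.headers || {} };
  if (hasBody) init.body = details.data;
  return init;
}

const init = buildRequestInit({ url: "https://example.com/api", data: "a=1" });
console.log(init.method); // POST
```

The result can then be passed as the second argument of `new Request(details.url, init)`, so that `body` and `method` are always consistent.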

JingMatrix commented 1 year ago

Thanks for your interest in ChromeXt and valuable suggestions given here.

Your first two points are now implemented in 90ceb889e652503935c8706f69759eb3bb6a31fe. However, I shall keep using the fetch API, since it saves network traffic when a UserScript fetches response data for resources of the current page. Moreover, it is faster than the native Java implementation, which needs a data conversion layer between Java and JavaScript.

For the third point, I shall add it to the dev plan and implement it soon.

For the fourth point, it is not an easy task for Chromium-based browsers, so it has lower priority in my plan. A possible implementation requires CDP. Everyone is welcome to contribute to this part. Please refer to the following code if you are interested: https://github.com/JingMatrix/ChromeXt/blob/90ceb889e652503935c8706f69759eb3bb6a31fe/app/src/main/java/org/matrix/chromext/devtools/WebSocketClient.kt#L70-L76

For the fifth point, I personally think that handling the relevant Cookie headers is the responsibility of the UserScript, simply because my implementation of GM_xmlhttpRequest is stateless: no data from previous responses is stored anywhere.

WhiteSevs commented 1 year ago

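Regarding the first point: besides User-Agent, setting Referer, Host, and Origin is sometimes important too, and I am not sure whether fetch can set them. In my own code I once used jQuery's $.ajax in place of GM_xmlhttpRequest.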

let headers_options_key = [
  "Accept-Charset",
  "Accept-Encoding",
  "Access-Control-Request-Headers",
  "Access-Control-Request-Method",
  "Connection",
  "Content-Length",
  "Cookie",
  "Cookie2",
  "Date",
  "DNT",
  "Expect",
  "Host",
  "Keep-Alive",
  "Origin",
  "Referer",
  "TE",
  "Trailer",
  "Transfer-Encoding",
  "Upgrade",
  "User-Agent",
  "Via",
];

WhiteSevs commented 1 year ago

Thanks for the fix. I will try the latest version shortly.

JingMatrix commented 1 year ago

There is a Chromium bug tracker entry for the User-Agent issue. A priori, we cannot be sure which headers are ineffective in Chromium's fetch API. Hence, I suggest that we change the condition for using the fetch API as new related bugs are reported. Currently, are there other headers that your UserScript must change?

WhiteSevs commented 1 year ago

I went through my scripts. The headers they currently use are:

Accept
Authorization
Accept-Encoding
Accept-Language
Content-Type
Host
Origin
Pragma
Referer
x-csrf-token
X-Requested-With

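Also, a suggestion regarding line 237 of 90ceb88: normalize the keys of details.headers to lower or upper case before checking them, because scripts may use non-standard spellings such as user-agent, user-Agent, or uSer-aGent.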

JingMatrix commented 1 year ago

Now ChromeXt takes all forbidden headers listed on MDN into consideration. Moreover, you can specify the forceCORS option; see b7b023e9331564ba1165e409e8e36a3f833ca973.

There is no need to worry about upper or lower case in the header keys; the JavaScript Headers API handles that automatically.
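
To illustrate both statements, here is a rough sketch of how a forbidden-header check can rely on `Headers` for case normalization. It is not ChromeXt's code: `needsNativeRequest` is a hypothetical name, the name list is the one posted earlier in this thread rather than an authoritative copy of MDN's, and the `proxy-` and `sec-` prefix rule comes from the Fetch specification.

```javascript
// Names taken from the list posted earlier in this thread, in lowercase.
const FORBIDDEN = new Set([
  "accept-charset", "accept-encoding", "access-control-request-headers",
  "access-control-request-method", "connection", "content-length", "cookie",
  "cookie2", "date", "dnt", "expect", "host", "keep-alive", "origin",
  "referer", "te", "trailer", "transfer-encoding", "upgrade", "user-agent", "via",
]);

// Hypothetical check: true when fetch cannot be trusted to send these headers.
function needsNativeRequest(details) {
  if (details.forceCORS) return true;
  // Headers normalizes names, so iteration yields lowercase keys.
  const headers = new Headers(details.headers || {});
  for (const [name] of headers) {
    if (FORBIDDEN.has(name) || name.startsWith("proxy-") || name.startsWith("sec-")) {
      return true;
    }
  }
  return false;
}

console.log(needsNativeRequest({ headers: { "uSer-aGent": "custom" } })); // true
console.log(needsNativeRequest({ headers: { Accept: "text/html" } })); // false
```

Because the comparison happens after `Headers` has lowercased the names, spellings such as `user-Agent` or `uSer-aGent` need no special treatment.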

WhiteSevs commented 1 year ago

I just found a new bug: when the user sets details.responseType to json but the returned response.responseText is actually HTML, parsing fails.

JingMatrix commented 1 year ago

Could you give an example URL for the above bug? Maybe you missed a content-type header.

WhiteSevs commented 1 year ago

https://up.woozooo.com/mlogin.php, which is used to log in to the Lanzou cloud drive (蓝奏云). The details structure is below 👇

JingMatrix commented 1 year ago

I cannot reproduce it using cURL. Even without any specific headers, the response is still a JSON string:

curl -v 'https://up.woozooo.com/mlogin.php' --data-raw 'task=3&uid=jingmatrix%40gmail.com&pwd=test&setSessionId=&setSig=&setScene=&setToken=&formhash=ab2489d6'

The Java implementation of GM_xmlhttpRequest should behave the same as cURL. Could you also show the response data? Maybe you can expand the promise and find the response data there.

WhiteSevs commented 1 year ago

I reproduced it with a TamperMonkey request: responseText is HTML rather than JSON, and its response is simply undefined.

JingMatrix commented 1 year ago

The finalUrl indicates that your request didn't follow the redirect correctly. You may retry with the URL https://up.woozooo.com/mlogin.php.

If TamperMonkey cannot return JSON data either, then this does not seem to be a bug in the script manager.

WhiteSevs commented 1 year ago

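The request was redirected (302) to https://up.woozooo.com/myfile.php, so response.responseText, which should have been JSON, became HTML. TamperMonkey handles this case by not applying JSON.parse to response.response. ChromeXt needs the same compatibility handling; otherwise its data.response = JSON.parse(data.responseText); throws, and the script's onload callback can never be called.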
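
The compatibility handling requested here could look like the sketch below: a failed `JSON.parse` leaves `response` undefined (the behavior reported for TamperMonkey above) instead of aborting before `onload` runs. `attachResponse` is a hypothetical name, and this is not ChromeXt's code.

```javascript
// Hypothetical helper: fill in data.response without ever throwing.
function attachResponse(data, responseType) {
  if (responseType === "json") {
    try {
      data.response = JSON.parse(data.responseText);
    } catch (e) {
      // Not valid JSON, for example an HTML page served after a redirect.
      data.response = undefined;
    }
  } else {
    data.response = data.responseText;
  }
  return data;
}

const result = attachResponse({ responseText: "<html></html>" }, "json");
console.log(result.response); // undefined
```

The script can still inspect `responseText` in `onload` and decide for itself how to treat the unexpected HTML.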

JingMatrix commented 1 year ago

The redirection is weird, because cURL tells me that https://up.woozooo.com/myfile.php redirects to ./mlogin.php, but you claimed the reverse.

I think that when you specify responseType, you are expecting a JSON response, and if such a response is not available, an error should be thrown. Hence, I will catch the error from JSON.parse and reject the Promise with the unparsed response. Do you think that is more reasonable?

WhiteSevs commented 1 year ago

That is reasonable, but resolving instead would give better compatibility.

JingMatrix commented 1 year ago

In the commit aba8e9dd1280722b71335f71475f5885e1b856c6, I throw the parsing error out. Please tell me if this solution is acceptable for you. I think that unless you are using the async version GM.xmlHttpRequest, your code won't stop when there is a promise error.

Also, I implemented the basic APIs of GM_cookie in 567dcddb81046d1bcec564aa6f65a6b7ac3a13ed, but I am not very familiar with its use cases. Could you please give some usage scenarios, so that I know which functionality should be included?

JingMatrix commented 1 year ago

For your fifth point, I found a way to implement it in Java.

Now the anonymous option fully controls whether the HTTP requests are stateless.

WhiteSevs commented 1 year ago

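I tried it, and there is not much difference from before. ChromeXt now throws the JSON.parse error, but the script's onload is still not triggered. I removed responseType: "json" from my script and now process response.responseText myself inside onload.

Here is an example of how I use GM_cookie: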


// Reads the cookie that tells whether the user is logged in.
// The cookie is HttpOnly, so document.cookie cannot see it.
function getCookie(cookieName) {
  return new Promise((resolve) => {
    GM_cookie.list({ name: cookieName }, function (cookies, error) {
      if (error || cookies.length === 0) {
        resolve(null);
      } else {
        resolve(cookies[0].value);
      }
    });
  });
}

await getCookie("userLogin")

WhiteSevs commented 1 year ago

I found another issue, this time about the encoding of response.responseText:

GM_xmlhttpRequest({
  url: "https://tieba.baidu.com/f/search/res?isnew=1&kw=%C4%E6%CB%AE%BA%AE%CA%D6%D3%CE&qw=%CE%E8%D1%F4%B3%C7&un=&rn=10&pn=0&sd=&ed=&sm=1",
  method: "get",
  headers: {
    Referer: "https://tieba.baidu.com/f?ie=utf-8&kw=%E9%80%86%E6%B0%B4%E5%AF%92%E6%89%8B%E6%B8%B8",
    Host: "tieba.baidu.com",
    Accept:
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
  },
  responseType: "html",
  onload: (resp) => {
    console.log(resp);
  },
});

Decoded correctly in `TamperMonkey` 👇

Garbled in `ChromeXt` 👇

My guess is that this is caused by content-type: text/html; charset=GBK?

JingMatrix commented 1 year ago

Thanks for the feedback!

1. Now a parsing error won't block onload.
2. To get HttpOnly cookies, I need to use CDP. This can be done later.
3. You are right, I was assuming UTF-8 encoding. It should be fixed soon.
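
For the encoding point, one way to stop assuming UTF-8 is to read the charset from the Content-Type header and hand it to `TextDecoder`. This is only a sketch under that assumption, not the fix that went into ChromeXt (whose decoding happens on the Java side); both function names are hypothetical.

```javascript
// Hypothetical helper: extract the charset label from a Content-Type value.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : "utf-8";
}

// Decode raw response bytes with the declared charset, falling back to UTF-8
// when TextDecoder does not know the label.
function decodeBody(bytes, contentType) {
  let decoder;
  try {
    decoder = new TextDecoder(charsetFromContentType(contentType));
  } catch (e) {
    decoder = new TextDecoder("utf-8");
  }
  return decoder.decode(bytes);
}

// The GBK bytes %C4%E6%CB%AE%BA%AE from the kw parameter of the URL above.
const gbkBytes = new Uint8Array([0xc4, 0xe6, 0xcb, 0xae, 0xba, 0xae]);
console.log(decodeBody(gbkBytes, "text/html; charset=GBK"));
```

Whether the `gbk` label is available depends on the runtime's encoding support, which is why the sketch falls back to UTF-8 instead of throwing.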

WhiteSevs commented 1 year ago

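Regarding point 2: I tested this in TamperMonkey. When I set {cookie:"xxxxxxx"} in headers, it automatically prepends the HttpOnly cookies that belong to the domain to the Cookie header. For example, suppose the current domain has these cookies:

| key | value | domain | path | expires | ... | HttpOnly |
| --- | --- | --- | --- | --- | --- | --- |
| github_test | 1 | github.com | / | .... | | |
| github_test2 | 2 | github.com | / | .... | | √ |

// The cookie I set

headers: {
  "user-agent": ".....",
  cookie: "github_test=1;",
}

The Cookie header that is sent becomes Cookie: github_test2=2;github_test=1. That is, the browser's current cookies come first and the ones set by the user follow. When I captured the traffic of ChromeXt packaged with lspatch, I found that the request headers contained multiple Cookie keys, and that the one I set was the lowercase cookie.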
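
The merging behavior described above can be sketched as follows. This is an illustration of the observed result, not TamperMonkey's or ChromeXt's implementation: `mergeCookieHeader` is a hypothetical helper, and skipping stored cookies whose names the script already set is an assumption made here so that only one `Cookie` header with no repeated names is produced.

```javascript
// Hypothetical helper: build one Cookie header value from the browser's stored
// cookies for the target domain and the cookie string supplied by the script.
function mergeCookieHeader(storedCookies, userCookie) {
  const userPairs = (userCookie || "")
    .split(";")
    .map((pair) => pair.trim())
    .filter((pair) => pair.length > 0);
  const userNames = new Set(userPairs.map((pair) => pair.split("=")[0]));
  // Stored cookies go first; names the script set explicitly are not repeated.
  const storedPairs = storedCookies
    .filter((cookie) => !userNames.has(cookie.name))
    .map((cookie) => `${cookie.name}=${cookie.value}`);
  return [...storedPairs, ...userPairs].join("; ");
}

const stored = [
  { name: "github_test", value: "1", httpOnly: false },
  { name: "github_test2", value: "2", httpOnly: true },
];
console.log(mergeCookieHeader(stored, "github_test=1;"));
// github_test2=2; github_test=1
```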

JingMatrix commented 1 year ago

httpOnly cookie is not supported yet, but will be done soon. Once it is implemented, httpOnly cookie headers will be appended. As for the given screenshot, is it a CORS request? Could you please describe what should be the expected behavior? If I am correct, in the API, cookie should be set as a property of details instead of headers.

WhiteSevs commented 1 year ago

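Yes, it is a cross-origin request with a modified user-agent. It sends a check-in request, which needs the Cookie to verify that the current identity matches the formhash in the URL. The cookie should indeed go in details rather than in headers. I will try again once HttpOnly support is implemented; for now, the response to my check-in request is an identity verification failure.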

JingMatrix commented 1 year ago

Now httpOnly support is implemented. Please test it and share your feedback! :smiley:

WhiteSevs commented 1 year ago

I tried the latest build, https://github.com/JingMatrix/ChromeXt/actions/runs/6228644262, and there seems to be a serious problem: GM_xmlhttpRequest requests do not trigger any callback.

JingMatrix commented 1 year ago

I cannot reproduce your issue. On my devices,

GM_xmlhttpRequest({
    url: "https://bbs.binmt.cc/k_misign-sign.html?operation=qiandao&format=button&formhash=TESTHASHVALUE&inajax=1&ajaxtarget=midaben_sign",
    onload: (r)=>console.log(r.response),
    onerror: (r)=>console.log(r),
    timeout: 5000,
    forceCORS: true
})

returns a response containing 您当前的访问请求当中含有非法字符，已经被系统拒绝 ("Your request contains illegal characters and has been rejected by the system").

WhiteSevs commented 1 year ago

With this, the callbacks are still not triggered. For GM_bridge.GM_xmlhttpRequest I used this debugging script: https://greasyfork.org/zh-CN/scripts/475424-%E8%B0%83%E8%AF%95

https://github.com/JingMatrix/ChromeXt/assets/50544447/52bf210e-d024-484d-abd2-1fc351dc25db

JingMatrix commented 1 year ago

I still cannot reproduce it with Via. Maybe you need to set breakpoints to see why this part of code https://github.com/JingMatrix/ChromeXt/blob/523eed33da5112577700f5e5cabaa991790c4605/app/src/main/assets/GM.js#L493-L496 is not triggered.

The onload function should be triggered by this call: https://github.com/JingMatrix/ChromeXt/blob/523eed33da5112577700f5e5cabaa991790c4605/app/src/main/assets/GM.js#L793

WhiteSevs commented 1 year ago

I found the bug. It is not there but in await GM_cookie.list(): it never returns, so the request waits forever. https://github.com/JingMatrix/ChromeXt/blob/523eed33da5112577700f5e5cabaa991790c4605/app/src/main/assets/GM.js#L625

Following it further: https://github.com/JingMatrix/ChromeXt/blob/523eed33da5112577700f5e5cabaa991790c4605/app/src/main/assets/GM.js#L145-L147

The listener does not seem to fire: https://github.com/JingMatrix/ChromeXt/blob/523eed33da5112577700f5e5cabaa991790c4605/app/src/main/assets/GM.js#L120-L121

https://github.com/JingMatrix/ChromeXt/assets/50544447/9a37cbb8-9733-483b-88d8-c448c559f468

JingMatrix commented 1 year ago

Thanks for reporting this bug; it is fixed now. It was a bug in the thread-safe usage of the WebView class; see details in 666253121bde33669cf9679ac66c1c427dba3d98.

WhiteSevs commented 1 year ago

I just tried the latest version, and the problem does not seem to be fixed.

JingMatrix commented 1 year ago

That seems impossible. On my device, GM_bridge.GM_cookie.list().then(a => console.log(a)) correctly returns the cookies of Via. Maybe your Via was not fully restarted?

By contrast, with the commit before it, Via could not return the cookies.

WhiteSevs commented 1 year ago

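Yes, it works now; the problem was that I had not restarted Via. While testing the check-in request, a comparison shows that something seems wrong with how cookies are added. There are two Cookie headers in the request. The first contains cQWy_2132_saltkey=Nlp..., which is the real current value; the second contains cQWy_2132_saltkey=toW..., which is wrong. Perhaps it is a stale value?

The Cookie header that is sent is also missing some cookies, whereas the output of GM_cookie.list matches DevTools exactly.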

WhiteSevs commented 1 year ago

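After some investigation, it seems that ChromeXt processes details and automatically adds cookie: "cQWy_2132_saltkey=Nlp..." (cQWy_2132_saltkey is HttpOnly, and this is the correct cookie), then passes the result through JSON.stringify to the Java side, which appends more cookies afterwards?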

JingMatrix commented 1 year ago

Yes, you are right. The problem in the Java part can be tricky. By the way, which cookies are missing from the first Cookie header? GM_cookie.export(location.origin) returns the added HttpOnly cookie.

WhiteSevs commented 1 year ago

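The first Cookie header is missing all non-HttpOnly cookies, so the server cannot verify the identity of the request. GM_cookie.export(location.origin) is also broken: it returns an empty value.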

JingMatrix commented 1 year ago

I suppose that we should only add httpOnly cookies since they are the ones not accessible by document.cookie, isn't that true? Does TamperMonkey include all of them?

WhiteSevs commented 1 year ago

TamperMonkey handles all cookies, both HttpOnly ones and those visible through document.cookie, so in most cases script authors do not need to think about cookie handling.

JingMatrix commented 1 year ago

For WebView browsers, the cookie management should be Okay now.

However, for Chromium-based browsers, the UserScript should take care of Set-Cookie headers if the request has the same domain as the current page.

JingMatrix commented 1 year ago

There was a mistake in the previous commit. Now the cookie problem should be solved gracefully.

WhiteSevs commented 1 year ago

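I tried it and the check-in feature works now. There are other problems, though. For example, while on https://bbs.binmt.cc I used the API to send a login request to the image host https://www.z4a.net/. The login succeeded, but ChromeXt apparently did not store the resulting cookies, so the subsequent image upload failed cookie verification. I then opened https://www.z4a.net/ in a new tab, logged in there, and went back to https://bbs.binmt.cc to upload an image; this time cookie verification succeeded. In other words, this kind of cross-domain login cannot be done through GM_xmlhttpRequest.

In TamperMonkey the flow is 👇 currently on bbs.binmt.cc => cross-domain login to www.z4a.net => the login cookies for www.z4a.net are saved automatically => upload the image => get the upload result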

WhiteSevs commented 1 year ago

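You can use this script, https://greasyfork.org/zh-CN/scripts/475722-greasyfork%E4%BC%98%E5%8C%96, which runs on greasyfork.org and logs in automatically; the account entry feature is in the menu.

This is also cross-origin, and it uses headers.referer.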

JingMatrix commented 1 year ago

The feature you are asking for is the same as tracking. The (WebView) browser may have its own settings against various kinds of tracking. Changing these settings might draw complaints from users, so I prefer not to change them. Hence, I suggest that you store the cookies in your own scripts only and send them by setting the cookie parameter. ChromeXt provides two helper functions for you to parse headers and export cookies: https://github.com/JingMatrix/ChromeXt/blob/9e6454f5f1d157cd760ab01bc6f19a4aba29bb65/app/src/main/assets/GM.js#L124-L132 and https://github.com/JingMatrix/ChromeXt/blob/7633fe3baa1fcfb6287e002e2c7e6bfb3dd21032/app/src/main/assets/GM.js#L734-L737

WhiteSevs commented 1 year ago

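In that case it seems I could call GM_cookie.list and then GM_cookie.export to get the cookies and put them in details, but that cannot retrieve cookies for other domains. Could an API for getting other domains' cookies be added to GM.ChromeXt?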

WhiteSevs commented 1 year ago

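After logging in to the image host, I exported the cookies with GM_cookie.export and found that one cookie has the domain .z4a.net, with a leading dot; that cookie's value is missing from the GM_cookie.export output.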

JingMatrix commented 1 year ago

To obtain cookies from other domains, you need to parse the response headers of an xhr request in your onload function. See for example https://github.com/JingMatrix/ChromeXt/blob/41d61a84467735aa651c54d8a21043a5317446c8/app/src/main/assets/GM.js#L810-L820 , where the static method ResponseSink.parseCookie is called. That example stores cookies in the current domain for Chromium-based browsers. You can then export them using GM_info.export(url, cookies).
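
As a rough, script-side illustration of this advice, the sketch below collects cookies from a response's headers and serializes them for a later request. It does not use ChromeXt's `ResponseSink.parseCookie`, whose signature is not shown in this thread; both helpers are hypothetical, and the code assumes the response exposes its headers as a raw string with one `Name: value` header per line.

```javascript
// Hypothetical helper: collect name=value pairs from the Set-Cookie lines of a
// raw response header string (one "Name: value" header per line).
function cookiesFromResponseHeaders(rawHeaders) {
  const jar = {};
  for (const line of rawHeaders.split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator < 0) continue;
    if (line.slice(0, separator).trim().toLowerCase() !== "set-cookie") continue;
    // Only the first "name=value" segment matters; attributes such as Path follow.
    const pair = line.slice(separator + 1).split(";")[0].trim();
    const eq = pair.indexOf("=");
    if (eq > 0) jar[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return jar;
}

// Serialize the stored pairs for the cookie property of a later request.
function toCookieString(jar) {
  return Object.entries(jar)
    .map(([name, value]) => `${name}=${value}`)
    .join("; ");
}

const raw = "content-type: text/html\r\nset-cookie: sid=abc; Path=/; HttpOnly\r\n";
console.log(toCookieString(cookiesFromResponseHeaders(raw))); // sid=abc
```

In a script, the jar would be filled inside `onload` of the login request and its serialized form passed as the `cookie` property of `details` for the follow-up upload request.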

I searched for information about the leading . of the domain and realized that it should be included. Now it is fixed (by a git force push).
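
For reference, the usual domain-matching rule (RFC 6265) ignores a leading dot in the cookie's Domain attribute and matches the host itself as well as its subdomains. A minimal sketch with a hypothetical `domainMatches` helper, not the code changed in the force push:

```javascript
// Hypothetical helper: does a cookie's Domain attribute cover the given host?
function domainMatches(host, cookieDomain) {
  const domain = cookieDomain.replace(/^\./, "").toLowerCase();
  const name = host.toLowerCase();
  return name === domain || name.endsWith("." + domain);
}

console.log(domainMatches("www.z4a.net", ".z4a.net")); // true
console.log(domainMatches("z4a.net.evil.example", ".z4a.net")); // false
```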

JingMatrix commented 1 year ago

I am working on issue #115 to make GM_xhr stateful. Your goal should be reached when the implementation is done. It should then behave like TamperMonkey.

JingMatrix commented 1 year ago

The reimplementation of GM_cookie is done. Now it should work in the same way as in TamperMonkey.

JingMatrix commented 1 year ago

@WhiteSevs Did you succeed in running your script with the latest commits of ChromeXt? I am planning to release a new version of ChromeXt this weekend. If you find no errors, please drop a message here. This issue will be closed after that.

WhiteSevs commented 1 year ago

OK. Everything basically works now, except for operations that require cross-domain login.
