DavidAnson / check-pages-cli

Checks various aspects of a web page for correctness.
MIT License

bug(non-ascii): encoded Cyrillic domains #1

Status: Open. Kristinita opened this issue 2 years ago.

Kristinita commented 2 years ago

1. Summary

Both check-pages-cli and grunt-check-pages report errors when the a.href attribute contains a percent-encoded Cyrillic domain, like this:

<a href="https://%D0%B6%D1%83%D1%80%D0%BD%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9%D0%BC%D0%B8%D1%80.%D1%80%D1%84/avtor/savchenko-boris">Kira Goddess!</a>
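For readers unfamiliar with this form: the host in that href is the UTF-8 percent-encoding of the Cyrillic domain журнальныймир.рф. This small Node.js check (an illustration only, not part of check-pages) decodes it with the built-in decodeURIComponent:

```javascript
// Percent-encoded host copied verbatim from the href above.
const encodedHost =
  "%D0%B6%D1%83%D1%80%D0%BD%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9%D0%BC%D0%B8%D1%80.%D1%80%D1%84";

// decodeURIComponent interprets each %XX sequence as a UTF-8 byte
// and returns the resulting Unicode string.
const decodedHost = decodeURIComponent(encodedHost);

console.log(decodedHost); // журнальныймир.рф
```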

2. MCVE

3. Steps to reproduce

  1. I run http-server:

    D:\SashaDebugging\KiraCheckPages>http-server
    Starting up http-server, serving ./
    
    http-server version: 14.1.1
    
    http-server settings:
    CORS: disabled
    Cache: 3600 seconds
    Connection Timeout: 120 seconds
    Directory Listings: visible
    AutoIndex: visible
    Serve GZIP Files: false
    Serve Brotli Files: false
    Default File Extension: none
    
    Available on:
      http://192.168.1.2:8080
      http://127.0.0.1:8080
      http://127.0.2.2:8080
      http://127.0.2.3:8080
    Hit CTRL-C to stop the server
  2. In a new terminal tab, I run check-pages-cli:

    check-pages http://127.0.0.1:8080/KiraEncodedCyrillicDomain.html --checkLinks

4. Behavior

4.1. Desired

No errors.

With Latin domains, I get no errors.

4.2. Current

D:\SashaDebugging\KiraCheckPages>check-pages http://127.0.0.1:8080/KiraEncodedCyrillicDomain.html --checkLinks
Page: http://127.0.0.1:8080/KiraEncodedCyrillicDomain.html (33ms)
node:events:491
      throw er; // Unhandled 'error' event
      ^

Error: Invalid URI "https:///%d0%b6%d1%83%d1%80%d0%bd%d0%b0%d0%bb%d1%8c%d0%bd%d1%8b%d0%b9%d0%bc%d0%b8%d1%80.%d1%80%d1%84/avtor/savchenko-boris"
    at Request.init (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\request\request.js:273:31)
    at new Request (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\request\request.js:127:8)
    at request (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\request\index.js:53:10)
    at C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\request\index.js:100:12
    at C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\check-pages\checkPages.js:143:33
    at next (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\check-pages\checkPages.js:435:5)
    at Request._callback (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\check-pages\checkPages.js:314:9)
    at self.callback (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\request\request.js:185:22)
    at Request.emit (node:events:513:28)
    at Request.<anonymous> (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\request\request.js:1154:10)
Emitted 'error' event on Request instance at:
    at Request.init (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\request\request.js:273:17)
    at new Request (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\request\request.js:127:8)
    [... lines matching original stack trace ...]
    at Request.<anonymous> (C:\Users\SashaChernykh\AppData\Roaming\npm\node_modules\check-pages-cli\node_modules\request\request.js:1154:10)

Node.js v18.9.0

5. Environment

  1. Microsoft Windows [Version 10.0.19041.1415]
  2. Node.js v18.9.0
  3. check-pages-cli 0.10.0
  4. http-server v14.1.1

Thanks.

DavidAnson commented 2 years ago

Looks like it is the request package that is throwing. I haven't touched this project in a long time, so it is probably getting old. It would be interesting to see whether the native URL class handles this better.
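A minimal sketch of that experiment, assuming Node.js 18 (the version in the report). The WHATWG URL class, which is global in Node.js, percent-decodes the host and converts it to its ASCII (Punycode) form rather than rejecting it. The helper name toCheckableUrl is hypothetical and is not part of check-pages:

```javascript
// Hypothetical helper (not part of check-pages): resolve an href against the
// page URL using the WHATWG URL parser built into Node.js.
function toCheckableUrl(href, pageUrl) {
  try {
    // new URL() throws a TypeError for input it cannot parse, so a bad link
    // can be reported as a failure instead of crashing the whole run.
    return { ok: true, url: new URL(href, pageUrl).href };
  } catch (error) {
    return { ok: false, error: error.message };
  }
}

// The href from the report and the page it was found on.
const href =
  "https://%D0%B6%D1%83%D1%80%D0%BD%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9%D0%BC%D0%B8%D1%80.%D1%80%D1%84/avtor/savchenko-boris";
const result = toCheckableUrl(href, "http://127.0.0.1:8080/KiraEncodedCyrillicDomain.html");

// Expect ok: true, with the host rewritten as "xn--..." labels.
console.log(result);
```

If this holds on the reader's Node.js version, handing the normalized url (rather than the raw href) to the HTTP client would likely sidestep the "Invalid URI" error; this has not been verified against check-pages itself.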

Kristinita commented 9 months ago

Type: Question

> Looks like it is the request package that is throwing.

Unfortunately, as of Feb 11th 2020, “request” is fully deprecated.

What about migrating check-pages from “request” to “got”? The developers of “got” write: “You may think it’s too hard to switch, but it’s really not.” At the time of writing this comment, “got” is actively developed and supports HTTP/2 requests.

Thanks.