Cp0204 / quark-auto-save

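All-in-one tool for Quark Drive: check-in, automatic saving, renaming and organizing, push notifications, and media library refresh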
GNU Affero General Public License v3.0
445 stars 69 forks

[bug fix] add html character unescape for robustness #19

Closed GQH123 closed 3 months ago

GQH123 commented 3 months ago

When saving from this link, I found that get_fids returns no data:

{'status': 200, 'code': 0, 'message': '', 'timestamp': 1717104072, 'data': []}

which means the server cannot correctly parse the requested path /XXX/1249/【230116】YENA (崔叡娜)、BE′O (비오) - Love War (Feat. BE&#39;O) [MP3][无损]

After referring to previous issues and debugging for a while, I figured out that the problem is caused by the HTML entity &#39; contained in this path. It can be fixed by simply adding manual conversions, as shown in this PR's changes.

You can briefly review this PR to decide whether it can be merged into the main branch without compromising other related logic.
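
For readers who have not opened the diff, the kind of conversion described above can be sketched with Python's standard `html.unescape`. This is only an illustration, not the PR's actual change: the helper name `unescape_path` and the shortened example path are hypothetical.

```python
import html


def unescape_path(path: str) -> str:
    """Decode HTML entities such as &#39; in a path.

    Hypothetical helper for illustration; not the code from this PR.
    """
    return html.unescape(path)


# Shortened, made-up path containing the entity discussed in this thread.
escaped = "/demo/Love War (Feat. BE&#39;O)"
print(unescape_path(escaped))
```

A path that contains no entities passes through unchanged, so a helper like this would only affect names where the server returned an escaped form.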

Cp0204 commented 3 months ago

I can't reproduce this. In my test, both

/【230116】YENA (崔叡娜)、BE′O (비오) - Love War (Feat. BE&#39;O) [MP3][无损]

and

/【230116】YENA (崔叡娜)、BE′O (비오) - Love War (Feat. BE'O) [MP3][无损]

are parsed correctly.

In the Quark server's storage paths, &#39; and ' are not the same character and can exist side by side; however, the front end returns the file_name field as &#39; for both.


Can you provide me with your complete task item for testing?


curl --request POST \
  --url 'https://drive.quark.cn/1/clouddrive/file/info/path_list?fr=pc' \
  --header 'content-type: application/json' \
  --header 'cookie: XXX' \
  --data '{
    "file_path": [
        "/【230116】YENA (崔叡娜)、BE′O (비오) - Love War (Feat. BE&#39;O) [MP3][无损]",
        "/【230116】YENA (崔叡娜)、BE′O (비오) - Love War (Feat. BE'\''O) [MP3][无损]"
    ],
    "namespace": "0"
}'

Response:

{
  "status": 200,
  "code": 0,
  "message": "",
  "timestamp": 1717125575,
  "data": [
    {
      "fid": "be54ff40fb0a44db841b59651a42ea72",
      "file_name": "【230116】YENA (崔叡娜)、BE′O (비오) - Love War (Feat. BE'O) [MP3][无损]",
      "file_path": "/【230116】YENA (崔叡娜)、BE′O (비오) - Love War (Feat. BE'O) [MP3][无损]",
      ...
    },
    {
      "fid": "2d59059510764eaab322de6c8583339c",
      "file_name": "【230116】YENA (崔叡娜)、BE′O (비오) - Love War (Feat. BE'O) [MP3][无损]",
      "file_path": "/【230116】YENA (崔叡娜)、BE′O (비오) - Love War (Feat. BE'O) [MP3][无损]",
       ...
    }
  ]
}

Cp0204 commented 3 months ago

I see the problem.

The share is actually a directory whose name contains the character ', but file_name returns &#39;. The program then tries to read the fid of a directory named with &#39; and therefore fails.

This problem is actually caused by the Quark server's sloppy response. HTML-unescaping &#39; doesn't solve the root problem and introduces a new one: directories whose names literally contain &#39; could no longer be read.
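
The trade-off can be illustrated with a small, self-contained sketch. It assumes, as stated earlier in this thread, that a name with a literal ' and a name with a literal &#39; can coexist on the server. The in-memory `stored_paths` table, the fid strings, and the `lookup` function are all hypothetical stand-ins, not the project's code or the Quark API.

```python
import html
from typing import Optional

# Hypothetical server-side state: two distinct directories that differ
# only in whether the name holds a real apostrophe or the literal entity.
stored_paths = {
    "/Love War (Feat. BE'O)": "fid-apostrophe",
    "/Love War (Feat. BE&#39;O)": "fid-entity",
}


def lookup(path: str, unescape: bool) -> Optional[str]:
    """Resolve a path to a fid, optionally HTML-unescaping it first."""
    key = html.unescape(path) if unescape else path
    return stored_paths.get(key)


entity_path = "/Love War (Feat. BE&#39;O)"

# Without unescaping, the literal-entity directory is found.
print(lookup(entity_path, unescape=False))
# With unconditional unescaping, the same request resolves to the other
# directory, so the literal-entity one can no longer be reached.
print(lookup(entity_path, unescape=True))
```

Because the reported file_name does not say which of the two forms is actually stored, neither always unescaping nor never unescaping is correct for every directory.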

So we won't merge this PR for now. If similar problems occur more often, we'll consider whether there are other solutions.

GQH123 commented 3 months ago

OK, I see. It seems that in some cases Quark keeps HTML entities such as &#39; in folder names in their literal form. If other problems occur, I will let you know.