Victrid / freshrss-image-cache-plugin

Cache feed images on your own facility or Cloudflare cache.
GNU General Public License v3.0

Some images from sputniknews.cn (Sputnik news agency) cannot be cached #6

Open Whichbfj28 opened 6 months ago

Whichbfj28 commented 6 months ago

Not cached: https://cdn.sputniknews.cn/img/07e7/03/09/1048554996_0:240:1280:960_1920x0_80_0_0_1ef90b9d0835157789ba71fd099d385c.jpg.webp
Cached: https://cdn.sputniknews.cn/img/102466/86/1024668632_0:143:960:683_1920x0_80_0_0_f949eadf9a0e7b6f3d9dd095e4832d74.jpg.webp

Victrid commented 6 months ago

I can't reproduce this; it works with my setup:


~ % curl --request GET \
  --url 'https://[piccache].workers.dev/piccache?url=https%3A%2F%2Fcdn.sputniknews.cn%2Fimg%2F07e7%2F03%2F09%2F1048554996_0%3A240%3A1280%3A960_1920x0_80_0_0_1ef90b9d0835157789ba71fd099d385c.jpg.webp' -vvv --output 1.webp
....
< HTTP/2 200
...
< content-length: 236510
< cf-ray: ....-HKG
< cf-cache-status: HIT
...
{ [5 bytes data]
100  230k  100  230k    0     0   843k      0 --:--:-- --:--:-- --:--:--  842k
* Connection #0 to host [piccache].workers.dev left intact

Please provide more information, such as the response headers or curl's verbose output.
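One way to collect just the response headers is sketched below. It assumes the cache endpoint and the original image URL are passed as arguments; `header_value` and `piccache_headers` are hypothetical helper names, not part of the plugin. curl's `-G` with `--data-urlencode` appends the URL-encoded `url` parameter to the query string, and `-D -` prints the response headers to standard output.

```shell
# Print the value of one response header (matched case-insensitively)
# from a header dump read on standard input.
header_value() {
    awk -v name="$1" 'BEGIN { name = tolower(name) ":" }
        tolower(substr($0, 1, length(name))) == name {
            value = substr($0, length(name) + 1)
            gsub(/^[ \t]+|[ \t\r]+$/, "", value)
            print value
        }'
}

# Fetch only the response headers for one image through the cache endpoint.
# $1 = cache endpoint, $2 = original image URL (curl URL-encodes it).
piccache_headers() {
    curl -sS -G -o /dev/null -D - --data-urlencode "url=$2" "$1"
}
```

Piping `piccache_headers` into `header_value content-length` and `header_value x-piccache-status` gives the two values that matter most for a report like this one.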

Whichbfj28 commented 6 months ago
curl --request GET \
>   --url 'https://freshrss.freshrss.com/i/hc.php?url=https%3A%2F%2Fcdn.sputniknews.cn%2Fimg%2F102817%2F69%2F1028176914_0%3A14%3A1100%3A633_1920x0_80_0_0_3758150fc8c0ae7e75642b3cbdedbb7b.jpg.webp' -vvv --output 1.webp
Note: Unnecessary use of -X or --request, GET is already inferred.
* Expire in 0 ms for 6 (transfer 0x557638d2e010)
* Expire in 1 ms for 1 (transfer 0x557638d2e010)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Expire in 1 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 1 ms for 1 (transfer 0x557638d2e010)
* Expire in 1 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
* Expire in 3 ms for 1 (transfer 0x557638d2e010)
* Expire in 3 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
* Expire in 3 ms for 1 (transfer 0x557638d2e010)
* Expire in 3 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
*   Trying IP...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x557638d2e010)
* Connected to freshrss.freshrss.com (1.1.1.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2393 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [79 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=freshrss.freshrss.com
*  start date: Apr 19 15:52:20 2024 GMT
*  expire date: Jul 18 15:52:19 2024 GMT
*  subjectAltName: host "freshrss.freshrss.com" matched cert's "freshrss.freshrss.com"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x557638d2e010)
} [5 bytes data]
> GET /i/hc.php?url=https%3A%2F%2Fcdn.sputniknews.cn%2Fimg%2F102817%2F69%2F1028176914_0%3A14%3A1100%3A633_1920x0_80_0_0_3758150fc8c0ae7e75642b3cbdedbb7b.jpg.webp HTTP/2
> Host: freshrss.freshrss.com
> User-Agent: curl/7.64.0
> Accept: */*
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [265 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [265 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
} [5 bytes data]
< HTTP/2 200 
< server: nginx
< date: Thu, 16 May 2024 05:43:36 GMT
< content-type: application/x-empty; charset=binary
< content-length: 0
< x-piccache-status: HIT
< 
{ [0 bytes data]
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host freshrss.freshrss.com left intact

Here hc.php is piccache.php under a different name.

The output file size is 0. We are using the self-hosted piccache method, not Cloudflare workers.dev.

Victrid commented 6 months ago
< x-piccache-status: HIT

The piccache serving itself seems to be working. When the server was downloading this particular image, the connection broke, leaving an empty file with no content written to it.

Then, when you request it, the caching server sees that a cache entry exists and serves you the empty file from the failed download.

I think this error should be very rare. If you want to fix this specific image, deleting this file should work:

[CACHE_PLACE_PATH]/piccache/4e72203c1e7acc0245219d3d2a2b9d9615495ed5cb2f84ac619449e52fdcbdd4

or simply remove the entire folder [CACHE_PLACE_PATH]/piccache and run curl again. You should then see the correct output, with the x-piccache-status header set to MISS.
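The cleanup described above can be sketched in shell. The function names are hypothetical, and `sha256sum` is assumed to be available (GNU coreutils; macOS would need `shasum -a 256` instead). The piccache.php posted later in this thread names each cache file `hash('sha256', $url)`, so hashing the decoded image URL should give the file name for one image.

```shell
# Cache file name for a URL: the hex SHA-256 digest of the URL string.
# Pass the decoded URL, not the percent-encoded query value.
cache_name() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# List zero-byte files under a cache directory ($1).
list_empty() {
    find "$1" -type f -size 0
}

# Delete zero-byte files under a cache directory ($1), leaving valid entries alone.
purge_empty() {
    find "$1" -type f -size 0 -exec rm -f {} +
}
```

Running `list_empty` on [CACHE_PLACE_PATH]/piccache first shows how many entries are affected; `purge_empty` then removes only those, so the next request for each image is a MISS and triggers a fresh download.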

If you are seeing lots of empty images, or it still outputs empty files, please let me know.

Whichbfj28 commented 6 months ago


FreshRSS: Docker image freshrss/freshrss:1.23.1. Extension: freshrss-image-cache-plugin-0.4. (Cloudflare is not used. We use the self-hosted piccache method, with piccache placed in a sub-path of FreshRSS.)

  1. There are still many files with a size of 0

  | 2024-05-18 08:30:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/71
  | 2024-05-18 08:30:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/2
  | 2024-05-18 07:30:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/2
  | 2024-05-18 07:01:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/71
  | 2024-05-18 07:01:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/2
  | 2024-05-18 06:01:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/2

freshrss log

2. I encountered another serious problem: enabling active caching in version 0.4 seems to cause feed updates to freeze. I replaced 0.4 with the older version 0.3, and the issue no longer occurs.

Victrid commented 6 months ago

This seems strange. Can you attach your piccache.php file?

If you flush the cache folder and change get($url) to:

function get($url)
{
   if ( file_exists(get_name($url)) ) {
      $file = get_name($url);
      return filesize($file) != 0 ? $file : null;
   } else {
      return null;
   }
}

can you see the picture?

Whichbfj28 commented 6 months ago
<?php
define("CACHE_PLACE_PATH", "../../data/");
# Also possible:
# define("CACHE_PLACE_PATH", "C:\\your\\Directory");
# define("CACHE_PLACE_PATH", "/var/www/html/directory");
# Remember to set correct privileges allowing PHP access.
function join_paths(...$paths) {
    return preg_replace('~[/\\\\]+~', DIRECTORY_SEPARATOR, implode(DIRECTORY_SEPARATOR, $paths));
};

function get_name($url) {
    $tmp_path = join_paths(CACHE_PLACE_PATH, "piccache");
    if (!file_exists($tmp_path)) mkdir(join_paths($tmp_path), 0777);
    return join_paths($tmp_path, hash('sha256', $url));
}

function get($url) { return file_exists(get_name($url)) ? get_name($url) : null; }

function set($url) {
    $file_name = get_name($url);
    $content = file_get_contents($url);
    file_put_contents($file_name, $content);
    return $file_name;
}

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $post = json_decode(file_get_contents('php://input'), true);
    if (! $post || ! array_key_exists("url", $post)) {
        http_response_code(400); exit();
    }
    set($post['url']);
    header('Content-Type: application/json; charset=utf-8');
    echo '{"status": "OK"}' . PHP_EOL;
    exit();
} elseif ($_SERVER['REQUEST_METHOD'] === 'GET') {
    $url = $_GET['url'];
    if (!$url){ http_response_code(400); exit(); }
    $file = get($url);
    header("X-Piccache-Status: ". ($file ? "HIT" : "MISS"));
    if (! $file) $file = set($url);
    $finfo = finfo_open(FILEINFO_MIME);
    header('Content-Type: ' . finfo_file($finfo, $file));
    finfo_close($finfo);
    header('Content-Length: ' . filesize($file));
    $fp = fopen($file, 'rb');
    fpassthru($fp);
    exit();
} else {
    http_response_code(405);
    exit();
}
?>
Whichbfj28 commented 6 months ago

1. Only the line define("CACHE_PLACE_PATH", "../../data/") has been modified; the rest is unchanged.
2. Could it be that the images mentioned in the title cannot be converted, resulting in a timeout?
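Whether the cause is a timeout or a dropped connection, set() in the piccache.php above stores whatever file_get_contents() returns. A failed download returns false, which file_put_contents() writes as a zero-byte file. A common guard is to download to a temporary file and move it into the cache only when it has content. The sketch below shows that pattern in shell, matching the curl commands in this thread; it is not a patch to the plugin. The function names, the 30-second limit, and the `.part` suffix are illustrative assumptions.

```shell
# Move a finished download into place only if it has content.
# $1 = temporary file, $2 = final cache path. Returns 1 if the file is empty.
store_if_nonempty() {
    if [ -s "$1" ]; then
        mv -f "$1" "$2"
    else
        rm -f "$1"
        return 1
    fi
}

# Download $1 into cache path $2 via a temporary file next to it.
# -f makes curl fail on HTTP errors; --max-time bounds a stalled transfer.
fetch_to_cache() {
    tmp="$2.part"
    if curl -fsS --max-time 30 -o "$tmp" "$1"; then
        store_if_nonempty "$tmp" "$2"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

With this shape, a failed or empty download never leaves a file at the final cache path, so a later request is treated as a MISS and retried instead of being served as an empty HIT.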