denoland / deno

A modern runtime for JavaScript and TypeScript.
https://deno.com
MIT License
97.44k stars 5.37k forks source link

Main thread panic with fetch() #7208

Closed exside closed 3 years ago

exside commented 4 years ago

I'm running into some strange issues when fetching image files. Pulled my hair out trying to catch the problematic portion...turns out the problematic portion is simply fetch() itself =), and there is no catching or working around the issue, it simply crashes each time it encounters whatever the problem is. I reduced what I'm doing to a (hopefully) reproducible snippet:

Deno.env.set('RUST_BACKTRACE', 'full'); // needs --allow-env

// content-disposition: inline; filename="2020-04-27_15h06_25.png"; filename*=UTF-8''2020-04-27_15h06_25.png
const working = 'https://community.upc.ch/t5/image/serverpage/image-id/7683i8DD8EAF256FF500C';
// content-disposition: inline; filename="Ursprnge-und-Rezepte-Blog-Banner.png"; filename*=UTF-8''Urspr%C3%BCnge-und-Rezepte-Blog-Banner.png
const panic1 = 'https://community.upc.ch/t5/image/serverpage/image-id/7679iD6978D01CC52670D';
// content-disposition: inline; filename="virtuelle-Untersttzung_blog.jpg"; filename*=UTF-8''virtuelle-Unterst%C3%BCtzung_blog.jpg
const panic2 = 'https://community.upc.ch/t5/image/serverpage/image-id/7611i6132FD57C7312C6D';
// content-disposition: inline; filename="virtuelle-Untersttzung_blog.jpg"; filename*=UTF-8''virtuelle-Unterst%C3%BCtzung_blog.jpg
const panic3 = 'https://community.upc.ch/t5/image/serverpage/image-id/7610i2A587382D4015838';
// content-disposition: inline; filename="Ursprnge-und-Rezepte-Blog-Banner.png"; filename*=UTF-8''Urspr%C3%BCnge-und-Rezepte-Blog-Banner.png
const panic4 = 'https://community.upc.ch/t5/image/serverpage/image-id/7680i0E08F8FBC53FFB86';
// content-disposition: inline; filename="Lg EPG - Update - UPC Community.jpg"; filename*=UTF-8''L%C3%B6sung%20EPG%20-%20Update%20-%20UPC%20Community.jpg
const panic5 = 'https://community.upc.ch/t5/image/serverpage/image-id/6214i47B0827B20634C66';
// content-disposition: inline; filename="Programmbersicht vom 19.1.2019 - TV-Programm von UPC.jpg"; filename*=UTF-8''Programm%C3%BCbersicht%20vom%2019.1.2019%20-%20TV-Programm%20von%20UPC.jpg
const panic6 = 'https://community.upc.ch/t5/image/serverpage/image-id/5030i891681AB69CE5C39';
// content-disposition: inline; filename="Programmbersicht vom 19.1.2019 - TV-Programm von SRF.jpg"; filename*=UTF-8''Programm%C3%BCbersicht%20vom%2019.1.2019%20-%20TV-Programm%20von%20SRF.jpg
const panic7 = 'https://community.upc.ch/t5/image/serverpage/image-id/5030i891681AB69CE5C39';
// content-disposition: inline; filename="Ststes Mobilnetz   UPC.jpg"; filename*=UTF-8''St%C3%A4rkstes%20Mobilnetz%20%20%20UPC.jpg
const panic8 = 'https://community.upc.ch/t5/image/serverpage/image-id/4954i1B3CE88242518135';
// content-disposition: inline; filename="l.JPG"; filename*=UTF-8''l%C3%B6ser.JPG
const panic9 = 'https://community.upc.ch/t5/image/serverpage/image-id/3571i5F26CFABF7C4A92D';
// content-disposition: inline; filename="leere Beitr.JPG"; filename*=UTF-8''leere%20Beitr%C3%A4ge.JPG
const panic10 = 'https://community.upc.ch/t5/image/serverpage/image-id/3449i22A920E6B52DE26C';
// content-disposition: inline; filename="Krmel.JPG"; filename*=UTF-8''Kr%C3%BCmel.JPG
const panic11 = 'https://community.upc.ch/t5/image/serverpage/image-id/3321i50B36E3AE3CA5BEC';
// content-disposition: inline; filename="ein kleiner Lacher fr zwischendurch ;).jpg"; filename*=UTF-8''ein%20kleiner%20Lacher%20f%C3%BCr%20zwischendurch%20%3B%29.jpg
const panic12 = 'https://community.upc.ch/t5/image/serverpage/image-id/3128iB4861C5D5DB8626F';

// let res = await fetch(working);
let res = await fetch(panic1);
// let res = await fetch(panic2, {
//  method: 'HEAD' // doesn't change anything, e.g. definitely a header issue IMHO
//});
res = await res.blob();

console.log(res);

the error I'm getting is:

thread 'main' panicked at 'called Result::unwrap() on an Err value: ToStrError { _priv: () }', cli/ops/fetch.rs:96:42

I thought at first it might be some strange thing with one specific image/file, but then I skipped that one and soon thereafter ran into the same crash again, so it isn't only one specific file, I think it is more something within the headers of the response that fails not with the actual file-data, because the rust stack trace points to this:

    for (key, val) in res.headers().iter() {
      // no idea why there is .to_string() for key and .to_str() for value =)...is there a difference?
      // forgive the comment, not familiar with the rust side of deno yet
      res_headers.push((key.to_string(), val.to_str().unwrap().to_owned()));
    }

from here https://github.com/denoland/deno/blob/6d964fc607f78b1652a4e04e8a7cf3672fac9d61/cli/ops/fetch.rs#L97

I'm assuming it is specifically this part: val.to_str().unwrap().to_owned() as it calls .unwrap() which is mentioned in the stack trace...

I had a closer look at the headers (via browser inspector > network tab) and the only potentially problematic thing I'm seeing for ALL "panic" files is the content-disposition header that they contain the actual filename the image was uploaded as and when they have (seemingly?) encoded German umlauts in their names, things break. As far as I can take from MDN:

The parameters filename and filename differ only in that filename uses the encoding defined in RFC 5987. When both filename and filename are present in a single header field value, filename is preferred over filename when both are understood.

so technically that would mean the filename* one is taken, which seems properly encoded? But yeah, just guessing here, the pattern seems quite constant though...

Might be somehow related to those #6649 #6965

PS: Just upgraded to Deno 1.3.1, same issue, but had the same on the version before that, don't remember what it was, but for sure 1.x!

lucacasonato commented 4 years ago

Thanks for the very detailed reproduction.

exside commented 4 years ago

@lucacasonato np, wish I could do more, but don't even know where to start debugging the rust side of Deno =/ (happy to learn if you point me to some useful resources). Until now I ran trough about 1500 images of which 11 caused a panic, all with the same kind of pattern I described above, the content-disposition header always has some encoded characters in it when stuff breaks... so I'd definitely go down that rabbit hole first if I would know how to debug this.

Updated the initial comment with the new data.

This sounds kinda like what seems to happen with .unwrap()

In Rust, when you have an operation that may either return a T or fail, you will have a value of type Result<T,E> or Option (E will be the error condition in case of an interesting error). The function unwrap(self) -> T will give you the embedded T if there is one. If instead there is not a T but an E or None then it will panic. It is best used when you are positively sure that you don't have an error. If that is not the case usually it is better either pattern-match the error or use the ? operator to forward the error.

from https://stackoverflow.com/questions/36362020/what-is-unwrap-in-rust-and-what-is-it-used-for

mustafapc19 commented 4 years ago

From the docs of the reqwest:

/// To handle this, the 'HeaderValue' is useable as a type and can be compared /// with strings and implements 'Debug'. A 'to_str' fn is provided that returns /// an 'Err' if the header value contains non visible ascii characters. #[derive(Clone, Hash)] pub struct HeaderValue { inner: Bytes, is_sensitive: bool, }

Link

I think 'filename' field could be encoded wrong. IMO the we could ignore this problem by matching against the error and ignore the error. At Least the deno wont crash r8? For the real solution we need to split the "content-disposition" around ";" and ignore "filename" field when possible. But it seems a painful task at least for me. If someone confirms my analysis I might help out.

exside commented 4 years ago

@mustafapc19 I wouldn't be surprised at all if the filename is encoded wrong, at least based on what I know about the platform that does the "encoding" =)...but it's a proprietary black box, so nothing I could change on how that is done, therefore I agree it would probably be best to simply ignore the error to_str throws...

aaether14 commented 4 years ago

I might be reproducing the same issue. It's not exactly on the same line but I'm using 1.32.

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ToStrError { _priv: () }', cli/ops/fetch.rs:99:40

Here is the script I ran, along with the backtrace resulting from RUST_BACKTRACE=full ./deno run --allow-net fetch_crash.ts fetch_crash.zip

I'm running deno (https://github.com/denoland/deno/releases/tag/v1.3.2) on Manjaro Linux.

mustafapc19 commented 4 years ago

I think this can be mitigated if convert the "val" there into a byte array and split it on ";" and ignore the filename in case of error. But like I said I dont know if this would be worth the pain. Or we could ignore it the encoding error. But I don't think we should let the server crash. If someone greenlit this ? @lucacasonato

ry commented 4 years ago

A very odd property of HTTP is that header values do not need to be strings - they can be possibly raw binary data.

> curl -i https://community.upc.ch/t5/image/serverpage/image-id/7679iD6978D01CC52670D
HTTP/2 200
content-type: image/png;charset=UTF-8
date: Thu, 03 Sep 2020 20:55:55 GMT
server: Apache
content-disposition: inline; filename="Urspr�nge-und-Rezepte-Blog-Banner.png"; filename*=UTF-8''Urspr%C3%BCnge-und-Rezepte-Blog-Banner.png
cache-control: max-age=900
last-modified: Mon, 27 Apr 2020 10:50:47 GMT
expires: Fri, 03 Sep 2021 20:55:55 GMT
x-cache: Miss from cloudfront
via: 1.1 d4cdd862c8bc0148f37b685614031cf5.cloudfront.net (CloudFront)
x-amz-cf-pop: EWR52-C1
x-amz-cf-id: x_2WReFxKxmOfV3sjmUEQLb2yMWxMHo3I-DG0Cwvi3jziG_ZEdwfLQ==

It appears the "content-disposition" header has some non-utf8 values in it. Valid HTTP but creates a bug in deno.

What does browser fetch() return in this situation?

mustafapc19 commented 4 years ago

@ry I might be wrong but "content-disposition" only makes sense in a browser r8?, as per as I read up on MDN. So I don't really get the reason why deno should be even processing this header. But yeah. Wrong binary encoding will make deno crash on some other header too and can be used for crashing server. So I do think we should ignore and maybe warn,when "to_str()" returns an error, instead of "unwrapping"

kryptish commented 4 years ago

@ry

It appears the "content-disposition" header has some non-utf8 values in it. Valid HTTP but creates a bug in deno.

I think technically that non-utf8 char should be a 'ü' character (check the utf8 encoded second version and decode it), which is definitely part of utf8, the source somehow messes that up... but the same could happen with other sources too or even be used malicously...

Browser fetch() seems to get it right:

HTTP/2 200 OK
content-type: image/png;charset=UTF-8
server: Apache
content-disposition: inline; filename="Ursprünge-und-Rezepte-Blog-Banner.png"; filename*=UTF-8''Urspr%C3%BCnge-und-Rezepte-Blog-Banner.png
last-modified: Mon, 27 Apr 2020 10:50:47 GMT
date: Fri, 04 Sep 2020 23:13:43 GMT
expires: Sat, 04 Sep 2021 23:13:43 GMT
cache-control: max-age=900
x-cache: RefreshHit from cloudfront
via: 1.1 90515c29ffc08c36814da3b1fe9d04e8.cloudfront.net (CloudFront)
x-amz-cf-pop: CDG53-C1
x-amz-cf-id: asPQTJBjJJBd7mMbOpfOgCMyLJgokVfmUKrEn8QN_CnrbcYImlWPdA==
X-Firefox-Spdy: h2
exside commented 4 years ago

I seem to have a talent for causing Deno-panics^^, looking for a workaround until the fetch issue is resolved, I implemented a simple download replacement with curl:

async function curl(url, args = ['-I']) {
    const cmd = Deno.run({
        cmd: ['curl', '-#', ...args, url], 
        stdout: 'piped',
        stderr: 'piped',
    });
    const { code } = await cmd.status();

    let res = null;

    if ( code === 0 ) {
        // parse headers
        if ( args.includes('-I') ) {
            res = new TextDecoder().decode(await cmd.output())
                .trim()
                .split('\r\n')
                .reduce((acc, line, i, arr) => {
                    if ( i === 0 ) {
                        acc['status'] = parseInt(line.split(' ')[1], 10);
                    } else {
                        line = line.split(': ');
                        acc['headers'] = {
                            ...(acc.headers || {}),
                            ...{ [line[0].toLowerCase()]: line[1] },
                        };
                    }
                    return acc;
                }, {});
        }

        // download file
        if ( args.includes('-o') ) {
            // probably never gonna get anything in here, as -o redirects output to file...as far as I understand
            res = new TextDecoder().decode(await cmd.output());
            let dest = args[args.indexOf('-o')+1];
            if ( await exists(dest) ) {
                // console.log('File download with cURL successful', dest);
                res = dest;
            } else {
                console.error('cURL should have downloaded the file, but it doesn\'t seem to exist', url, dest, args, 'code:', code);
                res = null;
            }
        }
    } else {
        console.error('code: ', code, new TextDecoder().decode(await cmd.stderrOutput()));
    }

    cmd.close();

    return res;
}

it doesn't have problems with those strange characters in the header, but it crashes kinda randomly with always the same error:

thread 'main' panicked at 'internal error: entered unreachable code', cli/op_error.rs:264:21
stack backtrace:
   0:        0x106a5d805 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h26e8f1b884bae128
   1:        0x10673f1ac - core::fmt::write::ha0b53fc066268c71
   2:        0x106a5cd69 - std::io::Write::write_fmt::h410b83a4c70c6549
   3:        0x106a5c585 - std::panicking::default_hook::{{closure}}::h70c11ad9e0ccd806
   4:        0x106a5bb6a - std::panicking::rust_panic_with_hook::h3fc8a110bc9d166f
   5:        0x107710a55 - std::panicking::begin_panic::h4f47d9a156a57e50
   6:        0x1066691d9 - <deno::op_error::OpError as core::convert::From<&std::io::error::Error>>::from::h8f3aea24496b63f0
   7:        0x1066690de - <deno::op_error::OpError as core::convert::From<std::io::error::Error>>::from::h3923c3da7515a260
   8:        0x10667cbd4 - deno::ops::process::op_run::hf3d8ca67671b8e50
   9:        0x10669b807 - deno::state::State::core_op::{{closure}}::h4c9536ae798d71a1
  10:        0x106753276 - <extern "C" fn(A0) .> R as rusty_v8::support::CFnFrom<F>>::mapping::c_fn::hd79bb990b65c4085
  11:        0x106c1c77f - _ZN2v88internal25FunctionCallbackArguments4CallENS0_15CallHandlerInfoE
  12:        0x106c1bcfc - _ZN2v88internal12_GLOBAL__N_119HandleApiCallHelperILb0EEENS0_11MaybeHandleINS0_6ObjectEEEPNS0_7IsolateENS0_6HandleINS0_10HeapObjectEEESA_NS8_INS0_20FunctionTemplateInfoEEENS8_IS4_EENS0_16BuiltinArgumentsE
  13:        0x106c1b2a7 - _ZN2v88internalL26Builtin_Impl_HandleApiCallENS0_16BuiltinArgumentsEPNS0_7IsolateE
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6

when I restart the script, it seems to download the file without problems (need to investigate further) until it panics again at some point, so this one is harder to reproduce (at least can't do it reliably until now)...

but it could also be that it is an implementation error on my side with Deno.run(), not sure?!

EDIT: I actually didn't have that part with const { code } = await cmd.status(); and checking for code === 0 initially, once I implemented that I had a full download-run without issues, on the second one now to confirm...so basically not waiting for the status() and just using .output() can cause a panic?

EDIT 2: After trying everything under the sun (including killing the child process with a timeout etc.) it turned out that the *** -# flag of curl for "simple progress" caused the process to hang for some reason unknown to me, but removing it solved the issue...

ghost commented 3 years ago

First off, if anything I say is wrong, don't hesitate to correct me.

...I might be wrong but "content-disposition" only makes sense in a browser r8?, as per as I read up on MDN. So I don't really get the reason why deno should be even processing this header. But yeah. Wrong binary encoding will make deno crash on some other header too and can be used for crashing server. So I do think we should ignore and maybe warn,when "to_str()" returns an error, instead of "unwrapping"

The problem with variable-byte encodings is that you can't just ignore the header since you don't even know where it ends. Whenever I used C++, I internally used UTF-32 to avoid this, but Rust uses UTF-8 by default and has great support for it. With UTF-8, you can't find the index of a character without parsing the entire string up until that character, same applies to random access too, you have to figure out where a character ends in order to proceed.

If any part of it isn't UTF-8, you can't proceed with the iteration, therefore you can't even determine where the header ends.

/// with strings and implements 'Debug'. A 'to_str' fn is provided that returns /// an 'Err' if the header value contains non visible ascii characters.

The weird part about this is that German umlauts aren't ASCII control characters, and are perfectly visible.

Unrelated to the thread, but, since you seemed interested, if you really wanted to learn Rust, you should just check out their site: https://www.rust-lang.org/learn.

Regarding the problem, can the value be read as a byte array, like vec<u8>? Then again, that sounds like recreating the wheel :/

Spoonbender commented 3 years ago

I tried looking up how the code should behave, and it gets a bit complicated.

  1. It seems that by spec, headers "cannot" contain characters outside the ISO-8859-1 character set, so participants shouldn't just UTF-8 encode HTTP headers...
  2. ...but sometimes they do. De-facto, it seems most browsers will try to guesstimate if a header is UTF-8 encoded, and if so - decode as UTF-8.
  3. It seems the Content-Disposition header requires special care: if I understand correctly, it is decoded as ASCII, but its values ("inside" the data) need be parsed according to the encoding they specify (if any)

Main reference: https://www.cuba-platform.com/blog/utf-8-in-http-headers/

So I think the key to solving this issue is to first not assume that the headers are UTF-8, and instead try to go by the spec (using ISO-8859-1), then add handling of UTF-8 headers using guesstimation - in order to improve support for out-of-spec real world scenarios.

bartlomieju commented 3 years ago

We've had a major overhaul of fetch() implementation and enabled a lot of Web Platform Tests which should cover bugs described in this issue. I want to tentatively close this issue, @lucacasonato is that ok with you?

andreubotella commented 3 years ago

11070 should fix this.

lucacasonato commented 3 years ago

Confirmed @11070 fixes this. The original reproduction does not panic anymore.