Failure to download from some websites where content-type encoding is not as expected by library - code fix included

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

Download data from a website whose Content-Type header carries additional parameters after the charset.

This header works with the library:

    Content-Type: application/rss+xml; charset=utf-8

This header does not work:

    Content-Type: application/rss+xml; charset=utf-8; filename=rssfeed.xml

What is the expected output? What do you see instead?

    You expect the encoding to be detected as UTF-8 but it's detected as utf-8filenamerssfeed.xml

What version of the product are you using? On what operating system?

    V0.24.3

Please provide any additional information below.

    Changing the parseCharset method to the following solves the problem. It looks for a semicolon after the charset and, if one exists, reads only up to that point.

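    private String parseCharset(String tag) {
        if (tag == null) {
            return null;
        }
        int i = tag.indexOf("charset");
        if (i == -1) {
            return null;
        }

        // Stop at the next ';' so trailing parameters (e.g. "; filename=...")
        // are not read as part of the charset value.
        int e = tag.indexOf(';', i);
        if (e == -1) {
            e = tag.length();
        }

        // Skip the 7 characters of "charset", then strip '=', quotes and whitespace.
        return tag.substring(i + 7, e).replaceAll("[^\\w-]", "");
    }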
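
For anyone who wants to check the change without building the library, here is a minimal standalone sketch that runs the same logic against the two headers from this report. The class name `CharsetDemo`, the `static` modifier and the `main` method are assumptions added for illustration only. They are not part of the library.

```java
public class CharsetDemo {

    // Same logic as the proposed parseCharset, made static so it runs standalone.
    // (Hypothetical wrapper class; the real method is a private instance method.)
    static String parseCharset(String tag) {
        if (tag == null) {
            return null;
        }
        int i = tag.indexOf("charset");
        if (i == -1) {
            return null;
        }
        // Limit the read to the next ';' if there is one.
        int e = tag.indexOf(';', i);
        if (e == -1) {
            e = tag.length();
        }
        return tag.substring(i + 7, e).replaceAll("[^\\w-]", "");
    }

    public static void main(String[] args) {
        // Header that already worked.
        System.out.println(parseCharset("application/rss+xml; charset=utf-8"));
        // Header with a trailing parameter that previously broke detection.
        System.out.println(parseCharset("application/rss+xml; charset=utf-8; filename=rssfeed.xml"));
    }
}
```

Both calls print `utf-8`. Note that the match on `"charset"` is still case-sensitive, so a header written as `Charset=...` would return null with this sketch.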

Original issue reported on code.google.com by roxbur...@gmail.com on 13 Dec 2012 at 11:14

GoogleCodeExporter commented 9 years ago

Original comment by tinyeeliu@gmail.com on 14 Dec 2012 at 4:44

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Fixed in dev. It will be available in the next release.

Original comment by tinyeeliu@gmail.com on 17 Dec 2012 at 6:12

Changed state: Fixed

Qnatz / android-query

Failure to download from some websites where content-type encoding is not as expected by library - code fix included #107