Dobby233Liu / garfield.csv

Python script that converts comic transcripts from john.ccac.rwth-aachen.de (like Garfield.txt) into CSV files
MIT License

Find the real character encoding for transcripts #1

Open Dobby233Liu opened 3 years ago

Dobby233Liu commented 3 years ago

http://john.ccac.rwth-aachen.de:8000 says a lot. Let's see what headers it returns.

Dobby233Liu commented 3 years ago

I ran many experiments; all of them are pushed to the pages branch.

Dobby233Liu commented 3 years ago

Date: Sun, 02 May 2021 04:01:28 GMT
Server: Apache/2.4.38 (Debian)
Last-Modified: Sat, 10 Feb 2007 11:03:42 GMT
ETag: "831-4291d3aac3f80-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 1133
Content-Type: text/html

The generated HTML is UTF-8.
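
Note that the `Content-Type: text/html` header above carries no `charset` parameter, so the server itself does not declare an encoding. A minimal sketch of checking that programmatically with the standard library; the helper name `charset_from_content_type` is made up for illustration and is not part of this repository:

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    # Message knows how to parse MIME-style headers with parameters.
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


if __name__ == "__main__":
    # The header value returned by the server in this thread.
    print(charset_from_content_type("text/html"))
    # For comparison, a header that does declare its encoding.
    print(charset_from_content_type("text/html; charset=UTF-8"))
```

When the function returns `None`, the encoding has to be determined some other way, for example by inspecting the bytes of the response body.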

Dobby233Liu commented 3 years ago

cp1252

Dobby233Liu commented 3 years ago

chardet: iso-8859-1
php: iso-8859-1/shift-jis
file: iso-8859-1/ascii
dfeal: cp1252
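
The tools can disagree because iso-8859-1 and cp1252 differ only in the byte range 0x80 to 0x9F: iso-8859-1 maps it to C1 control characters, while cp1252 puts printable punctuation there (curly quotes, dashes, the euro sign). A rough sketch of a check based on that difference; `guess_latin_variant` is a hypothetical helper, not something the converter script is known to contain, and it assumes the input is one of these single-byte encodings to begin with:

```python
def guess_latin_variant(data: bytes) -> str:
    """Heuristically classify bytes as ascii, cp1252, or ambiguous Latin-1."""
    if all(b < 0x80 for b in data):
        # Pure 7-bit data decodes identically in all three encodings.
        return "ascii"
    if any(0x80 <= b <= 0x9F for b in data):
        # These bytes are control characters in iso-8859-1 but
        # printable punctuation in cp1252, so cp1252 is the likely answer.
        try:
            data.decode("cp1252")
            return "cp1252"
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves a few bytes (e.g. 0x81) undefined.
            return "unknown"
    # Only bytes >= 0xA0: both encodings give the same text.
    return "iso-8859-1 or cp1252"


if __name__ == "__main__":
    # Hypothetical file name; substitute the downloaded transcript.
    with open("Garfield.txt", "rb") as f:
        print(guess_latin_variant(f.read()))
```

If a transcript contains curly quotes or similar punctuation, this check leans toward cp1252, which would be consistent with the result listed above.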