The charlock_holmes API seems to be string-centric. If I have a 50 MB file that consists mostly of ordinary ASCII characters, with only a few non-ASCII characters to distinguish the encoding, what's the best way to detect the whole file's encoding without loading the entire 50 MB (or more) into memory?
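To make the question concrete, here is a rough sketch of one direction that might work. It is not a confirmed or recommended charlock_holmes technique. It streams the file in fixed-size chunks and keeps only the chunks that contain a non-ASCII byte, so the detector sees a small sample instead of the whole file. The helper name `non_ascii_sample` and the `chunk_size` and `max_bytes` defaults are hypothetical. It assumes an ASCII-compatible encoding (for example UTF-8 or ISO-8859-1); in UTF-16, plain ASCII text contains NUL bytes that would still look like ASCII here. Whether detection on such a sample agrees with detection on the full file is an open question.

```ruby
# Hypothetical helper: stream a file in binary mode and collect only the
# chunks that contain at least one byte >= 0x80, up to a rough byte budget.
# Memory use is bounded by about max_bytes + chunk_size, not by file size.
def non_ascii_sample(path, chunk_size: 64 * 1024, max_bytes: 1024 * 1024)
  sample = String.new(encoding: Encoding::BINARY)
  File.open(path, "rb") do |io|
    while (chunk = io.read(chunk_size))
      # On a binary string, ascii_only? is true when every byte is < 0x80.
      next if chunk.ascii_only?
      sample << chunk
      # Stop early once enough evidence is collected (may overshoot by
      # up to one chunk).
      break if sample.bytesize >= max_bytes
    end
  end
  sample
end

# Intended use with the gem's string API (left commented out so the
# sketch runs without charlock_holmes installed):
#
#   require "charlock_holmes"
#   detection = CharlockHolmes::EncodingDetector.detect(non_ascii_sample("big.txt"))
#   detection[:encoding]
```

Two caveats with this sketch: an all-ASCII file yields an empty sample, so that case needs separate handling, and in encodings whose trailing bytes can fall in the ASCII range (Shift_JIS, for example) a character split across a chunk boundary could lose its second byte.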