ben-xo / dir2cast

Turn a directory of MP3s into a podcast - automatically.
http://www.ben-xo.com/dir2cast
BSD 3-Clause "New" or "Revised" License

Invalid Char value 31 #35

Closed MattiasRensmo closed 4 years ago

MattiasRensmo commented 4 years ago

When I run the PHP script, I get the following error:

This page contains the following errors:
error on line 3310 at column 52: PCDATA invalid Char value 31
Below is a rendering of the page up to the first error.

I've done some googling and found the following from here:

I find that most characters whose char value is less than 32 (decimal) are control characters and should not be put in the atom.xml, see this and this. The most useful info from here is:

When you put UTF-8 encoded strings in an XML document you should remember that not all valid UTF-8 chars are accepted in an XML document (http://www.w3.org/TR/REC-xml/#charsets). So you should strip away the unwanted chars, else you'll have an XML fatal parsing error.

I have looked, but I guess I don't have the skill to find the invalid character in the file.
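
For readers hitting the same problem, here is a minimal Python sketch of one way to locate such characters in generated XML. It is not part of dir2cast; the function name `find_invalid_xml_chars` and the sample text are hypothetical. The character class follows the `Char` production in the XML 1.0 specification linked above.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (line, column, char value) for each character XML 1.0 rejects.

    Lines and columns are 1-based, like the positions in the parser error.
    """
    hits = []
    # split("\n") rather than splitlines(): splitlines() also breaks on
    # control characters such as 0x1E, which would shift the line numbers.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            hits.append((line_no, match.start() + 1, ord(match.group())))
    return hits


if __name__ == "__main__":
    # Hypothetical sample containing a Unit Separator (31) and a Record Separator (30).
    sample = "<item>\n<description>Produced by\x1f</description>\x1e\n</item>"
    for line_no, column, value in find_invalid_xml_chars(sample):
        print(f"line {line_no}, column {column}: invalid char value {value}")
```

To check a real feed, read the saved output with `open(path, encoding="utf-8").read()` and pass the resulting string to the function.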

ben-xo commented 4 years ago

@Mrensmo Character 31 is "Unit Separator" - see http://www.asciitable.com/ - it's quite unusual to see this character. Perhaps it's added by a piece of software you use.

It could be coming from almost anywhere - in your MP3 tags, or the summary text, or…

I will look into sanitizing the output for these characters when they're rendered.
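
As an illustration of what such sanitizing could look like, here is a minimal Python sketch that strips every character outside the XML 1.0 `Char` range before the text is written into the feed. This is an assumption about the general approach, not the code dir2cast actually uses (dir2cast is written in PHP), and the name `strip_invalid_xml_chars` is hypothetical.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Remove characters that an XML 1.0 parser would reject.

    Tab, line feed and carriage return are kept; all other control
    characters below 32 (such as 30 and 31) are dropped.
    """
    return INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # Hypothetical tag value with stray separator characters in it.
    tag_value = "Produced by\x1f Someone\x1e"
    print(repr(strip_invalid_xml_chars(tag_value)))
```

Stripping is applied to the tag or summary text itself, before it is escaped or wrapped in CDATA, because these characters are invalid anywhere in an XML 1.0 document, including inside a CDATA section.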

MattiasRensmo commented 4 years ago

I managed to remove it from the XML file with "Find and Replace" (I searched for its hex number with a RegExp), only to get a new error with another char:

This page contains the following errors:
error on line 3241 at column 52: PCDATA invalid Char value 30
Below is a rendering of the page up to the first error.

I understand that it is my files that are strange, but it would be lovely if your program could handle it. =)

It seems to be in these two fields that the strange chars appear.

<itunes:summary>Produced by</itunes:summary>
<description><![CDATA[Produced by]]></description>

ben-xo commented 4 years ago

@Mrensmo could you try the version of dir2cast in the branch https://github.com/ben-xo/dir2cast/tree/feature/issue-35-fix-invalid-utf8-characters-in-xml ?

MattiasRensmo commented 4 years ago

Works like a charm!

Thanks, very nice =)