XMLTV / xmltv

Utilities to obtain, generate, and post-process TV listings data in XMLTV format
GNU General Public License v2.0

htsmsg_xml_deserialize error Unknown syntatic element #210

Closed: halfagascan closed this issue 1 year ago.

halfagascan commented 1 year ago

Docker

XMLTV Version?

xmltv-1.2.1-r3 installed

XMLTV Component?

/usr/bin/tv_grab_file (version 0.1, as reported by -v)

Perl Version

This is perl 5, version 36, subversion 1 (v5.36.1) built for x86_64-linux-thread-multi

Operating System

NAME="Alpine Linux" ID=alpine VERSION_ID=3.18.0 PRETTY_NAME="Alpine Linux v3.18"

What happened?

From within the tvheadend web UI, Configuration / Channel/EPG / EPG Grabber, "Re-run internal EPG grabbers" results in the error in the title.

What did you expect to happen?

The normal grabber output:

     /usr/bin/tv_grab_file: parse took 0 seconds
     2023-06-05 16:51:11.965 xmltv: /usr/bin/tv_grab_file: channels tot= 233 new= 0 mod= 0
     2023-06-05 16:51:11.965 xmltv: /usr/bin/tv_grab_file: brands tot= 0 new= 0 mod= 0
     2023-06-05 16:51:11.965 xmltv: /usr/bin/tv_grab_file: seasons tot= 0 new= 0 mod= 0
     2023-06-05 16:51:11.965 xmltv: /usr/bin/tv_grab_file: episodes tot= 0 new= 0 mod= 0
     2023-06-05 16:51:11.965 xmltv: /usr/bin/tv_grab_file: broadcasts tot= 4810 new= 144 mod= 427

Did you see any warnings/errors?

As in the title.

What steps are needed to reproduce this issue?

As detailed above:

1. In the web UI: Configuration / Channel/EPG / EPG Grabber, "Re-run internal EPG grabbers". This errors as in the title, on the first run only; thereafter it's fine.

2. Running the grabber directly inside the container executes with no issues:

     docker exec -it tvheadend bash /usr/bin/tv_grab_file

3. Searching the container's Docker log for the parser errors:

     sudo cat /var/lib/docker/containers/58528c696ca6b77e9aef5888f98dee2b04732d92ed65be692d102173c741db4a/58528c696ca6b77e9aef5888f98dee2b04732d92ed65be692d102173c741db4a-json.log | grep htsmsg_xml
     {"log":"2023-06-05 15:42:18.436 [ ERROR] xmltv: /usr/bin/tv_grab_file: htsmsg_xml_deserialize error Unknown syntatic element: \u003c!DOCTYPE tv\n","stream":"stderr","time":"2023-06-05T21:42:18.437058371Z"}
     {"log":"2023-06-05 15:51:55.536 [ ERROR] xmltv: /usr/bin/tv_grab_file: htsmsg_xml_deserialize error Unexpected end of file during parsing of label reference\n","stream":"stderr","time":"2023-06-05T21:51:55.536258147Z"}
     {"log":"2023-06-05 15:59:46.136 [ ERROR] xmltv: /usr/bin/tv_grab_file: htsmsg_xml_deserialize error Unexpected end of file during parsing of label reference\n","stream":"stderr","time":"2023-06-05T21:59:46.136670491Z"}
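The log lines above come from Docker's json-file logging driver (as the -json.log filename suggests), which stores each stderr line as a JSON object and escapes < as \u003c. A minimal Python sketch, assuming that driver, that decodes the entries so the tvheadend messages read as plain text (the function name htsmsg_errors is hypothetical; the two sample lines are copied from the log above):

```python
import json

# Two entries copied from the container log above (raw strings keep the JSON escapes).
SAMPLE = [
    r'{"log":"2023-06-05 15:42:18.436 [ ERROR] xmltv: /usr/bin/tv_grab_file: '
    r'htsmsg_xml_deserialize error Unknown syntatic element: \u003c!DOCTYPE tv\n",'
    r'"stream":"stderr","time":"2023-06-05T21:42:18.437058371Z"}',
    r'{"log":"2023-06-05 15:51:55.536 [ ERROR] xmltv: /usr/bin/tv_grab_file: '
    r'htsmsg_xml_deserialize error Unexpected end of file during parsing of label reference\n",'
    r'"stream":"stderr","time":"2023-06-05T21:51:55.536258147Z"}',
]


def htsmsg_errors(lines, needle="htsmsg_xml"):
    """Yield the decoded "log" text of json-file entries that contain needle."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)  # json.loads turns \u003c back into '<'
        message = entry.get("log", "").rstrip("\n")
        if needle in message:
            yield message


if __name__ == "__main__":
    for message in htsmsg_errors(SAMPLE):
        print(message)
```

On the host, the open log file can be passed instead of SAMPLE (for example `with open(path) as f:` followed by `htsmsg_errors(f)`); reading it usually needs root, hence the sudo above.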

     head -3 provider.xml
     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE tv SYSTEM "xmltv.dtd">

I think the error is here: 'Unknown syntatic element: \u003c!DOCTYPE'

rmeden commented 1 year ago

tv_grab_file? I don't think there is such a thing in the XMLTV project distribution. Maybe you got it from somewhere else? What do the man page or the file contents say?

honir commented 1 year ago

This is, of course, an issue with tvheadend and not the XMLTV project, but I will try to help you.

Where in your xml file is the DOCTYPE line?

It should be line 2. So I would have expected your file to say something like:

head -4 provider.xml
     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE tv SYSTEM "xmltv.dtd">

     <tv source-info-url="http://www.schedulesdirect.org" source-info-name="Schedules Direct" generator-info-name="tv_grab_sd_json">

Make sure it is either on line 2, or remove it (the XML parser in TVH appears simply to remove it if it's found). To check where it is:

     grep -n DOCTYPE provider.xml

(and make sure you don't have more than one! - e.g. this can happen when providers concatenate files incorrectly)
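A small Python sketch of the two checks suggested above (DOCTYPE on line 2, and only one of them), equivalent to reading the `grep -n DOCTYPE provider.xml` output by eye. The function name doctype_problems and the script name in the comment are hypothetical, and the "line 2" rule is the rule of thumb from this comment, not something taken from tvheadend's documentation:

```python
import sys


def doctype_problems(lines):
    """Apply the rule of thumb above to an iterable of lines.

    Returns a list of human-readable problems; an empty list means nothing was
    flagged. A file with no DOCTYPE at all is accepted, since removing it is
    the suggested alternative.
    """
    found = [n for n, line in enumerate(lines, start=1) if "<!DOCTYPE" in line]
    problems = []
    if len(found) > 1:
        problems.append(
            f"{len(found)} DOCTYPE lines at {found} (files concatenated incorrectly?)"
        )
    if found and found[0] != 2:
        problems.append(f"first DOCTYPE is on line {found[0]}, expected line 2")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_doctype.py provider.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        for problem in doctype_problems(f):
            print(problem)
```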

halfagascan commented 1 year ago

honir, thanks for the pointers, very helpful. I'll try the code block again, since it didn't show everything the first time.

     head -3 provider.xml
     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE tv SYSTEM "xmltv.dtd">
     <tv source-info-name="m3u-epg-editor" source-info-url="github.com/bebo-dot-dev/m3u-epg-editor" source-data-url="github.com/bebo-dot-dev/m3u-epg-editor" generator-info-name="m3u-epg-editor" generator-info-url="https://github.com/bebo-dot-dev/m3u-epg-editor">

and

     grep -n DOCTYPE provider.xml
     2:<!DOCTYPE tv SYSTEM "xmltv.dtd">

I did request assistance on the tvheadend forum, with no response, but found a post about this very error, marked as solved, from 7 years ago: https://tvheadend.org/issues/3283. Thanks

honir commented 1 year ago

From where did you get your /usr/bin/tv_grab_file?

Are you sure TVH is reading the right file?

(Original version of tv_grab_file used a fixed filename - not provider.xml)

garybuhrmaster commented 1 year ago

It appears the OP recently (i.e. in the last few days/hours) worked with the author of that "grabber" to change a couple of things including the header.

I recommend the OP goes back and continues to work with the author of that "grabber" to continue the discussion and any re-work needed for that application.

bebo-dot-dev commented 1 year ago

@garybuhrmaster presumably you're referring to https://github.com/bebo-dot-dev/m3u-epg-editor/issues/77

I'm the maintainer of that project and I've come to the same conclusion that @honir has. This is a tvheadend issue; it is neither an XMLTV project nor an m3u-epg-editor project issue.

The EPG XML data that @halfagascan has validates as expected against the XMLTV DTD. Whichever version of tvheadend he's running, it's the tvheadend function htsmsg_xml_deserialize that fails to process it, with htsmsg_xml_deserialize errors similar to those reported in the first comment in this issue.

https://tvheadend.org/projects/tvheadend/issues is where this issue needs to be taken IMO

honir commented 1 year ago

I had looked at the source code for htsmsg_xml_deserialize, which suggests the error is in the content of the source file: htsmsg_xml_deserialize ignores a DOCTYPE entry in the file header but breaks if one occurs later in the file.
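If a DOCTYPE is only ignored in the header and is fatal elsewhere, one workaround is to drop it entirely (the "or remove it" option suggested earlier in this thread). A minimal Python sketch, with the hypothetical name strip_doctypes, that removes every simple DOCTYPE declaration before the file is handed to TVH. It assumes there is no internal DTD subset (`[...]`) and no `>` inside the quoted identifiers, which holds for a plain `<!DOCTYPE tv SYSTEM "xmltv.dtd">` line:

```python
import re

# Matches e.g. <!DOCTYPE tv SYSTEM "xmltv.dtd"> plus any whitespace after it.
# Assumption: no internal subset ([...]) and no '>' inside quoted identifiers.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*>\s*")


def strip_doctypes(text):
    """Return (text_without_doctypes, number_removed)."""
    return DOCTYPE_RE.subn("", text)


if __name__ == "__main__":
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE tv SYSTEM "xmltv.dtd">\n'
        "<tv>\n</tv>\n"
    )
    cleaned, removed = strip_doctypes(header)
    print(removed)  # 1
    print(cleaned)
```

This does not repair a file that was concatenated incorrectly: the extra `<?xml ...?>` declarations and `<tv>` root elements would still be there.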

@halfagascan I suggest you hand-craft a minimal working example (MWE) file with just one channel and one programme, and feed that into TVH to see what it makes of it.
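Such an MWE can also be generated. The Python sketch below (the function name minimal_xmltv and every channel id, name, time and title are made up for illustration) builds one `<channel>` with a `<display-name>` and one `<programme>` with a `<title>`, uses the XMLTV start/stop timestamp format `YYYYMMDDhhmmss +zzzz`, and puts the DOCTYPE on line 2 as recommended above; `with_doctype=False` produces the same file without it:

```python
import xml.etree.ElementTree as ET


def minimal_xmltv(with_doctype=True):
    """Build a one-channel, one-programme XMLTV document as a string.

    All ids, names and times are placeholders for illustration only.
    """
    tv = ET.Element("tv", {"generator-info-name": "hand-crafted MWE"})

    channel = ET.SubElement(tv, "channel", {"id": "example.channel"})
    ET.SubElement(channel, "display-name").text = "Example Channel"

    programme = ET.SubElement(tv, "programme", {
        "start": "20230605180000 +0000",
        "stop": "20230605190000 +0000",
        "channel": "example.channel",
    })
    ET.SubElement(programme, "title").text = "Example Programme"

    header = '<?xml version="1.0" encoding="UTF-8"?>\n'
    if with_doctype:
        header += '<!DOCTYPE tv SYSTEM "xmltv.dtd">\n'
    return header + ET.tostring(tv, encoding="unicode") + "\n"


if __name__ == "__main__":
    # Redirect to a file, e.g. python mwe.py > mwe.xml, and feed that to TVH.
    print(minimal_xmltv(), end="")
```

Comparing TVH's behaviour on the file with and without the DOCTYPE line isolates whether the header is the problem at all.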

There are lots and lots of errors with IPTV EPGs if that is your data source - the providers aren't bothered about correctness.
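Since the errors may be in the data rather than in the header, a quick well-formedness check can pinpoint the first bad spot, such as an unescaped `&` in a title or a second root element left by concatenated files. A Python sketch using the standard library's `ElementTree.iterparse` (the name first_xml_error is hypothetical). It only checks well-formedness, it is not DTD validation, and tvheadend's own parser may still reject input that Python accepts:

```python
import io
import xml.etree.ElementTree as ET


def first_xml_error(source):
    """Return None if source is well-formed XML, else (line, column, message).

    source may be a filename or a binary file object. iterparse streams the
    input, and clearing each finished element keeps memory low for large EPGs.
    """
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == "__main__":
    # Made-up sample: an unescaped '&' inside a <title> on line 3.
    sample = io.BytesIO(
        b'<tv>\n<programme channel="x">\n<title>News & Weather</title>\n'
        b"</programme>\n</tv>\n"
    )
    print(first_xml_error(sample))
```

For the real file, `first_xml_error("provider.xml")` returns None or the position of the first problem.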

halfagascan commented 1 year ago

honir, I appreciate the follow-up. I'll close. Thanks