bebo-dot-dev / m3u-epg-editor

a python m3u / epg optimizer

tvheadend error #77

Closed halfagascan closed 1 year ago

halfagascan commented 1 year ago

For about the last two weeks, I'm getting this error in tvh: Executing "/usr/bin/tv_grab_file"

2023-06-02 07:37:45.981 xmltv: /usr/bin/tv_grab_file: htsmsg_xml_deserialize error Unknown syntatic element:

2023-06-02 07:37:45.981 xmltv: /usr/bin/tv_grab_file: grab returned no data

From docker logs tvheadend: Executing "/usr/bin/tv_grab_file"

2023-06-02 07:40:58.403 [ ERROR] xmltv: /usr/bin/tv_grab_file: htsmsg_xml_deserialize error Unknown syntatic element: <!DOCTYPE tv

2023-06-02 07:40:58.403 [WARNING] xmltv: /usr/bin/tv_grab_file: grab returned no data

Looking at the newly generated xml, I see: ?xml version='1.0' encoding='UTF-8'? tv source-info-name="py-m3u-epg-editor" generator-info-name="py-m3u-epg-editor" generator-info-url="py-m3u-epg-editor". Executing the file_grab within tvh results in the above error. If I edit the xml file to include ?xml version="1.0" encoding="ISO-8859-1"? !DOCTYPE tv SYSTEM "xmltv.dtd" and remove tv source-info-name and all that follows, TVH accepts it as valid and processes it, no complaints.

The original.xml contains: ?xml version="1.0" encoding="utf-8" ? !DOCTYPE tv SYSTEM "xmltv.dtd" tv generator-info-name="MYPROVIDER" generator-info-url="noserver"

I'm not sure how to have tvh grab the original, from within tvh, to see if it's an issue with the header. Thanks

bebo-dot-dev commented 1 year ago

Hi there,

There's been nothing changed in the EPG XML create_new_epg function area for some time and the header info in the newly generated EPG looks the same today as it always has done in the history of this project.

If I had to guess what's going on, I'd say that TVH has somehow changed where it now doesn't like the XML for some reason.

Your comment is a little hard to follow btw, if you paste XML/code/markup as code in your comments, that will be more readable, complete and understandable thanks.

bebo-dot-dev commented 1 year ago

For reference, the first two lines in a newly generated EPG XML file are currently:

<?xml version='1.0' encoding='UTF-8'?>
<tv source-info-name="py-m3u-epg-editor" generator-info-name="py-m3u-epg-editor" generator-info-url="py-m3u-epg-editor">

No doctype declaration is included in the XML file i.e. something like this will not be present:

<!DOCTYPE tv SYSTEM "xmltv.dtd">

Maybe check if the TVH project changed something or tightened something up.

halfagascan commented 1 year ago

Ok, back at it, I'll try to explain better and put things into code blocks.

head -3 pretty_original.xml

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv generator-info-name="XXX" generator-info-url="noserver">

head -3 provider.xml

<?xml version='1.0' encoding='UTF-8'?>

After using the script, I have provider.xml. As you can see, <!DOCTYPE tv SYSTEM "xmltv.dtd"> is not in the generated provider.xml. If I edit provider.xml to remove the tv source-info-name part and insert the missing info, tvh is happy. From my limited reading of the xmltv DTD, it seems this is "supposed" to be required. The TVH forums had bugs related to this years ago (Bug #2644, supposedly fixed), but the tvh forums are basically useless for help.

bebo-dot-dev commented 1 year ago

OK we don't appear to have any idea what's changed in TVH but I have to assume that something has changed because I am able to confirm that nothing has changed in this project that affects how EPG XML files are generated.

My guess is that TVH has for whatever reason decided to start having an issue with the missing <!DOCTYPE tv SYSTEM "xmltv.dtd"> doc type declaration. I'll go as far as guessing that if you just add in the doc type declaration and leave the <tv source-info-name="py-m3u-epg-editor" generator-info-name="py-m3u-epg-editor" generator-info-url="py-m3u-epg-editor"> element as-is, doing this will be enough to make it happy. Please try that and report back.

To back this idea up, the EPG XML DTD spec is here: https://github.com/XMLTV/xmltv/blob/master/xmltv.dtd

It can be seen that the DTD does contain a doc type declaration and that the source-info-name, generator-info-name and generator-info-url attributes are all valid attributes for the tv element.

halfagascan commented 1 year ago

yea, tried that, leaving the "source-info-name", didn't like that. I'm looking for tvh 4.2.1 on hub docker but all they list is latest, I'll ask on discord. Thanks

bebo-dot-dev commented 1 year ago

I'm happy to apply a change to include the <!DOCTYPE tv SYSTEM "xmltv.dtd"> doc type declaration. Although I believe it's not strictly required, it being there would at least make the generated EPG XML closely match the https://github.com/XMLTV/xmltv/blob/master/xmltv.dtd example.

That said it sounds like that change won't be enough to make it work so it's not worth doing anything until we know exactly what TVH has decided to have a problem with.

halfagascan commented 1 year ago

switched to linuxserver/tvheadend:stable-4.2.1, previous was lscr.io/linuxserver/tvheadend:latest. No change, same error after using the script, input <!DOCTYPE tv SYSTEM "xmltv.dtd"> and removing source-info-name=. I think your suggestion to include <!DOCTYPE tv SYSTEM "xmltv.dtd"> is a step in the right direction.

I do have another provider, and using this script with HTS Tvheadend 4.3-2120~g18effa8ad outputs a fine xml file that tvh consumes with no issues. I may have to go through the xml with a fine-tooth comb to find the major error. Thanks

bebo-dot-dev commented 1 year ago

I think your suggestion to include <!DOCTYPE tv SYSTEM "xmltv.dtd"> is a step in the right direction

I agree, I've applied and merged this change in #78

For reference the first three lines in a newly generated EPG XML file will now look like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv source-info-name="m3u-epg-editor" source-info-url="github.com/bebo-dot-dev/m3u-epg-editor" source-data-url="github.com/bebo-dot-dev/m3u-epg-editor" generator-info-name="m3u-epg-editor" generator-info-url="https://github.com/bebo-dot-dev/m3u-epg-editor">

The resulting generated EPG XML successfully validates using https://www.xmlvalidation.com/ against the https://github.com/XMLTV/xmltv/blob/master/xmltv.dtd
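As a rough illustration of the header layout above, here is a minimal sketch using only the Python standard library. It is not the code from #78; the helper name build_epg_document and the reduced attribute set are assumptions made for the example. Because xml.etree.ElementTree does not emit a DOCTYPE on its own, the sketch prepends the prolog by hand.

```python
import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
DOCTYPE = '<!DOCTYPE tv SYSTEM "xmltv.dtd">'


def build_epg_document(tv_attributes):
    """Serialize an empty <tv> root element preceded by an explicit prolog.

    Hypothetical helper for illustration only; it is not taken from
    m3u-epg-editor.
    """
    root = ET.Element("tv", tv_attributes)
    # ElementTree does not write a DOCTYPE itself, so the XML declaration
    # and the DOCTYPE line are prepended manually.
    body = ET.tostring(root, encoding="unicode")
    return "\n".join([XML_DECLARATION, DOCTYPE, body])


document = build_epg_document({
    "source-info-name": "m3u-epg-editor",
    "generator-info-name": "m3u-epg-editor",
})
print(document)
```

A real EPG would of course add channel and programme elements under the tv root before serializing.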

As an aside using a :latest tag with docker images invites unexpected behaviour, it's generally safer to pin to a known good/stable image version as you're now doing. This thinking doesn't apply to just TVH but in a more general sense for all docker images. More info here: https://www.howtogeek.com/devops/understanding-dockers-latest-tag/

halfagascan commented 1 year ago

Thanks for the addition, I'll update the script. I generally agree that using latest is problematic. However, the last stable tvh, 4.2.1, was created 7 years ago with no updates from tvh since; I think the guys at lscr.io have been doing the updating. I checked the xmltv version in 4.2.1: it is xmltv-0.5.69-r0, while the latest docker image has xmltv-1.1.2-r0. Thanks

halfagascan commented 1 year ago

I'll close with this comment. I resorted to using a paid m3u/xml generator. I believe the list from my provider is beyond salvaging: just too many errors, and missing or incorrect data. No matter what I tried, tvh did not like it, even after adopting your change. I tried to find the mailing list for xmltv, but it seems to have disappeared; I thought I would join and maybe solve this issue, but no such luck. If I may, what is your preferred guide viewer? Thanks for the help.

bebo-dot-dev commented 1 year ago

OK no problem, from my pov this has been a worthwhile exercise, #78 adds confidence that EPG XML files generated by m3u-epg-editor-py3 adhere to the DTD and validate as expected.

I am left intrigued by this, there is something in the XML data causing the htsmsg_xml_deserialize error Unknown syntatic element issue - perhaps unicode characters in the XML that it can't cope with. Impossible to say without being able to eyeball it and run tests on it.

It's been a long while since I've used any guide viewer / UI software - Kodi was my preference but things may have moved on and improved elsewhere.

halfagascan commented 1 year ago

Be careful with what you want to eyeball. (zip attachment removed)

bebo-dot-dev commented 1 year ago

Tests performed on your xml:

  1. Both original.xml and provider.xml check out as structurally well formed XML when checked with xmllint - (No errors are reported with: xmllint --noout original.xml; xmllint --noout provider.xml)
  2. Both original.xml and provider.xml xmllint validate against a local copy of the xmltv.dtd - (No errors are reported with: xmllint --noout --dtdvalid xmltv.dtd original.xml; xmllint --noout --dtdvalid xmltv.dtd provider.xml)
  3. Searched for non-standard non-ASCII characters in provider.xml (non-ASCII characters shouldn't cause a problem because an EPG XML file can be UTF-8 but I thought this was worth doing anyway because we don't know what's wrong with tv_grab_file / TVH) - (grep -oP "[^\x00-\x7F]" provider.xml | sort -t: -u -k1,1 | tr -d '\n'). The output is: ¡®—…£áàäćçðᵉéÉèêëíîïᶦᴸᴺñóôö™úùüᵛʷ
  4. Visually eyeballed the provider.xml data and noticed that icon src urls are broken, a problem that stems from the source original.xml e.g. <icon src="https:/vstreams.stream/backup/1000084.png" />
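
For anyone without xmllint or grep to hand, checks 1, 3 and 4 can be approximated in Python. This is only a sketch: the function names and the small SAMPLE document are made up for illustration (the icon url is the one quoted above), and DTD validation (check 2) is left out because the standard library parser does not validate against a DTD.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample; only the icon url comes from the real data.
SAMPLE = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv>
  <channel id="example">
    <display-name>Café</display-name>
    <icon src="https:/vstreams.stream/backup/1000084.png" />
  </channel>
</tv>'''


def is_well_formed(xml_text):
    """Check 1: does the text parse as XML at all?"""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


def non_ascii_characters(xml_text):
    """Check 3: the distinct characters outside the ASCII range."""
    return sorted({ch for ch in xml_text if ord(ch) > 0x7F})


def suspicious_icon_urls(xml_text):
    """Check 4: icon src values that do not start with http:// or https://."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        icon.get("src")
        for icon in root.iter("icon")
        if not re.match(r"^https?://", icon.get("src", ""))
    ]


print(is_well_formed(SAMPLE))
print(non_ascii_characters(SAMPLE))
print(suspicious_icon_urls(SAMPLE))
```

On a full provider.xml, read the file as UTF-8 text first and pass the resulting string to the same functions.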

I suppose the broken icon src urls could cause an issue but, unless the tv_grab_file / TVH errors are simply wrong and misleading, broken urls causing an htsmsg_xml_deserialize error Unknown syntatic element error seems very unlikely.

Overall I learnt nothing conclusive, the XML seems OK and TVH project support is what you need to understand what's going wrong in their XML parsing / deserializing code. I noticed there are a few similar bug reports in https://tvheadend.org/projects/tvheadend/issues

halfagascan commented 1 year ago

https://github.com/XMLTV/xmltv/issues/210 htsmsg_xml_deserialize error Unknown syntatic element #210

I think the file_grab is the issue. After much looking, found xmltv: /usr/bin/tv_grab_file: htsmsg_xml_deserialize error Unknown syntatic element: \u003c!DOCTYPE tv\n from my limited knowledge, \u003c!, is an html element, and not allowed in xml. Something in the grab file not interpreting correctly, I guess. The only way I found this was to look directly at the log file, as described via docker inspect. Docker logs showed as <something with the DTD>, couldn't something, but not very informative. Maybe somebody else has this issue. Thanks for the help, I do appreciate it.

bebo-dot-dev commented 1 year ago

Interesting, I think we knew from your first comment above that the fail is occurring in /usr/bin/tv_grab_file but being able to see a little more about what htsmsg_xml_deserialize error Unknown syntatic element is grumbling about is useful.

\u003c is the unicode escape for the < character and in this case it's referring to the opening left angle bracket of the doc type declaration in the XML file. Search for U+003C in here to locate its meaning: https://en.wikipedia.org/wiki/List_of_Unicode_characters.
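
To see the escape resolve to a plain <, here is a small sketch that decodes a JSON-encoded log line. The raw_line value is a hypothetical line rebuilt from the fragment quoted in this thread, and it assumes the on-disk log stores each entry as a JSON object with a "log" field.

```python
import json

# Hypothetical log line, rebuilt from the fragment quoted in this thread.
# The raw string keeps \u003c and \n as literal escape sequences, the way
# they would appear in the log file on disk.
raw_line = (
    r'{"log":"htsmsg_xml_deserialize error Unknown syntatic element: '
    r'\u003c!DOCTYPE tv\n","stream":"stderr"}'
)

entry = json.loads(raw_line)
message = entry["log"]

# After JSON decoding, \u003c becomes an ordinary left angle bracket.
print(message.rstrip())
print("\u003c" == "<")
```

So the escaped form in the log file and the <!DOCTYPE tv shown by docker logs are the same text, just encoded differently.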

It's valid for an XML document to contain a doc type declaration. It's optional rather than mandatory, and is described in the W3C XML specification here: https://www.w3.org/TR/REC-xml/#sec-prolog-dtd and mentioned in the WHATWG HTML standard's section on writing XHTML documents here: https://html.spec.whatwg.org/multipage/xhtml.html#writing-xhtml-documents

Doc type declarations are seen more often in (X)HTML documents than in XML documents but regardless, it is valid for an XML document to optionally contain a doc type declaration.

The question now is: which XML file does /usr/bin/tv_grab_file act upon in your TVH setup? Does it act upon and process the original EPG XML from your provider or is it the EPG XML file generated by m3u-epg-editor-py3 that it's processing?

If the former, this might explain why your setup has worked for some time but then inexplicably started to fail a couple of weeks ago if your provider started to include a doc type in their EPG XML at that point. If it's the latter then #78 is probably going to make the /usr/bin/tv_grab_file abandonware world an even worse place :)

halfagascan commented 1 year ago

Well, as more troubleshooting: line 7 of tv_grab_file is cat /config/data/*.xml, so if original.xml is in /config/data, I'm guessing it would be catted too. At some point in the past, I did change the output location of the editor to that location, prior to reading tv_grab_file. I've since changed it to another location and added a cron to cp the m3u8 and xml to /config/data/. It would be nice if xmltv changed the script to hard code catting a single file; they already do this for the m3u8. Maybe worth mentioning this in your instructions. Thanks
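
A quick sketch of why a glob like that can be risky when more than one xml file sits in the directory: two XML documents joined end to end are no longer a single well-formed document, because the second prolog lands after the first root element has closed. This is not a confirmed diagnosis of the TVH error, just an illustration; DOC and parses_cleanly are names made up for the example.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for one EPG file (hypothetical content).
DOC = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE tv SYSTEM "xmltv.dtd">\n'
    b'<tv></tv>\n'
)


def parses_cleanly(xml_bytes):
    """Return True if the bytes parse as a single XML document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


single = DOC
concatenated = DOC + DOC  # roughly what cat *.xml yields with two files

print(parses_cleanly(single))
print(parses_cleanly(concatenated))
```

Keeping exactly one xml file in the directory that gets catted avoids this particular failure mode.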

bebo-dot-dev commented 1 year ago

tv_grab_file and Tvheadend thoughts:

I don't intend to document things related to tv_grab_file or Tvheadend in the README.md. At the end of the day the maintainers of downstream systems that people integrate with (e.g. Tvheadend) need to maintain their own documentation and support their own software. https://tvheadend.org/projects/tvheadend/issues

From an m3u-epg-editor-py3 pov, the output location is straightforward: specify an outdirectory location and this is where m3u-epg-editor-py3 generated files are written.