daniel-widrick / zap2it-GuideScraping

MIT License
61 stars 22 forks

Suggestions for improvement #7

Closed th0ma7 closed 3 years ago

th0ma7 commented 3 years ago

Hi, I started looking into your project as my alternate option using Perl is no longer maintained and is not trivial to package due to the mandatory CPAN modules that need compiling (it is now only available via the archive section): https://web.archive.org/web/20200426004001/http://zap2xml.awardspace.info/

I played a bit with your Python scripts and must say they run quite well. As a developer in the SynoCommunity, I intend to package your Python scripts so they are available for both Jellyfin and Tvheadend. While playing with them, I thought I would provide you with a list of potential improvements/nice-to-haves that you might see fit and that would simplify integration down the road:

  1. Ability to pass the configuration file as an argument: this allows breaking the files into subdirectories and makes them easier to maintain through package upgrades (scripts can be updated independently from the user configuration).
  2. Ability to define a location (either as an argument and/or through the configuration file) for the generated XMLTV file. Similar to item 1, it allows granting permissions on that directory only, so other applications can get access to it.
  3. Ability to keep a few backup copies of the XMLTV file, perhaps using a timestamp: the latest would always have the same name, while each subsequent update creates an xmlguide.TIMESTAMP.xmltv. Cleanup can then be managed through a cron job or within the Python script itself (for instance, keeping X days of backups).
  4. The Perl script I referred to had a few nice features; one was downloading picons locally as well, which can be quite useful for some use cases. Here is a .png listing example generated by the Perl zap2it script:
    10.1 CFTMDT.png
    10.1_CFTMDT.png
    10.1.png
    12.1 CFCFDT.png
    12.1_CFCFDT.png
    12.1.png
    ...
    s10084_h3_aa.png
    s10108_h3_aa.png
    s10125_h3_aa.png
    ...
  5. Again, the Perl script had the ability to cache the downloaded information. This allows re-downloading only what was not previously cached and reduces hits on the server. Having this capability with a dedicated caching directory would be really nice. One caveat: when a sub-cache file contains programs marked "$sTBA" (to be advertised), it must be discarded or re-downloaded at the next update, as the program info may end up being updated by then (e.g. some movies 10-14 days out may not yet be advertised, but a few days before they air the program guide may have received an update).
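Suggestions 1 and 2 could be sketched with argparse; this is a minimal illustration only, and the `--config`/`--output` flag names and defaults are assumptions, not the script's actual options:

```python
import argparse

# Hypothetical sketch of suggestions 1 and 2: pass the configuration
# file and the XMLTV output location on the command line, with defaults
# matching the current single-directory layout.
def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="zap2it guide scraper")
    parser.add_argument("--config", default="zap2itconfig.ini",
                        help="path to the configuration file")
    parser.add_argument("--output", default="xmlguide.xmltv",
                        help="path for the generated XMLTV file")
    return parser.parse_args(argv)

args = parse_args(["--config", "/etc/zap2it/config.ini"])
print(args.config)   # /etc/zap2it/config.ini
print(args.output)   # xmlguide.xmltv
```

Keeping the defaults equal to the current behaviour means existing cron jobs would keep working unchanged.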

I'll be looking at creating a SynoCommunity package over the next few weeks. Let me know if you are interested in adapting a few bits of your scripts.

Ref: https://github.com/SynoCommunity/spksrc/pull/3932#issuecomment-825633493

Thnx in advance and again, nice work :)

daniel-widrick commented 3 years ago

Hello and thank you for the kind words.

I'm glad someone found a use for this and am excited to hear about your projects. I don't know how much time I will have to dedicate to this in the short term.

Suggestions 1 and 2 make a lot of sense and should probably be done regardless.

Suggestion 3 may be useful, though I do question the usefulness of historical guide data for most use cases.

Suggestion 4... Is something I do not personally have an interest in implementing, but I'd be happy to address and review any pull requests that attempt the feature.

Suggestion 5 is something I personally haven't seen a need for yet. If more people start hitting the service or start attempting to get hourly updates it may cause issues. I personally don't know how often the guide data changes but with loose testing, I never had too many issues with a daily refresh.

Keep me posted, and as I mentioned, I'm willing to review and accept any reasonable pull requests.

daniel-widrick commented 3 years ago

Suggestions 1 & 2 are implemented.

Suggestion 3 is implemented in the sense that it will store timestamped copies alongside the output guide file. I'm going to reflect on the simplest way to implement a cleanup process, as in its current state, without a cron job/logrotate/third party coming through to clean these files, they will accumulate.
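A cleanup pass like the one described could be sketched as follows. This is an assumption-laden illustration: `prune_backups` is a hypothetical helper, and the `xmlguide.*.xmltv` pattern and 14-day default simply mirror the naming and retention discussed in this thread.

```python
import glob
import os
import time

# Hypothetical cleanup sketch: delete timestamped backup guides older
# than max_days, leaving the un-timestamped current file untouched
# (it has only one dot, so the two-dot pattern never matches it).
def prune_backups(directory, pattern="xmlguide.*.xmltv", max_days=14):
    cutoff = time.time() - max_days * 86400
    for path in glob.glob(os.path.join(directory, pattern)):
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
```

Running this at the end of each scrape would keep the backlog bounded without needing an external cron job or logrotate entry.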

th0ma7 commented 3 years ago

> I'm glad someone found a use for this and am excited to hear about your projects. I don't know how much time I will have to dedicate to this in the short term.

Same here, as I'm already involved in too many projects within SynoCommunity.

> Suggestions 1 and 2 and 3...

Thnx, much appreciated!

> Suggestion 4... Is something I do not personally have an interest in implementing, but I'd be happy to address and review any pull requests that attempt the feature.

Probably relatively trivial to do a wget (or similar in Python), as the XMLTV file provides the picon URLs for all channels. This may be where the caching helped, now that I think of it... My spare cycles are rather limited, but I might send a PR your way if I find some time.
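Such a download step might look like the sketch below. `picon_urls` and `download_picons` are hypothetical helpers; the `<channel><icon src="..."/>` layout they assume follows the XMLTV DTD, and the example URL is illustrative only.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET

def picon_urls(xmltv_path):
    """Return {channel id: icon URL} parsed from an XMLTV file."""
    tree = ET.parse(xmltv_path)
    urls = {}
    for channel in tree.getroot().iter("channel"):
        icon = channel.find("icon")
        if icon is not None and icon.get("src"):
            urls[channel.get("id")] = icon.get("src")
    return urls

def download_picons(xmltv_path, dest_dir):
    """Fetch each channel's picon into dest_dir, named after the URL."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in picon_urls(xmltv_path).values():
        # Strip any query string before deriving a local filename.
        filename = os.path.basename(url.split("?")[0])
        urllib.request.urlretrieve(url, os.path.join(dest_dir, filename))
```

Pairing this with a caching check (skip the download when the file already exists) would also address the server-load concern from suggestion 5.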

> Suggestion 5 is something I personally haven't seen a need for yet. If more people start hitting the service or start attempting to get hourly updates it may cause issues. I personally don't know how often the guide data changes but with loose testing, I never had too many issues with a daily refresh.

Me neither. I believe it's more a way to limit hammering the server and to allow a faster rebuild time, as some of the data has already been pre-processed. With that enabled in the Perl version, I currently poll the server 3x a day, every 8 hours (or is it twice, every 12?).

> Keep me posted, and as I mentioned, I'm willing to review and accept any reasonable pull requests.

I certainly will. I'll start a PR on https://github.com/SynoCommunity/spksrc as a first preview and add you to the loop. Tagging a release would be helpful, as otherwise I have to manually adjust to use the latest commit, but that's something I can easily manage.

Thnx for looking into this, much appreciated!

daniel-widrick commented 3 years ago

Coming back to suggestion 4, the picons still aren't something I'm familiar with, but I'd be tempted to push such functionality to a separate script/tool.

I can certainly add a release. I'll do that after I decide what to do about the historical data cleanup.

daniel-widrick commented 3 years ago

A release has been tagged with historical guide cleanup. The default is set to keep 14 days of guide data.

th0ma7 commented 3 years ago

Awesome! I will look at packaging it over the weekend and will let you know how things go.

th0ma7 commented 3 years ago

@daniel-widrick quick question, any license attached to your code? GPLv2, other?

daniel-widrick commented 3 years ago

Added a license file. MIT should hopefully work for most people.

th0ma7 commented 3 years ago

@daniel-widrick While playing with the crontab implementation, I ended up figuring out that using #!/usr/bin/env python3 as part of the header solves pretty much all cases of calling the python3 binary. Could this be added to the zap2it-GuideScrape.py script? Otherwise I can script that at package-creation time.

Secondly, I've seen a few cases where we encode the password using $(echo -n "$password" | openssl enc -a) or similar (plain base64), which avoids having a plaintext password in the configuration file. Not sure if it's worth it, but perhaps your code could detect such encoded passwords (or use a Password_enc configuration variable when needed?) and make a call to openssl to decode them prior to use?

Here is a simple example:

$ echo "TEST1234" | openssl enc -a
VEVTVDEyMzQK
$ echo "TEST1234" | openssl enc -a | openssl enc -a -d
TEST1234

Or by using a stronger encryption with a common key (in this case zap2it but could be anything) such as:

$ echo "TEST1234" | openssl enc -aes-256-cbc -pass pass:zap2it -a
U2FsdGVkX1+7sTHCo2+wEAXCfs/L++ST7beWCIU5irA=
$ echo "TEST1234" | openssl enc -aes-256-cbc -pass pass:zap2it -a | openssl enc -d -aes-256-cbc -pass pass:zap2it -a
TEST1234
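For reference, `openssl enc -a` with no cipher is plain base64, so the lighter-weight variant above can be decoded directly in Python without shelling out. `decode_password` is a hypothetical helper; the AES-256-CBC variant would instead need a subprocess call to openssl or a crypto library that understands OpenSSL's salted format.

```python
import base64

# Hypothetical sketch: decode a base64-obfuscated password as produced
# by `echo "..." | openssl enc -a`. The trailing newline comes from
# echo, so it is stripped after decoding.
def decode_password(encoded):
    return base64.b64decode(encoded).decode("utf-8").rstrip("\n")

print(decode_password("VEVTVDEyMzQK"))  # TEST1234
```

Worth noting that base64 is obfuscation rather than encryption: anyone with read access to the configuration file can trivially reverse it.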

th0ma7 commented 3 years ago

@daniel-widrick The package for Synology DSM is pretty much ready (currently in review). It allows installing/upgrading while keeping backup copies of previous configurations and generated XMLTV files.

The remaining changes in this PR are more of a nice-to-have that could still be of interest (besides fixing the language handling in https://github.com/daniel-widrick/zap2it-GuideScraping/issues/9, which is sort of a blocker for my family to use it currently).

Thnx a lot for the changes you made; much appreciated, as they allowed me to package your work successfully. Feel free to close the issue as you see fit, unless you want to come back to it at some later time.

Cheers!