XMLTV / xmltv

Utilities to obtain, generate, and post-process TV listings data in XMLTV format
GNU General Public License v2.0
275 stars, 93 forks

Add grabber for Slovenia #110

Open sagudev opened 4 years ago

sagudev commented 4 years ago

What type of Pull Request is this?

Does this PR close any currently open issues?

No

Please explain what this PR does

It adds a grabber for Slovenia. It takes data from spored.siol.net.

Any other information?

No

Where have you tested these changes?

Operating System: Ubuntu 20.04

Perl Version: v5.30.0

knowledgejunkie commented 4 years ago

@sagudev Thank you for your contribution! A couple of questions:

i) is there anything on siol.net that prohibits using their listings data?

ii) as this would be a new addition to XMLTV, have you considered using ParseOptions (see lib/Options.pm) to manage the grabber at runtime and simplify the structure of the code?

sagudev commented 4 years ago

i) is there anything on siol.net that prohibits using their listings data?

Not that I am aware of. Is there any place I should be looking at (like robots.txt)?

ii) as this would be a new addition to XMLTV, have you considered using ParseOptions (see lib/Options.pm) to manage the grabber at runtime and simplify the structure of the code?

I based my grabber on tv_grab_ch_search, as it was mentioned in the last commit, so I thought it used the new addition to the XMLTV framework (ParseOptions). When I was almost done, I realized that it does not use ParseOptions. If I have time, I will rewrite it using ParseOptions.

knowledgejunkie commented 4 years ago

i) is there anything on siol.net that prohibits using their listings data?

Not that I am aware of. Is there any place I should be looking at (like robots.txt)?

I was thinking of the site's Terms and Conditions of use, or any agreement in order to use the site.

ii) as this would be a new addition to XMLTV, have you considered using ParseOptions (see lib/Options.pm) to manage the grabber at runtime and simplify the structure of the code?

I based my grabber on tv_grab_ch_search, as it was mentioned in the last commit, so I thought it used the new addition to the XMLTV framework (ParseOptions). When I was almost done, I realized that it does not use ParseOptions. If I have time, I will rewrite it using ParseOptions.

No problem. ParseOptions takes care of a lot of things but if the grabber is already done don't worry about taking time to refactor it.

sagudev commented 4 years ago

I was thinking of the site's Terms and Conditions of use, or any agreement in order to use the site.

As far as I checked, it's all clear. Also, WebGrab+Plus (WG++) is using it as a data source.

garybuhrmaster commented 4 years ago

i) is there anything on siol.net that prohibits using their listings data?

Not that I am aware of. Is there any place I should be looking at (like robots.txt)?

While you are likely not a lawyer (and no one here would expect a legal opinion anyway), one should commonly review the site's Terms of Use / Terms of Service (in the native language of those terms) to determine whether they restrict use to subscribers, state that the data on the site is copyrighted (i.e. not available for use without permission), prohibit screen scraping, or allow only linking to (and not retrieval from) the site. There are, of course, many possible restrictions, but you likely get the idea of what to look for. Sometimes the restrictions/requirements are clear, and sometimes the site explicitly allows the data to be accessed, but more commonly the terms are a bit vague, which makes it much more of a judgement call. I would think that a good-faith review of the terms of service / terms of use is about all one can be expected to perform.

sagudev commented 4 years ago

It does not mention any grabbing or scraping. Only in Article 4 is it stated that

  4. Copyright

Content posted by the service owner is allowed to be reviewed. Content may not be reproduced, modified, transcribed, republished or distributed for either commercial or non-commercial purposes without the express prior written permission of the service owner. In the event of any permitted use of the content of these pages, all copyright and industrial property rights notices and other notices and warnings must be retained.

So does that mean we can grab the data?

pmhahn commented 3 years ago

It does not mention any grabbing or scraping. Only in Article 4 is it stated that

  4. Copyright

Content posted by the service owner is allowed to be reviewed. Content may not be reproduced, modified, transcribed, republished or distributed for either commercial or non-commercial purposes without the express prior written permission of the service owner. In the event of any permitted use of the content of these pages, all copyright and industrial property rights notices and other notices and warnings must be retained.

So does that mean we can grab the data?

My reading of this is no, as your grabber is transcoding the web page into another format (XMLTV).

The business model of those pages is mostly to get you there as a person so that you see their advertisements. By using a grabber, you filter out those advertisements.

If in doubt, ask them directly and get their written permission.

knowledgejunkie commented 3 years ago

It's good to see this grabber being developed - do we have a consensus about whether we can add this grabber to the project?

sagudev commented 3 years ago

It's good to see this grabber being developed - do we have a consensus about whether we can add this grabber to the project?

I didn't get any answer. I will still be maintaining the grabber for my personal needs.