EchterAlsFake / xvideos_api

a Python API for xvideos.com
GNU Lesser General Public License v3.0
16 stars, 4 forks

Add uploader details #7

Open sparklingbee opened 4 weeks ago

sparklingbee commented 4 weeks ago

First of all, thank you for this package. It really is one of those "what you never thought you needed, but you needed" things.

I just wanted to suggest some improvements on the package if you have the time.

When looking at a video, the page shows information about the uploader, such as the subscriber count or whether the account is a channel. Would it be possible to add an Author object with fields like subscriber_count, is_channel, is_model?

PS: The simplicity and cleanliness of the code are also quite impressive.

EchterAlsFake commented 3 weeks ago

Hi,

I can't see them. With the subscriber count, do you mean the numbers next to the models marked in red?

And the is_channel and is_model attribute:

image

As you can see in the screenshot, there's this black box with 22k followers. If I click on it I get redirected to Elly Clutch and there's an info box saying it is a channel. Do you mean this with the visibility?

For a lot of videos there's this channel tag when clicking on the black box as seen in this screenshot:

image

(Kanal = Channel)

I could use a regex to check whether the tag is there: if it is, it's a channel, and if not, it's a model.

Did you mean it like this, or am I misunderstanding something? I should say that I don't actually use xvideos. I just made the API for the site, so I have no idea about its functions and design.

Edit:

Regarding the subscriber count: the models featured in a video are currently returned by an attribute of the Video class (a list of names, I think). I could return a dictionary instead, with the model as the key and the subscriber count as the value.

What do you think about that?

sparklingbee commented 3 weeks ago

Hi, thanks for the quick response.

> I can't see them. With the subscriber count, do you mean the numbers next to the models marked in red?

Yes, the subscriber count is the number next to "Ellyclutch" : 22k.

> I could use a regex to check if it's there and then it is a channel and if not it's a model.

As far as I know, there are three kinds of users: profiles (regular people), pornstars and channel accounts.

You already have a Pornstar class that provides some of the info for "Elly", "Zoey", etc., such as the number of pages, but some fields like subscriber_count, description or country are missing.

Also, there's no way to get this info for the account that published the video (it is not always a pornstar).

It would be nice to have a generic User class that Pornstar, Channel and Profile inherit from, with common properties: description, subscriber_count, is_channel, is_pornstar.

(Sorry, is_model doesn't seem to make much sense, forget about it.)
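A minimal sketch of the inheritance idea above, to make the suggestion concrete. Everything here is hypothetical: the field defaults, the `name` field, and the way `is_channel` and `is_pornstar` are derived from the subclass are assumptions, not the package's current API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Fields shared by every account type (hypothetical layout)."""
    name: str
    description: Optional[str] = None
    subscriber_count: Optional[int] = None

    @property
    def is_channel(self) -> bool:
        # Derived from the subclass, so no separate flag has to be stored.
        return isinstance(self, Channel)

    @property
    def is_pornstar(self) -> bool:
        return isinstance(self, Pornstar)


@dataclass
class Profile(User):
    """A regular account with no extra fields in this sketch."""


@dataclass
class Pornstar(User):
    # Example of a field that only some account types would carry.
    country: Optional[str] = None


@dataclass
class Channel(User):
    """A channel account with no extra fields in this sketch."""


uploader = Channel(name="example_channel", subscriber_count=22_000)
print(uploader.is_channel, uploader.is_pornstar)
```

With this layout, code that only needs the common fields can accept any `User`, while type-specific fields stay on the subclasses.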

> Regarding the subscriber count, When I return the models featured in this video, I think it was an attribute in the Video class returning a list of names, I could return a dictionary instead with the model being the first item and the subscriber count being the second.

What you did seems just fine, actually. If you make it a dict, there's a risk that you will need to come back to it later to add new keys. People who need more info can use the Pornstar class to fetch it, which gives a separation of concerns. I would just suggest that you return the pornstars' user IDs instead of their names.

Tell me if this is clearer or if I need to clarify some points. Also tell me whether you think these remarks are valid.

EDIT: I would love to work on this but I am pretty bad at web crawling and html parsing.

EchterAlsFake commented 3 weeks ago

So I checked a bit, and the way I can see whether it's a Pornstar or a Channel is by following the link of the black box. Then I can go into the info section of the user, for example:
Channel: https://de.xvideos.com/hornytori1#_tabAboutMe
Pornstar: https://de.xvideos.com/pornstars/sweetie-fox1#_tabAboutMe

There is a lot of info there, but the problem is that xvideos doesn't tell me whether it's a user or a channel, so I need to check using a regex. I check for an object only a Pornstar can have, and if the regex fails it means that it's a Channel.

The attributes for channels differ from the attributes a Pornstar can have, so I would indeed need to handle them separately, probably with two classes (or one new class, because I already have a Pornstar object).
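A rough sketch of that regex check. The marker pattern below is a made-up placeholder, since the thread does not say which element only a Pornstar page contains; the real pattern would have to come from inspecting the actual page source.

```python
import re

# Hypothetical marker: stands in for whatever markup appears only on a
# Pornstar page. It is NOT taken from the real site.
PORNSTAR_ONLY = re.compile(r'id="pornstar-only-field"')


def classify(html: str) -> str:
    """Return "pornstar" if the marker is found, otherwise "channel"."""
    return "pornstar" if PORNSTAR_ONLY.search(html) else "channel"


print(classify('<div id="pornstar-only-field">...</div>'))
print(classify('<div id="something-else">...</div>'))
```

One caveat with this approach: every page without the marker falls into the "channel" bucket, which would include regular profiles if those turn out to be a third account type.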

I don't know if I am right with this assumption, but I think the link behind the black box (the first link that appears in this list) will always be the channel, and all red boxes after it will be the Pornstars / Models featured in the video.

My idea was to make a Channel class which fetches the information for the channel. If a user accesses the pornstars attribute of the Video class, it will return Pornstar objects instead of names, and I will extend the Pornstar class to also fetch the personal information, along with a user ID and the other details shown on the website.

With this approach I would also not need an is_channel attribute, because the type is already clear from the code.

Want to access Pornstar profiles? Do:


from xvideos_api import Client
pornstars = Client().get("video_url").pornstars
for pornstar in pornstars:
    print(pornstar.id)
    print(pornstar.country)  # ...and so on for the other attributes

And the channel will also be a separate object just like the Pornstar class.

Do you think this would be a good implementation? I would then have enough time to make this tomorrow :)

sparklingbee commented 3 weeks ago

> So I don't know if I am right with this assumption, but the link that goes from the black box, so the first link that appears in this list, will always be the channel and all red boxes after will be the Pornstars / Models featured in it.

The first link can be a channel or just a normal profile (pornstar or regular), so I guess some fields may differ.

> My idea was to make a Channel class which fetches the information for the channel. If a user wants to get the pornstars attribute from the Video class it will instead of the names return a Pornstar object and I will extend the Pornstar class to also fetch the personal information alongside with a user id and the other stuff that is shown on the website.

That is a way of doing it.

I have doubts about it because it can generate a lot of "noise requests". To populate the Pornstar objects completely, you'll need to make requests to the pornstar pages, which are separate calls from the video request. Some could argue that this is unneeded overhead and that get_video would be doing much more than just getting the video info.

A nice compromise would be to return Pornstar objects with only the name and subscriber_count attributes, which are available from the video URL. The other attributes would be fetched separately.

Something like...

from xvideos_api import Client
pornstars = Client().get("video_url").pornstars # pornstar objects with only minimal data that can be obtained from the video_url
for pornstar in pornstars:
    pornstar.get_data()
    print(pornstar.id)
    print(pornstar.country)  # ...and so on for the other attributes

It would be really nice to have everything in one go, but the number of requests and the code complexity can increase quickly. That's just my opinion, though; it's your code at the end of the day.
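To illustrate the compromise, here is a self-contained sketch of a Pornstar object that starts with only the data available from the video page and loads the rest on demand. The constructor signature, the injected `fetch` callable and the returned keys are all assumptions made for the example; the stand-in `fake_fetch` replaces the real HTTP request so the snippet runs offline.

```python
from typing import Callable, Dict, List, Optional


class Pornstar:
    """Holds minimal data first; get_data() fills in the rest (hypothetical)."""

    def __init__(self, name: str, subscriber_count: str,
                 fetch: Callable[[str], Dict[str, str]]):
        self.name = name                        # available from the video page
        self.subscriber_count = subscriber_count
        self._fetch = fetch                     # performs the extra request
        self.id: Optional[str] = None           # unknown until get_data()
        self.country: Optional[str] = None

    def get_data(self) -> None:
        """Run the additional request and populate the remaining fields."""
        details = self._fetch(self.name)
        self.id = details.get("id")
        self.country = details.get("country")


calls: List[str] = []


def fake_fetch(name: str) -> Dict[str, str]:
    # Stand-in for the real profile request; records each call.
    calls.append(name)
    return {"id": "12345", "country": "Exampleland"}


star = Pornstar("example_name", "22k", fake_fetch)
print(star.country)   # still None: no extra request has been made yet
star.get_data()
print(star.country)   # filled in after the explicit call
```

The extra request only happens when `get_data()` is called, so listing the pornstars of a video costs nothing beyond the video request itself.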

EchterAlsFake commented 3 weeks ago

So I started with the implementation and I came across an issue.

If I extract the information from the profile it depends on the language of the root host. Since I live in Germany all URLs on xvideos start with de.xvideos.com. If I request the aboutme tab from a model page using requests.get(url) it returns the values inside the code as German keywords. So a dictionary looks like this now:

image

The first dictionary is from a Channel, and the keys there are in English, which I don't understand, because the second dictionary is from a Pornstar and the keys there are in German. This wouldn't be a problem in itself, but it becomes one when I want to make the values directly accessible like:

print(pornstar.id)
print(pornstar.country)

I would need to access the key, but what if a user enters an English video URL and the keys are then in English? My code would not be able to find these key names in the dictionary and would return None.

So my first idea was to not make a cached attribute for every field, but just return the dictionary. People who want this data can then work with the dictionary directly.

The second approach, which I would prefer, would be to always change the subdomain, for example from de.xvideos.com to www.xvideos.com, so that everything is always in English. With this approach I could do the cached property stuff and make everything accessible through attributes.

But this would make people lose the ability to use their own language.

Or, as a third approach, I could use the second one by default: if people use a language other than English, the dictionary is returned, and if it's English, they can use the attributes.
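The subdomain rewrite could look roughly like this. The function name is made up for the example, and it relies on the assumption stated above that www.xvideos.com serves the English version. Note that replacing the network location this way would also drop any port or credentials in the URL, which should not matter for normal page links.

```python
from urllib.parse import urlsplit, urlunsplit


def force_english_host(url: str) -> str:
    """Rewrite any xvideos.com subdomain to www.xvideos.com (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host == "xvideos.com" or host.endswith(".xvideos.com"):
        parts = parts._replace(netloc="www.xvideos.com")
    # URLs on other hosts are returned unchanged.
    return urlunsplit(parts)


print(force_english_host("https://de.xvideos.com/hornytori1#_tabAboutMe"))
```

Path, query string and fragment are preserved, so the `#_tabAboutMe` anchor still points at the same tab.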
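A small sketch of how the third approach might behave. The class name and the dictionary keys ("Country" for English, "Land" for German) are invented for illustration, since the real keys were only visible in the screenshot; the raw dictionary is always available, while the attribute only resolves when the data was fetched in English.

```python
from typing import Dict, Optional


class AboutInfo:
    """Raw dict always available; attributes only for English data (hypothetical)."""

    def __init__(self, raw: Dict[str, str], is_english: bool):
        self.raw = raw
        self.is_english = is_english

    @property
    def country(self) -> Optional[str]:
        if not self.is_english:
            # Keys are localized, so the English key name cannot be trusted.
            return None
        return self.raw.get("Country")


english = AboutInfo({"Country": "Exampleland"}, is_english=True)
german = AboutInfo({"Land": "Beispielland"}, is_english=False)

print(english.country)   # attribute access works for English data
print(german.country)    # None: the caller falls back to german.raw
print(german.raw)
```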

What do you think about it?