JuanBindez / pytubefix

Python3 library for downloading YouTube Videos.
http://pytubefix.rtfd.io/
MIT License

Differentiating Automatically-generated Caption #199

Open teron131 opened 2 weeks ago

teron131 commented 2 weeks ago

How can I tell from a fetched caption's language code whether the caption was automatically generated?

Currently:

from pytubefix import YouTube

# Using a URL where only an auto-generated English caption is available
yt = YouTube(sample_urls[0])
yt.caption_tracks[0]
# <Caption lang="English" code="en">

yt.captions["en"]
# <Caption lang="English" code="en">

I see that captions.py in the source code contains:

class Caption:
    """Container for caption tracks."""

    def __init__(self, caption_track: Dict):
        """Construct a :class:`Caption <Caption>`.

        :param dict caption_track:
            Caption track data extracted from ``watch_html``.
        """
        self.url = caption_track.get("baseUrl")

        # Certain videos have runs instead of simpleText
        #  this handles that edge case
        name_dict = caption_track['name']
        if 'simpleText' in name_dict:
            self.name = name_dict['simpleText']
        else:
            for el in name_dict['runs']:
                if 'text' in el:
                    self.name = el['text']

        # Use "vssId" instead of "languageCode", fix issue #779
        self.code = caption_track["vssId"]
        # Remove preceding '.' for backwards compatibility, e.g.:
        # English -> vssId: .en, languageCode: en
        # English (auto-generated) -> vssId: a.en, languageCode: en
        self.code = self.code.strip('.')

Which function calls do I need to make to get "a.en"?
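Judging from the comments in the excerpt, the code is taken from "vssId" with the dots at either end stripped, so an auto-generated track should end up with a code that starts with "a.". Here is a minimal sketch of that logic as I understand it, with no network access. The helper names `normalize_vss_id` and `is_auto_generated` are hypothetical and not part of pytubefix, and the "a." prefix rule is an assumption based only on the comments above.

```python
def normalize_vss_id(vss_id: str) -> str:
    """Mirror the excerpt's `self.code.strip('.')` step."""
    return vss_id.strip(".")


def is_auto_generated(code: str) -> bool:
    """Assumption: auto-generated tracks keep an 'a.' prefix after stripping."""
    return code.startswith("a.")


# vssId values copied from the comments in the excerpt
manual_code = normalize_vss_id(".en")
auto_code = normalize_vss_id("a.en")

print(manual_code, is_auto_generated(manual_code))  # en False
print(auto_code, is_auto_generated(auto_code))      # a.en True
```

If the installed version matches the excerpt, I would expect yt.captions["a.en"] to be the lookup for the auto-generated track, but that is not what I see in the output above.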