Kakulukian / youtube-transcript

Fetch transcript from a youtube video

Discard auto-generated transcript #36

Open kaanguneyli opened 3 months ago

kaanguneyli commented 3 months ago

Is there a way to discard auto-generated transcripts? When I pick the language "en", the program picks the auto-generated transcript directly, even though the video has an original transcript. I looked for an option but couldn't find one.

aehlke commented 3 months ago

any luck?

Marceltbn commented 2 months ago

Either the languageCode is wrong or the order of the tracks is. Posting your captionTracks would help. If the order is the problem, I would search ("find") by vssId rather than by languageCode: a.ja signifies an automated track, .ja a normal one.

[
    {
        "baseUrl": "",
        "name": {
            "simpleText": "Japans (automatisch gegenereerd)"
        },
        "vssId": "a.ja",
        "languageCode": "ja",
        "kind": "asr",
        "isTranslatable": true,
        "trackName": ""
    },
    {
        "baseUrl": "",
        "name": {
            "simpleText": "Japans"
        },
        "vssId": ".ja",
        "languageCode": "ja",
        "isTranslatable": true,
        "trackName": ""
    }
]
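
Going by the sample above, the auto-generated track carries kind "asr" and a vssId starting with "a.". A minimal TypeScript sketch for dropping such tracks, assuming every entry has that shape. The CaptionTrack interface and both helper names are hypothetical and not part of the library:

```typescript
// Hypothetical shape, based only on the fields visible in the sample above.
interface CaptionTrack {
  baseUrl: string;
  vssId: string;
  languageCode: string;
  kind?: string; // "asr" on the auto-generated track in the sample
}

// Assumption: a track is auto-generated if it is marked "asr"
// or its vssId starts with "a.".
function isAutoGenerated(track: CaptionTrack): boolean {
  return track.kind === 'asr' || track.vssId.startsWith('a.');
}

// Keep only the manually created tracks.
function withoutAutoGenerated(tracks: CaptionTrack[]): CaptionTrack[] {
  return tracks.filter((track) => !isAutoGenerated(track));
}
```

Filtering first and then matching on languageCode would make the order of the tracks irrelevant.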

I'm on an old custom version, but this is what mine looks like:

if (langCode) {
    // Prefer the manually created track, whose vssId is '.' + langCode.
    // This replaces the old check: track.languageCode.includes(langCode)
    captionTrack = availableCaptions.find(
        (track: any) => track.vssId === '.' + langCode
    );

    if (captionTrack === undefined) {
        // Fall back to the auto-generated track ('a.' + langCode),
        // then to the first available track.
        captionTrack =
            availableCaptions.find(
                (track: any) => track.vssId === 'a.' + langCode
            ) ?? availableCaptions?.[0];
    }
}

This might work for the current version, though I haven't tried it:

    const lang = config?.lang
    const tracks = captions.captionTracks

    // Prefer the manually created track ('.' + lang); otherwise fall back
    // to any track in that language, e.g. the auto-generated one.
    // Without a requested language, take the first track.
    const track = lang
      ? tracks.find((t) => t.vssId === '.' + lang) ??
        tracks.find((t) => t.languageCode === lang)
      : tracks[0]

    // undefined when no track matches, instead of throwing on .baseUrl
    const transcriptURL: any = track?.baseUrl