Which issue are you referring to, @toprak? Is this model not able to save SRT?
@SeeknnDestroy yeah, it only gives JSON
Have you tried using the writer from the whisperx library, @toprak? https://github.com/m-bain/whisperX/blob/main/whisperx/utils.py#L406C10-L406C10
Here is how they use their writer: https://github.com/m-bain/whisperX/blob/main/whisperx/transcribe.py#L155C8-L155C8
Hey @toprak @SeeknnDestroy - That's a lovely suggestion, would either of you mind helping with this? I am currently quite swamped with work and would appreciate any help possible!
@toprak This API provides SRT output with Whisper-v3 - https://developer.monsterapi.ai/reference/post_generate-speech2text-v2
The output is the same as that given by faster-whisper.
Quick script to convert the JSON into SRT format, courtesy of ChatGPT. Tested with VLC.
import json


def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: hours:minutes:seconds,milliseconds."""
    # Work in whole milliseconds to avoid float truncation (e.g. 2.3 s becoming ,299)
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def convert_to_srt(chunks):
    """
    Convert the chunks of transcripts into .srt format with correct newline placements.
    """
    srt_format = []
    for index, chunk in enumerate(chunks, start=1):
        start_time, end_time = chunk['timestamp']
        # Guard against a missing (None) end timestamp by falling back to the start time
        if end_time is None:
            end_time = start_time
        # Append the formatted string to the list, with correct newline placements
        srt_format.append(
            f"{index}\n{format_timestamp(start_time)} --> {format_timestamp(end_time)}\n{chunk['text'].strip()}\n\n"
        )
    return srt_format


# Path to the JSON file
file_path = '/path/to/your/transcript.json'

# Load the JSON file
with open(file_path, 'r', encoding='utf-8') as file:
    transcript_data = json.load(file)

# Convert the transcript data to .srt format
srt_data = convert_to_srt(transcript_data['chunks'])

# Path for the output .srt file
srt_file_path = '/path/to/save/converted_transcript.srt'

# Writing to a .srt file
with open(srt_file_path, 'w', encoding='utf-8') as file:
    file.writelines(srt_data)

print(f"Converted .srt file saved at: {srt_file_path}")
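Besides opening the result in VLC, a quick structural check can be scripted. This is only a rough sketch, assuming the layout the script above writes (index line, timing line, text, then a blank line between blocks, and no blank lines inside the subtitle text). `check_srt` and `to_ms` are hypothetical helper names, not part of whisperx or any other library.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500"
TIMING = re.compile(
    r"^(\d{2,}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2,}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def check_srt(text):
    """Return a list of problems found in SRT text (an empty list means none were found)."""
    problems = []
    # Blocks are assumed to be separated by a blank line
    blocks = [b.strip() for b in text.strip().split("\n\n") if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            problems.append(f"block {expected}: expected index, timing and text lines")
            continue
        if lines[0].strip() != str(expected):
            problems.append(f"block {expected}: index line is {lines[0]!r}")
        match = TIMING.match(lines[1].strip())
        if not match:
            problems.append(f"block {expected}: malformed timing line {lines[1]!r}")
            continue
        fields = match.groups()
        if to_ms(*fields[4:]) < to_ms(*fields[:4]):
            problems.append(f"block {expected}: end time is before start time")
    return problems
```

To use it, read the generated `.srt` file as text and pass the contents to `check_srt`; any strings it returns point at the block that looks off. A cue with empty text gets reported too, since its block has only two lines.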