TalkBank / batchalign2

Tools for language sample analysis.
https://talkbank.org/info/BA2-usage.pdf
BSD 3-Clause "New" or "Revised" License

Can't Generate transcript for Arabic Audio #3

Open abeerM opened 6 months ago

abeerM commented 6 months ago

I am trying to generate a transcript for an Arabic-language MP3 file by following the instructions, but sentences comes back as an empty list.

sentences = doc.transcript(include_tiers=False, strip=True)
print(sentences)

Note: The lines below were removed because they raised IndexError: list index out of range.

first_utterance = doc[0]
first_form = doc[0][0]
the_comma = doc[0][1]

assert the_comma.text == ','
assert the_comma.type == ba.TokenType.PUNCT
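The IndexError reported above is consistent with a document that contains no utterances, so doc[0] has nothing to return. A minimal sketch of a guard follows, using plain Python lists as a stand-in for the document. The helper name first_token_or_none is hypothetical and not part of batchalign, and the sketch assumes the document supports len() as well as the indexing shown above.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_token_or_none(doc: Sequence[Sequence[T]]) -> Optional[T]:
    """Return doc[0][0] if it exists, otherwise None (hypothetical helper)."""
    # An empty document has no first utterance to index into.
    if len(doc) == 0:
        return None
    first_utterance = doc[0]
    # An utterance may itself be empty.
    if len(first_utterance) == 0:
        return None
    return first_utterance[0]


# Plain lists stand in for a document here; this is an assumption for illustration.
print(first_token_or_none([]))
print(first_token_or_none([["hello", ","]]))
```

Checking for an empty document first separates "ASR produced nothing" from "the indexing code is wrong", which are different problems to debug.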

When I added the following lines to get a detailed transcription, the program never terminated (it appeared to be stuck in an infinite loop):

nlp = ba.BatchalignPipeline.new("asr,morphosyntax", lang="ara", num_speakers=2)
doc1 = nlp(doc) # this is equivalent to nlp("audio.mp3"), we will make the initial doc for you

first_word_pos = doc1[0][0].morphology
first_word_time = doc1[0][0].time
first_utterance_time = doc1[0].alignment

Note: Before the program got stuck, the following lines were printed to the terminal:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WhisperModel is using WhisperSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True` or `layer_head_mask` not None. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.
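One way to tell a genuine hang from slow model inference is Python's standard faulthandler module, which can write every thread's stack to a file after a delay. The sketch below is a generic illustration, not something from batchalign: the helper run_with_stack_dump, the stand-in slow_step, and the log file name are all hypothetical.

```python
import faulthandler
import tempfile
import time
from pathlib import Path


def run_with_stack_dump(fn, dump_after_seconds, log_path):
    """Run fn(); if it is still running after the delay, dump all thread stacks to log_path."""
    with open(log_path, "w") as log:
        # Schedule a one-shot stack dump; the program keeps running afterwards.
        faulthandler.dump_traceback_later(dump_after_seconds, file=log)
        try:
            return fn()
        finally:
            # Cancel the pending dump (a no-op if it already fired) before the file closes.
            faulthandler.cancel_dump_traceback_later()


def slow_step():
    """Stand-in for a long-running call; replace with the real workload."""
    time.sleep(0.3)
    return "done"


log_path = Path(tempfile.gettempdir()) / "stack_dump.log"
result = run_with_stack_dump(slow_step, 0.1, log_path)
print(result)
print(log_path.read_text())
```

In the scenario above, slow_step could be replaced with something like lambda: nlp(doc) and the delay raised to a few minutes. The dumped stack would then show which call the program is sitting in, for example model inference versus tokenization.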

Jemoka commented 4 months ago

I think our tokenizer may be failing in this case. Could you explicitly specify Whisper as the ASR engine to see if it does better?

nlp = ba.BatchalignPipeline.new("asr,morphosyntax", lang="ara", num_speakers=2, asr="whisper")
doc1 = nlp("audio.mp3")