V3ntus / quint

State-of-the-art audio summarization model.
MIT License

Reformat the output filename -- 1) make it shorter; 2) check how unicode works and change extension accordingly #9

Open turnkit opened 1 year ago

turnkit commented 1 year ago

Currently, the plain-text input filename that comes from OpenAI's Whisper usually looks something like SID0574.mp3.txt.

Please reformat the paragraph-chunked output filename so that everything after the first period is truncated and then followed by ".txt".

A successful output file would then be named SID0574.txt (or possibly SID0574.rtf; see below).
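A minimal Python sketch of the requested renaming rule, assuming Quint derives the output name from the input filename; `output_filename` is a hypothetical helper, not an existing Quint function:

```python
from pathlib import Path


def output_filename(input_name: str, ext: str = ".txt") -> str:
    """Keep only the text before the first period, then append ext.

    Hypothetical helper for illustration; not part of Quint.
    """
    name = Path(input_name).name      # drop any directory part
    stem = name.split(".", 1)[0]      # everything before the first period
    return stem + ext
```

For example, `output_filename("SID0574.mp3.txt")` returns "SID0574.txt", and passing `ext=".rtf"` gives "SID0574.rtf". Splitting at the first period matters here: `Path.stem` only removes the last suffix, so it would leave "SID0574.mp3".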


If Unicode is present, please check that a browser opens the .txt file and displays the Unicode characters correctly.

If it does not, we will have to change the default output to a file format that supports Unicode, such as .rtf.

.rtf supports Unicode, but it is not clear to me that modern browsers display .rtf files correctly. If .txt does not show Unicode characters properly, please test the same output file with a .rtf extension and see whether a browser opens and displays it properly. If it does, change the default output extension to .rtf.

See: https://stackoverflow.com/questions/55212444/unicode-characters-within-text-document-display-on-browser
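If Unicode characters appear garbled when a local .txt file is opened in a browser, one likely cause (an assumption, not confirmed in this thread) is that the browser guesses a legacy encoding for the plain-text file. A common workaround, sketched here with hypothetical helper names, is to write the file as UTF-8 with a byte order mark (BOM), which browsers use to detect the encoding:

```python
from pathlib import Path


def encode_for_browser(text: str) -> bytes:
    # Python's "utf-8-sig" codec prepends the UTF-8 byte order mark
    # (EF BB BF), which browsers use to recognise the file as UTF-8.
    return text.encode("utf-8-sig")


def write_text_for_browser(path: Path, text: str) -> None:
    # Hypothetical helper: write the BOM-prefixed UTF-8 bytes as-is.
    path.write_bytes(encode_for_browser(text))
```

The trade-off is that some command-line tools treat the BOM as a stray character at the start of the file, so this is worth testing before making it the default.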

tapearchives commented 1 year ago

Looking back at this issue, the real problem is that output files that include Unicode, such as music symbols, currently don't process correctly in Quint. The .txt format itself is not the wrong file type for Unicode (https://sites.psu.edu/symbolcodes/software/textfile/). Is there a way to address that so that Quint accepts, and outputs, all characters, including Unicode?
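Quint's reading and writing code isn't shown in this thread, so the following is only a sketch under an assumption: that the failure comes from opening files without an explicit encoding. In that case Python falls back to the platform's default encoding, which is not UTF-8 on every system (for example cp1252 on many Windows setups, which has no character for ♪). The function names are hypothetical:

```python
import tempfile
from pathlib import Path


def read_transcript(path: Path) -> str:
    # Decode explicitly as UTF-8 (assumed to be the transcript's encoding)
    # instead of relying on the platform default encoding.
    return path.read_text(encoding="utf-8")


def write_output(path: Path, text: str) -> None:
    # Encode explicitly as UTF-8 so symbols such as ♪ (U+266A) survive the write.
    path.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "SID0574.mp3.txt"
        src.write_bytes("♪ intro music ♪".encode("utf-8"))
        dst = Path(tmp) / "SID0574.txt"
        write_output(dst, read_transcript(src))
        print(dst.read_bytes() == src.read_bytes())  # True
```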

tapearchives commented 1 year ago

As such, this issue is already tracked in https://github.com/V3ntus/quint/issues/1.