V3ntus / quint

State of the art audio summarization model.
MIT License

Add additional INFO output during processing #8

Open turnkit opened 1 year ago

turnkit commented 1 year ago

Prefix the info line with current date and time.

Also, please ensure that the filename appears in the INFO output both when processing succeeds and when it fails. Currently, if the first file encountered fails, its filename is never printed, so it's difficult to know which file is failing.

Also, please add a date and time stamp and a phrase such as "Paragraph chunking succeeded for file ..." or "Paragraph chunking FAILED for file ..." for each file.

-- [INFO]: Writing to C:\sermonindex_audio\bak_whisp_out\txt_chunked_out/SID1189.mp3.txt_out.txt...
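To make the request concrete, here is a rough sketch of the kind of output I mean, using Python's standard logging module. The names make_logger, process_files and chunk_fn are hypothetical and not taken from chunk_paragraphs.py; chunk_fn stands in for whatever function chunks a single file. Failures are logged at ERROR level here, which is just one possible choice.

```python
import logging


def make_logger(stream=None) -> logging.Logger:
    """Build a logger whose lines start with a date and time stamp.

    'quint_sketch' is a made-up logger name used only for this example.
    """
    logger = logging.getLogger("quint_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s [%(levelname)s]: %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def process_files(paths, chunk_fn, logger):
    """Run chunk_fn on every path and report success or failure per file.

    chunk_fn is a placeholder for the real per-file chunking routine.
    Returns the list of paths that failed so a summary can be printed.
    """
    failed = []
    for path in paths:
        # Name the file before touching it, so a crash is still attributable.
        logger.info("Processing %s", path)
        try:
            chunk_fn(path)
        except Exception as exc:
            logger.error("Paragraph chunking FAILED for file %s: %s", path, exc)
            failed.append(path)
        else:
            logger.info("Paragraph chunking succeeded for file %s", path)
    return failed
```

With something like this, a failing file no longer stops the whole batch, and every line in the log carries both a timestamp and the filename.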

turnkit commented 1 year ago

Here's an example where the first file fails and there is currently no INFO output naming the file:

PS C:\sermonindex_audio\bak_whisp_out> .\do_quint_transcript.bat 192.168.0.62:8000

C:\sermonindex_audio\bak_whisp_out>set INPUT_FOLDER=.\txt_to_do

C:\sermonindex_audio\bak_whisp_out>set OUTPUT_FOLDER=.\txt_chunked_out\
Input folder is .\txt_to_do
Output folder is .\txt_chunked_out\
[INFO]: Input is a directory, assuming batch mode
[INFO]: Found 1448 files in .\txt_to_do
Traceback (most recent call last):
  File "C:\sermonindex_audio\chunk_paragraphs.py", line 136, in <module>
    main()
  File "C:\sermonindex_audio\chunk_paragraphs.py", line 125, in main
    chunk_paragraphs_dir(args.i, args.o)
  File "C:\sermonindex_audio\chunk_paragraphs.py", line 78, in chunk_paragraphs_dir
    _input_contents: str = input_file.read().encode("ascii", errors="ignore").decode().replace("\r\n", " ").replace("\n", " ")
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 101: character maps to <undefined>
PS C:\sermonindex_audio\bak_whisp_out>
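The traceback shows the file being decoded with cp1252, the Windows default, which has no mapping for byte 0x81. A possible workaround, sketched below, is to read the file with an explicit encoding and a lenient error policy. The function name read_transcript is hypothetical, and both UTF-8 and errors="replace" are assumptions on my part, since I don't know what encoding the transcripts are actually in. The ASCII filtering and newline flattening mirror what line 78 of chunk_paragraphs.py already does.

```python
from pathlib import Path


def read_transcript(path) -> str:
    """Read a transcript without crashing on undecodable bytes.

    Assumption: the file is (mostly) UTF-8. Bytes that cannot be decoded
    become U+FFFD instead of raising UnicodeDecodeError.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    # Same post-processing as the existing code: drop non-ASCII characters
    # (including any U+FFFD replacements) and flatten newlines to spaces.
    text = text.encode("ascii", errors="ignore").decode()
    return text.replace("\r\n", " ").replace("\n", " ")
```

This does not replace the per-file INFO output requested above; it only removes one cause of failure. Files in another encoding would still lose their non-ASCII characters, as they do today.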