32 Make data from databases usable for ASR training

jindaznb commented 1 year ago

The script is at /server/app/scripts/get_csv.py
The output CSV is written to /server/tmp/output.csv
The downloaded audio files are saved in /server/tmp/

Once the backend is running, the script downloads the audio files and writes the CSV automatically.

ztybigcat commented 1 year ago

Hi Jinda, could you provide some documentation on how to use this functionality? In particular, when the app is deployed with Docker in the cloud, how would I execute this script? Thank you 😄

jindaznb commented 1 year ago

Thank you. To execute the script, follow these steps:

1. Build and start the app with Docker, and make sure at least one audio file has been submitted.
2. Visit the following API endpoint: http://localhost:5000/api/v1/speak/get_csv
3. When you access this endpoint, you will receive a response like this:

{
   "csv_path": "/server/tmp/output.csv",
   "message": "CSV file generation complete"
}

This indicates that the CSV file has been successfully generated. You can download it using the provided csv_path. If you have any more questions, please feel free to ask! 😄
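On a server without a browser, the same request can be made from a script. The following is a minimal sketch using only the Python standard library: it calls the endpoint and extracts `csv_path` from the JSON reply. The helper names `parse_csv_response` and `request_csv_export` are hypothetical, and the URL assumes the app is reachable on localhost port 5000 as in the steps above.

```python
import json
from urllib.request import urlopen

# Endpoint path taken from the steps above; host and port are assumptions.
GET_CSV_URL = "http://localhost:5000/api/v1/speak/get_csv"


def parse_csv_response(body: str) -> str:
    """Return the csv_path field from the endpoint's JSON reply."""
    payload = json.loads(body)
    if "csv_path" not in payload:
        raise ValueError(f"unexpected response: {payload!r}")
    return payload["csv_path"]


def request_csv_export(url: str = GET_CSV_URL, timeout: float = 60.0) -> str:
    """Trigger CSV generation and return the path reported by the server."""
    with urlopen(url, timeout=timeout) as response:
        return parse_csv_response(response.read().decode("utf-8"))
```

Calling `request_csv_export()` on the machine that runs the backend should return the path reported by the server. Note that this path is as the server sees it, which may be a path inside the container.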

ztybigcat commented 1 year ago

Hi Jinda, thanks for your reply. I tried the functionality, and it works! Right now I have two more concerns:

1. The export dumps the files in a folder inside the container rather than in a volume shared with the host, which makes it difficult to retrieve the files later on. Could you dump the files in a shared volume under data to make retrieval easier?
2. Could you document the usage in a section of RELEASE.md? When describing the steps, assume we are SSHing into the cloud server that is running the server as a Docker service (e.g. no public exposure of port 5000, calling the API via CLI, etc.).

jindaznb commented 1 year ago

Hi Tianyi, Thank you for the great advice!

> The export dumps the files in a folder inside the container rather than in a volume shared with the host, which makes it difficult to retrieve the files later on. Could you dump the files in a shared volume under data to make retrieval easier?

The generated files (audio and CSV) are now also available in the local folder data/voice/output

> Could you document the usage in a section of RELEASE.md? When describing the steps, assume we are SSHing into the cloud server that is running the server as a Docker service (e.g. no public exposure of port 5000, calling the API via CLI, etc.).

Updated in RELEASE.md
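One way an export step can make its files visible on the host is to copy them into a directory that is bind-mounted from the host. The sketch below illustrates that idea only; it is not the project's implementation. The function name `copy_export_to_shared` is hypothetical, and it assumes the container's temporary folder holds the generated `.csv` and `.wav` files and that the target directory is mounted to something like data/voice/output on the host.

```python
import shutil
from pathlib import Path
from typing import List


def copy_export_to_shared(tmp_dir: str, shared_dir: str) -> List[str]:
    """Copy generated CSV and WAV files into a host-mounted folder."""
    source = Path(tmp_dir)
    target = Path(shared_dir)
    # Create the shared folder on first use so the copy cannot fail on it.
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(source.iterdir()):
        # Only export artifacts are copied; other temp files are skipped.
        if path.is_file() and path.suffix.lower() in {".csv", ".wav"}:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

Copying rather than moving keeps the original files in place, so anything that still reads from the temporary folder keeps working.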

ztybigcat commented 1 year ago

Hi Jinda,

Thanks for updating the documentation! However, upon testing I could not find the CSV and WAV files in data/voice/output.

[ec2-user@ip voice]$ curl -X GET http://localhost:5000/api/v1/speak/get_csv
{"csv_path":"/tmp/output.csv","message":"CSV file generation complete"}
[ec2-user@ip voice]$ ls -lh
total 0
drwxr-xr-x 19 root root 309 Nov  9 02:05 local-s3
[ec2-user@ip voice]$ pwd
/home/ec2-user/VoiceCollector/data/voice

ztybigcat commented 1 year ago

I also discovered a new bug in the Voice Collector. The server seems to create a new SQL database on each start, even when an old database is available. As a result, metadata we collected with older versions of Voice Collector is overwritten. Could you look into the bug if possible?
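The thread does not say which database engine is used or what causes the reset. One common cause of this symptom is start-up code that creates the schema unconditionally. As an illustration only, here is a sketch of idempotent initialisation using SQLite from the Python standard library; the engine, the function name `init_db`, and the `recordings` table are all assumptions, not the Voice Collector schema.

```python
import sqlite3


def init_db(db_path: str) -> sqlite3.Connection:
    """Open the database, creating the table only if it is missing."""
    conn = sqlite3.connect(db_path)
    # IF NOT EXISTS leaves an existing table and its rows untouched,
    # so restarting the server does not wipe earlier metadata.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recordings ("
        "id INTEGER PRIMARY KEY, filename TEXT NOT NULL)"
    )
    conn.commit()
    return conn
```

With Docker, the database file also has to live on a volume that survives container re-creation; otherwise every new container starts with an empty file regardless of how the schema is created.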

jindaznb commented 11 months ago

Hi @ztybigcat, previously this feature was only supported in the development (Flask) setup.

I updated the docker-compose file for production, and it seems to be working now. To get the audio information and CSV, you need to submit a new recording first.

[ec2-user@ip-172-31-49-141 VoiceCollector]$ curl -X GET http://localhost:5000/api/v1/speak/get_csv
{"csv_path":"/tmp/output.csv","message":"CSV file generation complete"}
[ec2-user@ip-172-31-49-141 output]$ pwd
/home/ec2-user/VoiceCollector/data/voice/output
[ec2-user@ip-172-31-49-141 output]$ ls
output.csv  test33.wav
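The manual `ls` check above can also be scripted. This is a minimal sketch, assuming the export writes `.csv` and `.wav` files directly into the output folder; the function name `summarize_export` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def summarize_export(output_dir: str) -> Dict[str, List[str]]:
    """List the CSV and WAV files found directly inside output_dir."""
    folder = Path(output_dir)
    if not folder.is_dir():
        # A missing folder is reported as an empty export, not an error.
        return {"csv": [], "wav": []}
    names = sorted(p.name for p in folder.iterdir() if p.is_file())
    return {
        "csv": [n for n in names if n.lower().endswith(".csv")],
        "wav": [n for n in names if n.lower().endswith(".wav")],
    }
```

An empty `csv` list after calling the endpoint would point to the situation reported earlier in this thread, where the files stay inside the container.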

jindaznb commented 11 months ago

> I also discovered a new bug in the Voice Collector. The server seems to create a new SQL database on each start, even when an old database is available. As a result, metadata we collected with older versions of Voice Collector is overwritten. Could you look into the bug if possible?

I am looking into this issue. Thank you for the reminder.

SlangLab-NU / VoiceCollector

32 Make data from databases usable for ASR training #33