SlangLab-NU / VoiceCollector

Apache License 2.0

32 Make data from databases usable for ASR training #33

Closed jindaznb closed 10 months ago

jindaznb commented 11 months ago

Once the backend code is running, the script downloads the audio files and outputs a CSV automatically.

ztybigcat commented 11 months ago

Hi Jinda, could you provide some documentation on how to use this functionality? In particular, when the app is deployed with Docker in the cloud, how would I execute this script? Thank you 😄

jindaznb commented 11 months ago

Thank you. To execute the script, follow these steps:

{
   "csv_path": "/server/tmp/output.csv",
   "message": "CSV file generation complete"
}

This indicates that the CSV file has been successfully generated. You can download it using the provided csv_path. If you have any more questions, please feel free to ask! 😄
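As a minimal sketch of how a script might consume this response, the snippet below parses the JSON body and extracts `csv_path`. The function name `parse_export_response` is hypothetical and not part of the project; the sample body is the response shown above, and the HTTP call itself is left out.

```python
import json


def parse_export_response(body: str) -> str:
    """Return csv_path from the export endpoint's JSON body.

    Raises ValueError if the body does not contain a csv_path.
    """
    payload = json.loads(body)
    csv_path = payload.get("csv_path")
    if not csv_path:
        raise ValueError(f"no csv_path in response: {payload!r}")
    return csv_path


# Sample body copied from the response above.
sample = '{"csv_path": "/server/tmp/output.csv", "message": "CSV file generation complete"}'
print(parse_export_response(sample))
```

Note that `csv_path` is a path as seen by the server process, so it may not be directly readable from the machine that made the request.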

ztybigcat commented 11 months ago

Hi Jinda, Thanks for your reply. I tried the functionality, and it works! Right now I have two more concerns:

  1. The export dumps the files into a folder inside the container rather than into a volume shared with the host, which makes them difficult to retrieve later. Could you dump the files in a shared volume under data to make retrieval easier?
  2. Could you document the usage in a section of RELEASE.md? When describing the steps, assume we are SSHing into the cloud server that runs the server as a Docker service (e.g. port 5000 is not publicly exposed, the API is called via the CLI, etc.).
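To illustrate the first concern, here is a rough sketch of how a path inside a container maps (or fails to map) to a host path through bind mounts. The mount table and the helper `host_path_for` are hypothetical; the project's actual docker-compose volumes are not shown in this thread and may differ.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical bind mounts (host directory -> container directory).
EXAMPLE_MOUNTS: Dict[str, str] = {"/host/data": "/app/data"}


def host_path_for(container_path: str, mounts: Dict[str, str]) -> Optional[str]:
    """Translate a container path to the matching host path.

    Returns None when the path is not under any bind-mounted directory,
    meaning the file only exists inside the container's filesystem.
    """
    target = PurePosixPath(container_path)
    for host_dir, container_dir in mounts.items():
        try:
            relative = target.relative_to(container_dir)
        except ValueError:
            continue  # not under this mount, try the next one
        return str(PurePosixPath(host_dir) / relative)
    return None


# A file written under the mounted directory is visible on the host.
print(host_path_for("/app/data/output/output.csv", EXAMPLE_MOUNTS))
# A file written elsewhere (such as a tmp folder) is not.
print(host_path_for("/server/tmp/output.csv", EXAMPLE_MOUNTS))
```

Under these assumptions, anything the export writes outside a mounted directory can only be retrieved by going through the container, which is the inconvenience described above.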
jindaznb commented 11 months ago

Hi Tianyi, Thank you for the great advice!

ztybigcat commented 11 months ago

Hi Jinda,

Thanks for updating the documentation! However, when testing I could not find the CSV and WAV files in data/voice/output.

[ec2-user@ip voice]$ curl -X GET http://localhost:5000/api/v1/speak/get_csv
{"csv_path":"/tmp/output.csv","message":"CSV file generation complete"}
[ec2-user@ip voice]$ ls -lh
total 0
drwxr-xr-x 19 root root 309 Nov  9 02:05 local-s3
[ec2-user@ip voice]$ pwd
/home/ec2-user/VoiceCollector/data/voice
ztybigcat commented 11 months ago

Also, I discovered a new bug in the Voice Collector. The server seems to create a new SQL database on each start even when an old database is available. As a result, metadata we collected earlier with older versions of Voice Collector is overwritten. Could you look into this bug if possible?
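The thread does not show how the server initializes its database, so the following is only an illustration of the general idea of non-destructive initialization, not the project's code. SQLite (Python's `sqlite3` module) stands in for whatever SQL database the server uses, and the `recordings` table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def init_db(db_path: str) -> sqlite3.Connection:
    """Open the database, creating the table only if it is missing."""
    conn = sqlite3.connect(db_path)
    # IF NOT EXISTS leaves an existing table and its rows untouched.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recordings ("
        "id INTEGER PRIMARY KEY, filename TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def count_after_restart() -> int:
    """Simulate two server starts against the same database file."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "voice.db")
        conn = init_db(db_path)  # first start
        conn.execute(
            "INSERT INTO recordings (filename) VALUES (?)", ("test33.wav",)
        )
        conn.commit()
        conn.close()
        conn = init_db(db_path)  # second start reuses the same file
        (count,) = conn.execute("SELECT COUNT(*) FROM recordings").fetchone()
        conn.close()
        return count


print(count_after_restart())
```

If initialization instead dropped and recreated the table, or pointed at a fresh file on every start, the row inserted before the restart would be lost, which matches the symptom described above.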

jindaznb commented 10 months ago

Hi @ztybigcat, previously this feature was only supported in the "For development (Flask)" setup.

I updated the docker-compose file for production, and it seems to be working now. To get the video information and CSV, you need to submit a new recording first.

[ec2-user@ip-172-31-49-141 VoiceCollector]$ curl -X GET http://localhost:5000/api/v1/speak/get_csv
{"csv_path":"/tmp/output.csv","message":"CSV file generation complete"}
[ec2-user@ip-172-31-49-141 output]$ pwd
/home/ec2-user/VoiceCollector/data/voice/output
[ec2-user@ip-172-31-49-141 output]$ ls
output.csv  test33.wav
jindaznb commented 10 months ago

Also, I discovered a new bug in the Voice Collector. The server seems to create a new SQL database on each start even when an old database is available. As a result, metadata we collected earlier with older versions of Voice Collector is overwritten. Could you look into this bug if possible?

I am looking into this issue. Thank you for the reminder.