Closed: massimo1980 closed this issue 5 years ago.
Can you share your modifications ?
----- file "run.sh":
```sh
#!/bin/sh
set -xe
export PATH=$(dirname "$0"):$PATH
env
checks.sh
export TMP=/mnt/tmp
export TEMP=/mnt/tmp
/mnt/extracted/deepspeech --model /mnt/models/output_graph.pbmm --alphabet /mnt/models/alphabet.txt --lm /mnt/lm/lm.binary --trie /mnt/lm/trie --audio /mnt/galatea_01_barrili_f000040.wav
```
/mnt/extracted/deepspeech
Like, where is this coming from ?
Of course I didn't put the file there :joy: I ran a `find / -type f -name deepspeech` and that is the path that came out, so that is what I used in run.sh. I hadn't checked the scripts, but I think it is produced while the container is being built.
EDIT: actually I found this in the file "build_lm.sh":
`curl -sSL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/native_client.tar.xz | pixz -d | tar -xf -`
so I imagine the binary comes from there.
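For reference, a minimal sketch of that step in isolation, which is where the `deepspeech` binary would end up (same URL as the build_lm.sh excerpt above; the /mnt/extracted target directory is an assumption taken from run.sh):

```sh
# download the prebuilt native client and unpack it into /mnt/extracted
mkdir -p /mnt/extracted
cd /mnt/extracted
curl -sSL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/native_client.tar.xz \
  | pixz -d | tar -xf -
```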
Tomorrow, if I can, I will try the French model.
Can you share your Italian model ?
Of course, yes! What service do you prefer?
Can I have the t-shirt anyway?! :joy: (just kidding)
Could you please try to merge with the latest code from my repo? It's possible that being in the wrong directory at export time makes things fail. Also, it'd be easier for testing and debugging.
> Of course, yes! What service do you prefer?

As you wish

> Can I have the t-shirt anyway?! :joy: (just kidding)

Sorry, we don't have any.
> Could you please try to merge with the latest code from my repo? It's possible that being in the wrong directory at export time makes things fail. Also, it'd be easier for testing and debugging.

Yeah, I'm trying to; I'm modifying the scripts so I can get a bash shell in the container and manually git pull.
> Of course, yes! What service do you prefer?
> As you wish

I have uploaded the files to my Google Drive account (these files were automatically generated by the scripts): https://drive.google.com/drive/folders/1Oql_7ZeA3clVWRESMVlDGVMpQ9buspyk?usp=sharing
Let me know if you can get in.
> Can I have the t-shirt anyway?! :joy: (just kidding)
> Sorry, we don't have any.

..never a joy..
```
$ ./deepspeech --model output_graph.pbmm --alphabet alphabet.txt --lm lm.binary --trie trie --audio ../test-alex.fr.wav -t
TensorFlow: v1.14.0-16-g3b4ce37
DeepSpeech: v0.6.0-alpha.8-6-gb888058
2019-10-04 20:40:13.715411: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Specified model file version (0) is incompatible with minimum version supported by this client (2). See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information
Could not create model.
```
So there's something screwed up in your process. Please update to my latest version and provide a diff so we can investigate.
I have just re-cloned the master branch of DeepSpeech, re-done the training using the lm.binary, trie, and alphabet.txt created by the container's scripts, manually converted the graph from pb to pbmm, and run deepspeech with that model... same error.
You need to instrument DeepSpeech.py around https://github.com/mozilla/DeepSpeech/blob/master/DeepSpeech.py#L800 and see what version is being used there.
Just done... the version that appears is 2, so I think there's something wrong with my "convert_graphdef_memmapped_format". I can't download it from TaskCluster: 404 not found. If I provide you (on my Google Drive) the "output_graph.pb", can you check whether it's wrong or not?
EDIT: Gotcha! In file Dockerfile.train:
```
RUN TASKCLUSTER_SCHEME="https://index.taskcluster.net/v1/task/project.deepspeech.tensorflow.pip.%(branch_name)s.%(arch_string)s/artifacts/public/%(artifact_name)s" python util/taskcluster.py \
    --target="$(pwd)" \
    --artifact="convert_graphdef_memmapped_format" \
    --branch="r1.13" && chmod +x convert_graphdef_memmapped_format
```
changing the branch to r1.14 ...
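For completeness, a rough sketch of the reconversion I then reran with the r1.14 build of the tool (the --in_graph/--out_graph flags are the tool's standard ones; the paths here are just an assumption):

```sh
# re-convert the frozen graph to the memory-mapped format with the r1.14 tool
# (input/output paths are illustrative)
./convert_graphdef_memmapped_format \
  --in_graph=/mnt/models/output_graph.pb \
  --out_graph=/mnt/models_bis/output_graph.pbmm
```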
```
deepspeech --model /mnt/models_bis/output_graph.pbmm --alphabet /mnt/models/alphabet.txt --lm /mnt/lm/lm.binary --trie /mnt/lm/trie --audio /mnt/galatea_01_barrili_f000040.wav
Loading model from file /mnt/models_bis/output_graph.pbmm
TensorFlow: v1.14.0-16-g3b4ce37
DeepSpeech: v0.6.0-alpha.8-0-gf0e9541
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-04 21:44:00.514303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6773 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
Loaded model in 0.26s.
Loading language model from files /mnt/lm/lm.binary /mnt/lm/trie
Loaded language model in 0.00153s.
Running inference.
en vander in el torero so tuna montagne di coro
Inference took 1.095s for 5.620s audio file.
```
> If I provide you (on my Google Drive) the "output_graph.pb", can you check whether it's wrong or not?

Well, you can directly test it yourself :)
> so I think there's something wrong with my "convert_graphdef_memmapped_format"

It should have been downloaded, there should not be any change.
@massimo1980 Do you have the docker build logs? You should have one line with a util/taskcluster.py call; I'd like to see it and the download it generates.
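If it helps, a minimal way to capture such a log (the image tag and log file name below are just placeholders):

```sh
# rebuild while saving the full build output, then look for the taskcluster.py call
docker build -f Dockerfile.train -t deepspeech-train . 2>&1 | tee build.log
grep -n "taskcluster.py" build.log
```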
You mean the line I pasted in the previous answer?
Haha, you edited before my reply :). Well, at least it makes sense now, and it advocates for you to follow more closely.
I'm open to any PR that might make your life easier, do not forget :)
@mone27 In Dockerfile.train:
```
--branch="r1.13" && chmod +x convert_graphdef_memmapped_format
```
if you use this version, you will get this error:
```
TensorFlow: v1.14.0-16-g3b4ce37
DeepSpeech: v0.6.0-alpha.8-6-gb888058
2019-10-04 20:40:13.715411: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Specified model file version (0) is incompatible with minimum version supported by this client (2).
See https://github.com/mozilla/DeepSpeech/#model-compatibility for more information
Could not create model.
```
Change branch version from "r1.13" to "r1.14" to avoid this error.
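For clarity, the corrected step would look roughly like this (same invocation as in Dockerfile.train, with only the branch changed; untested sketch):

```
RUN TASKCLUSTER_SCHEME="https://index.taskcluster.net/v1/task/project.deepspeech.tensorflow.pip.%(branch_name)s.%(arch_string)s/artifacts/public/%(artifact_name)s" python util/taskcluster.py \
    --target="$(pwd)" \
    --artifact="convert_graphdef_memmapped_format" \
    --branch="r1.14" && chmod +x convert_graphdef_memmapped_format
```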
Is there already a testing training model? Actually, my training is at "best_dev-45217" but the results are really poor; the test phrases are not so good during the evaluation step. (I'm training with the same parameters that are in Dockerfile.train.) ... any advice?
If you join our Developers Telegram group, we are talking there; check with @mozitabot on Telegram :-)
> Change branch version from "r1.13" to "r1.14" to avoid this error.

This was changed in my code a long time ago.
> Actually, my training is at "best_dev-45217" but the results are really poor; the test phrases are not so good during the evaluation step. (I'm training with the same parameters that are in Dockerfile.train.) ... any advice?

Those parameters are for French, and they depend on the dataset (variety, quality) as well as the amount of data. So you need to do your own evaluations.
Also, if you don't have enough data, it's not going to be magic :)
Hi there (do I have to write in English or can I write in Italian?),
I have tested the Docker instance and I've found these errors:
```
wget https://lingualibre.fr/datasets/Q385-ita-Italian.zip -O /mnt/source/lingua_libre_Q385-ita-Italian_train.zip
```
change "/mnt/source/" into "/mnt/sources/"
```
sed -i s/#//g '/mnt/extracted/data/*test.csv'
sed: can't read /mnt/extracted/data/*test.csv: No such file or directory
```
my workaround was to specify the directories explicitly, like this (a more general sketch follows below):
```
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*test.csv
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*train.csv
sed -i 's/#//g' /mnt/extracted/data/cv-it/clips/*dev.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*test.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*train.csv
sed -i 's/#//g' /mnt/extracted/data/lingualibre/*dev.csv
```
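A more general sketch of the same workaround (assuming every relevant CSV lives somewhere under /mnt/extracted/data), so the directories do not need to be hard-coded:

```sh
# apply the same substitution to every train/dev/test CSV found under the data tree
find /mnt/extracted/data -type f \
  \( -name '*train.csv' -o -name '*dev.csv' -o -name '*test.csv' \) \
  -exec sed -i 's/#//g' {} +
```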
```
+ rm /mnt/lm/lm.arpa
+ '[' '!' -f /mnt/lm/trie ']'
+ curl -sSL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.ba56407376f1e1109be33ac87bcb6eb9709b18be.cpu/artifacts/public/native_client.tar.xz
+ pixz -d
+ tar -xf -
can not seek in input: Illegal seek
Not an XZ file
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
```
browsing that URL gives: ResourceNotFound
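A small sketch of how that failure could be made more obvious: download to a file first with `curl -f`, so an HTTP 404 aborts instead of piping an error page into pixz/tar (the URL here is the one from the build_lm.sh excerpt above; the temp path is just an assumption):

```sh
# fail early on HTTP errors instead of feeding them to pixz/tar
curl -fSL -o /tmp/native_client.tar.xz \
  "https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/native_client.tar.xz"
pixz -d < /tmp/native_client.tar.xz | tar -xf -
```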
P.S. Every time, pixz returns:
can not seek in input: Illegal seek
I hope this is only a warning.
P.P.S. I've tried to format this thread as best as possible, but it seems I can't... sorry if it is too chaotic. Hope to help in some way.
Regards, Massimo