alexdemartos closed this issue 5 years ago
Have you checked htop / your CPU usage? Does it show several cores idling? If I'm not mistaken, this repo already uses a FIFOQueue filled by 8 workers by default, each running in a separate process and potentially occupying a different core.
The FIFOQueue is not TF's most up-to-date idiom, but it does the job here. I've seen an issue in either this repo or a similar one about moving from the FIFOQueue to the more modern tf.data generator model, but again, that seems unlikely to solve your problem.
Hi, thanks for your message. As far as I know, the FIFOQueue's capacity parameter does not set the number of background threads reading samples. In fact, I think that with the current configuration the FIFOQueue does not start reading samples until training has run out of them. Adding more threads (as shown above) and enlarging the FIFOQueue allowed me to get rid of this problem.
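The pattern described here (several producer threads keeping a bounded FIFO queue full while the training loop consumes from it) can be sketched with Python's standard library. This is an illustration of the idea, not the repo's actual feeder code; the thread count and queue capacity are placeholder values:

```python
import queue
import threading

def make_feeder(load_sample, sample_ids, num_threads=8, capacity=64):
    """Fill a bounded FIFO queue from several background threads.

    load_sample: callable that reads/prepares one training example.
    capacity bounds memory use; producers block when the queue is full,
    so samples are prepared ahead of the consumer instead of on demand.
    """
    q = queue.Queue(maxsize=capacity)
    pending = list(sample_ids)
    lock = threading.Lock()

    def producer():
        while True:
            with lock:
                if not pending:
                    return
                sample_id = pending.pop()
            q.put(load_sample(sample_id))  # blocks when the queue is full

    threads = [threading.Thread(target=producer, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    return q, threads
```

With one producer thread the consumer stalls whenever loading a sample is slow; with several, the queue tends to stay full and the training loop only blocks if all producers fall behind at once.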
My problem was that the library I am using for g2p is very inefficient if you run it sample by sample. I've adapted feeder.py to apply g2p to all the data beforehand.
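Precomputing g2p over the whole corpus once, before training, might look roughly like the sketch below. `g2p_fn` is a placeholder for whatever g2p library call is actually being used, and the worker count is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def precompute_g2p(sentences, g2p_fn, workers=8):
    """Run g2p over the whole corpus once, before training starts.

    g2p_fn: the per-sentence g2p call (placeholder for the real library).
    Returns a dict so the feeder does a cheap lookup per sample at
    training time instead of invoking g2p on every sample it loads.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        phonemes = list(pool.map(g2p_fn, sentences))
    return dict(zip(sentences, phonemes))
```

A thread pool mainly helps if the g2p library releases the GIL (e.g. it wraps native code); for a pure-Python g2p implementation, a process pool would be the analogous choice.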
Hi,
I have a large corpus (>100K sentences), and the bottleneck right now is the data loading done by the Feeder class. I am not experienced in TF, but I think this could be avoided by having the Feeder read data from multiple CPU threads. Would it make sense to do something like this in feeder.py?
Thanks!