AI4Bharat / IndicBERT

Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME
https://ai4bharat.iitm.ac.in/language-understanding
MIT License

Memory Error: While Preprocessing Tokenizer for Urdu Language #4

Open HSultankhan opened 1 year ago

HSultankhan commented 1 year ago

Hello, I want to create a tokenizer for the Urdu language, and I ran this command:

(tpu_data) D:>python IndicBERT/tokenization/build_tokenizer.py --input "D:\IndicBERT\ur.txt" --output "D:\IndicBERT\output" --vocab_size 250000

[screenshot of the command output]
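(A possible workaround for the memory error, in case the trainer loads the whole corpus into RAM: train on a uniform random sample of lines instead of the full file. A minimal stdlib sketch using reservoir sampling — the sample size and the `ur_sample.txt` file name are just illustrative placeholders, not anything from the repo:)

```python
import random

def sample_lines(path, k, seed=0):
    """Reservoir-sample k lines from a large corpus without loading it all into RAM."""
    rng = random.Random(seed)
    sample = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i < k:
                sample.append(line)
            else:
                # Replace an existing entry with probability k/(i+1)
                j = rng.randint(0, i)
                if j < k:
                    sample[j] = line
    return sample

# Hypothetical usage: write a 1M-line sample that fits in memory,
# then point build_tokenizer.py at the smaller file instead of ur.txt.
# lines = sample_lines(r"D:\IndicBERT\ur.txt", 1_000_000)
# with open(r"D:\IndicBERT\ur_sample.txt", "w", encoding="utf-8") as f:
#     f.writelines(lines)
```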

After this, as per the instructions, I used this command:

(tpu_data) D:>python IndicBERT/process_data/create_mlm_data.py --input_file="D:\IndicBERT\ur.txt" --output_file="D:\IndicBERT\output" --input_file_type=monolingual --tokenizer="D:\IndicBERT\output\config.json"

[screenshots of the memory error]

This happened multiple times:

[screenshot of the memory error]
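(If the error comes from `create_mlm_data.py` reading the whole input file at once, a streaming, batched read keeps peak memory flat regardless of corpus size. A rough stdlib sketch — the batch size is a placeholder, and what happens per batch depends on the actual script:)

```python
def iter_batches(path, batch_size=10_000):
    """Yield lists of lines so only one batch is held in memory at a time."""
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch

# Hypothetical usage:
# for batch in iter_batches(r"D:\IndicBERT\ur.txt"):
#     process(batch)  # tokenize / write MLM examples for this batch only
```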

Also, this whole pipeline is not using the GPU.
Here are my specs:

Processor: Intel i7-9700K @ 3.6 GHz
RAM: 32 GB
GPU: Nvidia GTX 1660 Ti (6 GB)

I actually have two questions:

1. How can I resolve this memory error? Is there a way to use the GPU, since this preprocessing is not utilizing it, or should I use Google Colab instead?

2. Since I only need a tokenizer for the Urdu language, will I have the tokenizer JSON file after the Preprocess Data step?
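(To check what the tokenizer step actually produced, one way is to list the JSON artifacts in the output directory and confirm they parse — the `config.json` name comes from the `--tokenizer` path above; any other file names are assumptions:)

```python
import json
from pathlib import Path

def check_tokenizer_dir(out_dir):
    """Return every JSON artifact in the tokenizer output dir, parsed."""
    found = {}
    for p in Path(out_dir).glob("*.json"):
        with open(p, encoding="utf-8") as f:
            found[p.name] = json.load(f)
    return found

# Hypothetical usage:
# artifacts = check_tokenizer_dir(r"D:\IndicBERT\output")
# print(sorted(artifacts))  # e.g. expect to see config.json here
```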