piyushmishra12 closed this issue 5 years ago
Is there any file format we should follow? Can we gzip the encoding file to save space?
I was thinking of just writing the program to encode the necessary text and then exporting the data to a .csv or .tsv file. We could then load that file during training instead of doing everything in the same Python file. I am most comfortable with .csv files, but you can go ahead and do whatever you think is fine. I'll review it once you make a pull request. Fair enough?
Yes. Sounds good.
So should I consider you handling this issue then?
Yes. I have done it basically. I am happy to hear your review on it and to make any necessary changes. I have saved it as .gz files.
I have merged your pull request. Please see the changes I mentioned there, raise a separate issue, and try to work on that. Once you've raised the issue, I'll close this one.
It would be tedious to re-run the same file and re-encode the text every time. A better way would be to store the encoded data once and load it whenever needed.
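For reference, here is a minimal sketch of the store-once, load-later idea using only the Python standard library. It is not the code from the merged pull request. The function names `save_encoded` and `load_encoded`, the file name `encoded.csv.gz`, and the sample token IDs are all made up for illustration, and it assumes the encoded text is a list of integer ID lists.

```python
import csv
import gzip
import os
import tempfile


def save_encoded(rows, path):
    """Write rows of integer IDs to a gzip-compressed CSV file."""
    # Text mode ("wt") lets the csv module write straight into the gzip stream.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def load_encoded(path):
    """Read the rows back, converting every field from str to int."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return [[int(value) for value in row] for row in csv.reader(f)]


# Hypothetical encoded data: one list of token IDs per sentence.
encoded = [[12, 7, 301], [5, 44], [9, 9, 9, 2]]

# Encode once and save; the training script would only call load_encoded().
path = os.path.join(tempfile.mkdtemp(), "encoded.csv.gz")
save_encoded(encoded, path)
restored = load_encoded(path)
```

With this split, the encoding script runs once and the training script only needs `load_encoded(path)`. If the project already uses pandas, `read_csv` can also read `.gz` files directly, so the same file would work there too.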