gmihaila / ml_things

This is where I put things I find useful that speed up my work with Machine Learning. Ever looked in your old projects to reuse those cool functions you created before? Well, this repo is designed to be a Python library of functions I created in my previous projects that can be reused. I also share some notebook tutorials and Python code snippets.
https://gmihaila.github.io
Apache License 2.0

pre-train #18

Closed: shainaraza closed this issue 2 years ago

shainaraza commented 2 years ago

Hi @gmihaila, thanks for this splendid library. I just have a quick question regarding pre-training from scratch: is it possible on a single system? I don't have much data to pre-train on. Any suggestions? Thanks.

gmihaila commented 2 years ago

@shainaraza what do you mean by single system?

shainaraza commented 2 years ago

Yes @gmihaila, I just have Colab and want to train a model on data from 100,000 publications (papers and their text).

gmihaila commented 2 years ago

@shainaraza Yes, you should be able to use Colab to pre-train from scratch. Make sure to use a GPU runtime. I have done it myself several times. As for the data size, I'm not sure how many lines of text 100,000 publications will take, but I'm pretty sure Colab can handle it as long as the data fits in RAM.
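A rough way to check the "fits in RAM" condition before starting is to measure the corpus on disk and compare it against a memory budget. The snippet below is only a sketch and not part of ml_things: the helper names `corpus_stats` and `fits_in_ram`, and the `overhead_factor` of 3.0, are assumptions made for illustration. The real overhead depends on the tokenizer and on how the dataset is loaded.

```python
def corpus_stats(path):
    """Stream one text file and return (line_count, byte_count).

    Reading in binary mode line by line keeps memory use small,
    so this works even for files larger than the available RAM.
    """
    n_lines = 0
    n_bytes = 0
    with open(path, "rb") as handle:
        for raw_line in handle:
            n_lines += 1
            n_bytes += len(raw_line)
    return n_lines, n_bytes


def fits_in_ram(total_bytes, ram_bytes, overhead_factor=3.0):
    """Return True if the corpus is likely to fit in memory.

    overhead_factor is a hypothetical safety margin (an assumption,
    not a measured value) covering Python string overhead and the
    tokenized copy of the data held alongside the raw text.
    """
    return total_bytes * overhead_factor <= ram_bytes
```

To use it, sum `corpus_stats` over all publication files and pass the total, together with the RAM reported by the Colab runtime, to `fits_in_ram`. If the check fails, streaming or sharding the data is the usual workaround.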