SmerkyG / gptcore

Fast modular code to create and train cutting edge LLMs
Apache License 2.0

Issue Training on Google Colab #11

Open opooladz opened 4 months ago

opooladz commented 4 months ago

I am getting the following error when trying to train a model following the README.

Traceback (most recent call last):
  File "/usr/lib/python3.10/pydoc.py", line 443, in safeimport
    module = __import__(path)
  File "/content/gptcore/dataset/__init__.py", line 35, in <module>
    class PipedDatasetWrapper(typing.Generic[T_co], torch.utils.data.datapipes.datapipe.IterDataPipe[T_co]):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_typing.py", line 373, in __new__
    return super().__new__(cls, name, bases, namespace, **kwargs)  # type: ignore[call-overload]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_typing.py", line 260, in __new__
    return super().__new__(cls, name, bases, namespace, **kwargs)  # type: ignore[call-overload]
  File "/usr/lib/python3.10/abc.py", line 106, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
TypeError: Cannot create a consistent method resolution order (MRO) for bases Generic, IterDataPipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/gptcore/util/config.py", line 479, in process
    located = locate(fullid, Missing)
  File "/content/gptcore/util/locate.py", line 57, in locate
    nextmodule = pydoc.safeimport('.'.join(parts[:n+1]), forceload)
  File "/usr/lib/python3.10/pydoc.py", line 458, in safeimport
    raise ErrorDuringImport(path, sys.exc_info())
pydoc.ErrorDuringImport: problem in dataset - TypeError: Cannot create a consistent method resolution order (MRO) for bases Generic, IterDataPipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/gptcore/cli.py", line 176, in <module>
    cli()
  File "/content/gptcore/cli.py", line 89, in cli
    disk_cfg = util.config.eval_first_expr(disk_cfg_str, macros)
  File "/content/gptcore/util/config.py", line 630, in eval_first_expr
    return ConfigParser().eval_first_expr(unparsed_input, incoming_macros)
  File "/content/gptcore/util/config.py", line 399, in eval_first_expr
    return self.process(node.value)
  File "/content/gptcore/util/config.py", line 552, in process
    rv = self.create_factory(node, node.args, node.keywords, immediate=True)
  File "/content/gptcore/util/config.py", line 618, in create_factory
    positional_placeholders_count, placeholders, args, kwargs = self.process_args_and_keywords(node_args=node_args, node_keywords=node_keywords)
  File "/content/gptcore/util/config.py", line 588, in process_args_and_keywords
    value = self.process(kw.value)
  File "/content/gptcore/util/config.py", line 522, in process
    rv = self.create_factory(node, node.args, node.keywords, immediate=False)
  File "/content/gptcore/util/config.py", line 618, in create_factory
    positional_placeholders_count, placeholders, args, kwargs = self.process_args_and_keywords(node_args=node_args, node_keywords=node_keywords)
  File "/content/gptcore/util/config.py", line 588, in process_args_and_keywords
    value = self.process(kw.value)
  File "/content/gptcore/util/config.py", line 522, in process
    rv = self.create_factory(node, node.args, node.keywords, immediate=False)
  File "/content/gptcore/util/config.py", line 607, in create_factory
    func_ident = self.process(func_node)
  File "/content/gptcore/util/config.py", line 560, in process
    raise ConfigParseError(node, self.unparsed_input, msg="Internal exception during configuration parsing " + str(e))
util.config.ConfigParseError: Internal exception during configuration parsing problem in dataset - TypeError: Cannot create a consistent method resolution order (MRO) for bases Generic, IterDataPipe at line 70, col 36
datamodule_factory=lambda: dataset.DM(
                           ^^^^^^^^^^^

Here is a Google Colab notebook where it is reproducible.

hypnopump commented 4 months ago

This can be solved by using the base PyTorch Docker image: pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel

opooladz commented 4 months ago

I haven't been able to get this working with Docker. If you can take a stab at it, that would be great. As of now, udocker starts but can't see the GPU. I'm not sure whether barebones Docker would give more control. Any help would be greatly appreciated. Thanks.
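When comparing the Colab runtime with a container, it can help to print which torch build is installed and whether a GPU is visible from inside each environment. A minimal sketch, assuming torch may or may not be importable; the helper name describe_torch_env is hypothetical, and the sketch only reports state rather than fixing GPU passthrough:

```python
def describe_torch_env() -> dict:
    """Report the torch version and CUDA visibility without raising if torch is absent."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {"installed": False, "version": None, "cuda_build": None, "gpu_visible": False}
    return {
        "installed": True,
        "version": torch.__version__,        # e.g. to compare against the image tag
        "cuda_build": torch.version.cuda,    # None for CPU-only builds
        "gpu_visible": torch.cuda.is_available(),
    }


print(describe_torch_env())
```

Running this once in the plain Colab runtime and once inside the container shows whether the two differ in torch version, in GPU visibility, or both.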

hypnopump commented 4 months ago

What do you mean by not being able to get it to work? Do you mean you couldn't get Docker running on Colab, or that it didn't work even with Docker? Docker works for me on both cloud GPUs and different local machines; I haven't tried Google Colab, though.

opooladz commented 4 months ago

I was not able to get Docker working properly on Google Colab. I adjusted the notebook above. I am not too familiar with Docker. If you're able to get it working in a Colab notebook, that would be helpful.