abhishekkrthakur / tez

Tez is a super-simple and lightweight Trainer for PyTorch. It also comes with many utils that you can use to tackle over 90% of deep learning projects in PyTorch.
Apache License 2.0

Can it work without CUDA #26

Closed hemanthh17 closed 3 years ago

hemanthh17 commented 3 years ago

I get an error when I run the code on a CPU-only configuration.

Traceback (most recent call last):
  File "c:\Users\Hemanth\Desktop\Data Analytics analyticvidya\recommender system\recommender.py", line 88, in <module>
    train()
  File "c:\Users\Hemanth\Desktop\Data Analytics analyticvidya\recommender system\recommender.py", line 82, in train
    model.fit(
  File "C:\Users\Hemanth\Desktop\Data Analytics analyticvidya\recommender system\venv\lib\site-packages\tez\model\model.py", line 309, in fit
    self._init_model(
  File "C:\Users\Hemanth\Desktop\Data Analytics analyticvidya\recommender system\venv\lib\site-packages\tez\model\model.py", line 93, in _init_model
    self.to(self.device)
  File "C:\Users\Hemanth\Desktop\Data Analytics analyticvidya\recommender system\venv\lib\site-packages\torch\nn\modules\module.py", line 852, in to
    return self._apply(convert)
  File "C:\Users\Hemanth\Desktop\Data Analytics analyticvidya\recommender system\venv\lib\site-packages\torch\nn\modules\module.py", line 530, in _apply
    module._apply(fn)
  File "C:\Users\Hemanth\Desktop\Data Analytics analyticvidya\recommender system\venv\lib\site-packages\torch\nn\modules\module.py", line 552, in _apply
    param_applied = fn(param)
  File "C:\Users\Hemanth\Desktop\Data Analytics analyticvidya\recommender system\venv\lib\site-packages\torch\nn\modules\module.py", line 850, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "C:\Users\Hemanth\Desktop\Data Analytics analyticvidya\recommender system\venv\lib\site-packages\torch\cuda\__init__.py", line 166, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

msgi commented 3 years ago

Of course it can.

hemanthh17 commented 3 years ago

Then there shouldn't be an assertion error, I suppose. If you spot anything wrong, please let me know.

msgi commented 3 years ago

Could you paste some of your code? There may be a configuration problem; for example, the device setting may be wrong.
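
One quick way to check the device side is to ask PyTorch what the installed build supports. This is only a minimal sketch: `describe_torch_build` is a hypothetical helper name, not part of Tez or PyTorch, and it assumes nothing beyond the standard `torch.version.cuda` and `torch.cuda.is_available()` attributes.

```python
import importlib.util


def describe_torch_build():
    """Report whether the installed PyTorch build can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment at all.
        return {"installed": False, "cuda_build": None, "cuda_available": False}

    import torch

    return {
        "installed": True,
        # torch.version.cuda is None when the build was not compiled with CUDA.
        "cuda_build": torch.version.cuda,
        # True only if a CUDA build is installed and a usable GPU is visible.
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_build()
print(info)
```

If `cuda_available` comes back as False, moving a model to a CUDA device fails with the kind of AssertionError shown in the traceback above.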

abhishekkrthakur commented 3 years ago

As @msgi mentioned, it should work on both CPU and GPU. Some code would be useful :)

hemanthh17 commented 3 years ago

OK, this is the code. I was following the recommender system video by @abhishekkrthakur.

import pandas as pd
import tez
import torch
from sklearn.model_selection import train_test_split
import torch.nn as nn
from sklearn import metrics,preprocessing
import numpy as np

class MovieDataset:
    def __init__(self,users,movies,ratings):
        self.users=users
        self.movies=movies
        self.ratings=ratings

    def __len__(self):
        return len(self.users)

    def __getitem__(self,item):
        user=self.users[item]
        movie=self.movies[item]
        rating=self.ratings[item]

        return {
            "users":torch.tensor(user,dtype=torch.long),
            "movies":torch.tensor(movie,dtype=torch.long),
            "ratings":torch.tensor(rating,dtype=torch.float)

        }

class RecSysModel(tez.Model):
    def __init__(self,num_users,num_movies):
        super().__init__()
        self.user_embed=nn.Embedding(num_users,32)
        self.movie_embed=nn.Embedding(num_movies,32)
        self.out=nn.Linear(64,1)
        self.step_scheduler_after='epoch'

    def fetch_optimizer(self):
        opt=torch.optim.Adam(self.parameters(),lr=1e-4)
        return opt

    def fetch_scheduler(self):
        sch= torch.optim.lr_scheduler.StepLR(self.optimizer,step_size=3,gamma=0.7)
        return sch

    def monitor_metrics(self,output,rating):
        output=output.detach().cpu().numpy()
        rating=rating.detach().cpu().numpy()
        return {
            'rmse':np.sqrt(metrics.mean_squared_error(rating,output))
        }

    def forward(self,users,movies,ratings=None):
        user_embeds=self.user_embed(users)
        movie_embeds=self.movie_embed(movies)
        output= torch.cat([user_embeds,movie_embeds],dim=1)
        output=self.out(output)

        loss=nn.MSELoss()(output,ratings.view(-1,1))
        calc_metrics =self.monitor_metrics(output,ratings.view(-1,1))
        return output,loss,calc_metrics

def train():
    df= pd.read_csv('train_v2.csv')
    lbl_user=preprocessing.LabelEncoder()
    lbl_movie=preprocessing.LabelEncoder()
    df.user=lbl_user.fit_transform(df.user.values)
    df.movie=lbl_movie.fit_transform(df.movie.values)

    df_train,df_valid=train_test_split(df,test_size=0.2,random_state=42,stratify=df.rating.values)
    train_dataset=MovieDataset(users=df_train.user.values,movies=df_train.movie.values,ratings=df_train.rating.values)
    valid_dataset=MovieDataset(users=df_valid.user.values,movies=df_valid.movie.values,ratings=df_valid.rating.values)
    model=RecSysModel(num_users=len(lbl_user.classes_), num_movies=len(lbl_movie.classes_))
    model.fit(
        train_dataset,valid_dataset,train_bs=1024,
        valid_bs=1024, fp16=True
    )

if __name__=="__main__":
    train()

abhishekkrthakur commented 3 years ago

Pass device="cpu" to fit:

    model.fit(
        train_dataset,valid_dataset,train_bs=1024,
        valid_bs=1024, fp16=True, device="cpu"
    )

hemanthh17 commented 3 years ago

Oh wow, I tried the above snippet before and it did not work... Now it is working well 👍🏼
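
For anyone who wants the same script to run on machines with and without a GPU, the `device` argument from the fix above can be chosen at runtime. A rough sketch, not something from the Tez docs: `build_fit_kwargs` is a hypothetical helper, and switching `fp16` off on CPU is my own assumption (the snippet above keeps `fp16=True`).

```python
def build_fit_kwargs(cuda_available, batch_size=1024):
    """Build keyword arguments for model.fit based on CUDA availability."""
    device = "cuda" if cuda_available else "cpu"
    return {
        "train_bs": batch_size,
        "valid_bs": batch_size,
        "device": device,
        # Assumption: mixed precision is only requested when running on a GPU.
        "fp16": device == "cuda",
    }


# Example for a CPU-only machine.
cpu_kwargs = build_fit_kwargs(cuda_available=False)
print(cpu_kwargs)

# In the script from this thread, the call would look like:
#   model.fit(train_dataset, valid_dataset,
#             **build_fit_kwargs(torch.cuda.is_available()))
```

The helper takes the availability flag as a plain boolean, so it can be checked without a GPU or even without PyTorch installed.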