bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License
1.05k stars, 207 forks

Help with the installation on colab #241

Open bitcometz opened 3 months ago

bitcometz commented 3 months ago

Hello, thanks for this great tool!

I followed the installation tutorial and got the following error:

import os
import sys

if "google.colab" in sys.modules:
    print("Running on Google Colab")
    print("Installing dependencies...")
    !pip install -U scgpt
    # the optional dependency flash-attn is skipped on Colab
    !pip install wandb louvain

    # NOTE: May need to restart runtime after the installation

    print("Downloading data and model ckpt...")
    !pip install -q -U gdown
    import gdown

import scvi
adata = scvi.data.pbmc_dataset()

Error:

INFO     File data/gene_info_pbmc.csv already downloaded
INFO     File data/pbmc_metadata.pickle already downloaded
INFO     File data/pbmc8k/filtered_gene_bc_matrices.tar.gz already downloaded
INFO     Extracting tar file
INFO     Removing extracted data at data/pbmc8k/filtered_gene_bc_matrices
INFO     File data/pbmc4k/filtered_gene_bc_matrices.tar.gz already downloaded
INFO     Extracting tar file
INFO     Removing extracted data at data/pbmc4k/filtered_gene_bc_matrices
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-2b688d4bab92> in <cell line: 1>()
      1 if dataset_name == "PBMC_10K":
----> 2     adata = scvi.data.pbmc_dataset()  # 11990 × 3346
      3     ori_batch_col = "batch"
      4     adata.obs["celltype"] = adata.obs["str_labels"].astype("category")
      5     adata.var = adata.var.set_index("gene_symbols")

/usr/local/lib/python3.10/dist-packages/numpy/__init__.py in __getattr__(attr)
    322 
    323         if attr in __former_attrs__:
--> 324             raise AttributeError(__former_attrs__[attr])
    325 
    326         if attr == 'testing':

AttributeError: module 'numpy' has no attribute 'str'.
`np.str` was a deprecated alias for the builtin `str`. To avoid this error in existing code, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
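Some background on this error: NumPy deprecated these aliases in 1.20 and removed them in 1.24. Any library code that still uses `np.str` therefore fails on a newer NumPy. Here the access happens somewhere inside `scvi.data.pbmc_dataset()`, not in the notebook cell, so editing the cell will not fix it. The usual options are to install a library release that no longer uses the alias or, assuming nothing else in the environment needs a newer NumPy, to pin NumPy below 1.24. The sketch below does two things: it shows the replacement that the error message recommends, and it checks whether the installed NumPy still has the alias. The `labels` array is made-up illustration data.

```python
import numpy as np


def numpy_has_removed_aliases() -> bool:
    """True when `np.str` no longer exists (NumPy 1.24+ raises AttributeError on access)."""
    return not hasattr(np, "str")


# Code written as `.astype(np.str)` behaves the same with the builtin `str`;
# `np.str_` is the NumPy scalar type, the other replacement the error message suggests.
labels = np.array(["B cell", "T cell"])  # made-up example data
as_builtin = labels.astype(str)
as_scalar_type = labels.astype(np.str_)

print(np.__version__, "aliases removed:", numpy_has_removed_aliases())
print(as_builtin.dtype, as_scalar_type.dtype)  # both are the same fixed-width unicode dtype
```

The fix belongs in the library that makes the call. The snippet only shows what an updated call would look like.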

Could you help with this problem? Thanks!

Best

kocemir commented 3 months ago

Which task is this for? I may be able to help you with the annotation task.

First, upload the datasets and pretrained models to a folder in your Google Drive. Then mount the drive in Colab so you can use those paths in the notebook. After that, run the following:

# Option 1: skip the optional flash-attn dependency
!pip install scgpt

# Option 2: include flash-attn (building it takes a long time)
!pip install scgpt "flash-attn<1.0.5"

!pip install wandb
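flash-attn is optional in both install routes. A notebook meant to run either way can therefore check for it after installation instead of assuming it is there. Below is a minimal standard-library sketch. `use_fast_transformer` is a hypothetical flag name used only for illustration; the scGPT tutorial's own switch may be named differently.

```python
import importlib.util


def flash_attn_available() -> bool:
    """Return True if the optional flash_attn package is installed."""
    # find_spec only locates the top-level module; it does not import it.
    return importlib.util.find_spec("flash_attn") is not None


# Hypothetical flag: enable the fast-attention path only when flash-attn is present.
use_fast_transformer = flash_attn_available()
print(f"flash-attn installed: {use_fast_transformer}")
```

This check only confirms that the package is installed. It does not confirm that the package loads against the current CUDA/PyTorch build.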

import copy
import gc
import json
import os
from pathlib import Path
import shutil
import sys
import time
import traceback
from typing import List, Tuple, Dict, Union, Optional
import warnings
import pandas as pd

import pickle
import torch
from anndata import AnnData
import scanpy as sc
import scvi
import seaborn as sns
import numpy as np
import wandb
from scipy.sparse import issparse
import matplotlib.pyplot as plt
from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from torchtext.vocab import Vocab
from torchtext._torchtext import (
    Vocab as VocabPybind,
)
from sklearn.metrics import confusion_matrix

sys.path.insert(0, "../")
import scgpt as scg
from scgpt.model import TransformerModel, AdversarialDiscriminator
from scgpt.tokenizer import tokenize_and_pad_batch, random_mask_value
from scgpt.loss import (
    masked_mse_loss,
    masked_relative_error,
    criterion_neg_log_bernoulli,
)
from scgpt.tokenizer.gene_tokenizer import GeneVocab
from scgpt.preprocess import Preprocessor
from scgpt import SubsetsBatchSampler
from scgpt.utils import set_seed, category_str2int, eval_scib_metrics

sc.set_figure_params(figsize=(6, 6))
os.environ["KMP_WARNINGS"] = "off"
warnings.filterwarnings('ignore')

This will probably work
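The Drive step described in the reply above (upload the data and checkpoints, then mount Drive) can be sketched as follows. The sketch reuses the `"google.colab" in sys.modules` check from the original notebook. `google.colab.drive.mount` is Colab's standard mount call. The folder name `scGPT` and the helper `resolve_data_root` are hypothetical placeholders; point them at wherever you actually uploaded the files.

```python
import sys
from pathlib import Path

# Placeholder folder name: replace it with the Drive folder holding your data and checkpoints.
DRIVE_FOLDER = "scGPT"


def resolve_data_root(folder: str = DRIVE_FOLDER) -> Path:
    """Mount Google Drive when running on Colab and return the folder to load files from."""
    if "google.colab" in sys.modules:
        from google.colab import drive  # only importable inside Colab

        drive.mount("/content/drive")  # asks for authorization on first use
        return Path("/content/drive/MyDrive") / folder
    # Outside Colab there is nothing to mount; look for the folder in the working directory.
    return Path.cwd() / folder


data_root = resolve_data_root()
print(f"Loading datasets and checkpoints from {data_root}")
```

Outside Colab the helper skips mounting and falls back to a local folder of the same name, so the same cell can also run on a local machine.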

bitcometz commented 3 months ago

@kocemir, thanks for your help!

Yes, I want to do the annotation task!

You are right that adding "flash-attn<1.0.5" takes a really long time! I am using the free Colab GPU tier, so I cannot finish the installation with flash-attn included.
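An installation can stop partway, as with flash-attn on the free Colab tier. Before rerunning the notebook, it helps to confirm which packages actually ended up installed. The same check also shows which NumPy and scvi-tools versions are in use, which matters for the `np.str` error in the first post. Below is a minimal standard-library sketch. The package list and the helper name `installed_versions` are only examples.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(["numpy", "scvi-tools", "scgpt", "flash-attn", "wandb"]).items():
    print(f"{name:12s} {ver or 'not installed'}")
```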

Best