AttributeError when call run_scenicplus

alexwang1001 commented 1 year ago

Hi! I was running the scenicplus PBMC 3K tutorial using the Singularity container. When I ran the following code at the indicated step:

from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
    run_scenicplus(
        scplus_obj = scplus_obj,
        variable = ['GEX_celltype'],
        species = 'hsapiens',
        assembly = 'hg38',
        tf_file = 'pbmc_tutorial/data/utoronto_human_tfs_v_1.01.txt',
        save_path = os.path.join(work_dir, 'scenicplus'),
        biomart_host = biomart_host,
        upstream = [1000, 150000],
        downstream = [1000, 150000],
        calculate_TF_eGRN_correlation = True,
        calculate_DEGs_DARs = True,
        export_to_loom_file = True,
        export_to_UCSC_file = True,
        path_bedToBigBed = 'pbmc_tutorial',
        n_cpu = 12,
        _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
except Exception as e:
    #in case of failure, still save the object
    dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
    raise(e)

I got this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[25], line 23
     20 except Exception as e:
     21     #in case of failure, still save the object
     22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 23     raise(e)

Cell In[25], line 3
      1 from scenicplus.wrappers.run_scenicplus import run_scenicplus
      2 try:
----> 3     run_scenicplus(
      4         scplus_obj = scplus_obj,
      5         variable = ['GEX_celltype'],
      6         species = 'hsapiens',
      7         assembly = 'hg38',
      8         tf_file = 'pbmc_tutorial/data/utoronto_human_tfs_v_1.01.txt',
      9         save_path = os.path.join(work_dir, 'scenicplus'),
     10         biomart_host = biomart_host,
     11         upstream = [1000, 150000],
     12         downstream = [1000, 150000],
     13         calculate_TF_eGRN_correlation = True,
     14         calculate_DEGs_DARs = True,
     15         export_to_loom_file = True,
     16         export_to_UCSC_file = True,
     17         path_bedToBigBed = 'pbmc_tutorial',
     18         n_cpu = 12,
     19         _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
     20 except Exception as e:
     21     #in case of failure, still save the object
     22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)

File /opt/venv/lib/python3.8/site-packages/scenicplus/wrappers/run_scenicplus.py:309, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, **kwargs)
    307 if export_to_loom_file is True:
    308     log.info('Exporting to loom file')
--> 309     export_to_loom(scplus_obj,
    310            signature_key = 'Gene_based',
    311            tree_structure = tree_structure,
    312            title =  'Gene based eGRN',
    313            nomenclature = assembly,
    314            out_fname=os.path.join(save_path,'SCENIC+_gene_based.loom'))
    315     export_to_loom(scplus_obj,
    316            signature_key = 'Region_based',
    317            tree_structure = tree_structure,
    318            title =  'Region based eGRN',
    319            nomenclature = assembly,
    320            out_fname=os.path.join(save_path,'SCENIC+_region_based.loom'))
    322 if export_to_UCSC_file is True:

File /opt/venv/lib/python3.8/site-packages/scenicplus/loom.py:174, in export_to_loom(scplus_obj, signature_key, out_fname, eRegulon_metadata_key, auc_key, auc_thr_key, keep_direct_and_extended_if_not_direct, selected_features, selected_cells, cluster_annotation, tree_structure, title, nomenclature)
    170     cv = CountVectorizer(
    171         lowercase=False, token_pattern=r'(?u)\b\w\w+\b:\b\w\w+\b-\b\w\w+\b')
    172 regulon_mat = cv.fit_transform(regulons.values())
    173 regulon_mat = pd.DataFrame(regulon_mat.todense(
--> 174 ), columns=cv.get_feature_names(), index=regulons.keys())
    175 regulon_mat = regulon_mat.reindex(columns=feature_names, fill_value=0).T
    176 if keep_direct_and_extended_if_not_direct is True:

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

Do you know why and could you help me fix it? Thank you! Li

satrapankti commented 1 year ago

Use .get_feature_names_out() instead of .get_feature_names().

alexwang1001 commented 1 year ago

> .get_feature_names_out() instead of get_feature_names()

That is what I thought as well. Is this a bug in run_scenicplus that needs to be fixed?

SeppeDeWinter commented 1 year ago

Hi both

You're right. get_feature_names got replaced by get_feature_names_out (see: https://github.com/scikit-learn/scikit-learn/pull/18444). I will update the code.
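A version-tolerant alternative to a plain rename is to call whichever method the installed scikit-learn provides. This is a minimal sketch: feature_names_compat is a hypothetical helper (not part of scenicplus or scikit-learn), and it only assumes a fitted vectorizer such as the CountVectorizer built in loom.py.

```python
def feature_names_compat(vectorizer):
    """Return the learned vocabulary of a fitted vectorizer as a list.

    Newer scikit-learn releases expose get_feature_names_out(); older ones
    only expose get_feature_names(). The new name is tried first.
    """
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    if hasattr(vectorizer, "get_feature_names"):
        return list(vectorizer.get_feature_names())
    raise AttributeError(
        f"{type(vectorizer).__name__} has neither get_feature_names_out "
        "nor get_feature_names"
    )
```

In loom.py line 174 this would mean passing columns=feature_names_compat(cv) instead of columns=cv.get_feature_names(), so the same code runs on both old and new scikit-learn.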

Best,

Seppe

JoGraesslin commented 1 year ago

Hi everyone, I am running into the same issue using SCENIC+ 1.01:

, line 174, in export_to_loom
    ), columns=cv.get_feature_names(), index=regulons.keys())
AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

I created my pySCENIC file using the method described in https://github.com/aertslab/scenicplus/issues/48#issuecomment-1285838142, as I am trying to make scenicplus run for zebrafish.

Would be very happy for any ideas! Best, Jo

colquittlab commented 1 year ago

I received an identical error as the OP while running run_scenicplus on the PBMC tutorial using scenicplus 1.0.1.dev2+g26677cb.

jflucier commented 1 year ago

A patch seems to be available in the development branch.

@SeppeDeWinter, should I switch my scenicplus checkout to the development branch?

Do you plan to merge the fix into the master branch?

Thanks in advance for your help!

solvi808 commented 9 months ago

I'm getting this error as well, using scenicplus 1.0.1.dev4+ge4bdd9f:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[59], line 23
     20 except Exception as e:
     21     #in case of failure, still save the object
     22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 23     raise(e)

Cell In[59], line 3
      1 from scenicplus.wrappers.run_scenicplus import run_scenicplus
      2 try:
----> 3     run_scenicplus(
      4         scplus_obj = scplus_obj,
      5         variable = [ KEY_TO_GROUP_BY_1 ],
      6         species = 'mmusculus', # hsapiens mmusculus
      7         assembly = 'mm10', # hg38 mm10
      8         tf_file = '/media/solvi/WORKSITE1001/refDBs/allTFs_mm.txt',
      9         save_path = os.path.join(work_dir, 'scenicplus'),
     10         biomart_host = biomart_host,
     11         upstream = [1000, 150000],
     12         downstream = [1000, 150000],
     13         calculate_TF_eGRN_correlation = True,
     14         calculate_DEGs_DARs = True,
     15         export_to_loom_file = True,
     16         export_to_UCSC_file = True,
     17         path_bedToBigBed = 'MU4',
     18         n_cpu = NCPUS ,
     19         _temp_dir = os.path.join(tmpDir, 'ray_spill'))
     20 except Exception as e:
     21     #in case of failure, still save the object
     22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)

File ~/scenicplus/src/scenicplus/wrappers/run_scenicplus.py:323, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
    321 if export_to_loom_file is True:
    322     log.info('Exporting to loom file')
--> 323     export_to_loom(scplus_obj, 
    324            signature_key = 'Gene_based',
    325            tree_structure = tree_structure,
    326            title =  'Gene based eGRN',
    327            nomenclature = assembly,
    328            out_fname=os.path.join(save_path,'SCENIC+_gene_based.loom'))
    329     export_to_loom(scplus_obj, 
    330            signature_key = 'Region_based',
    331            tree_structure = tree_structure,
    332            title =  'Region based eGRN',
    333            nomenclature = assembly,
    334            out_fname=os.path.join(save_path,'SCENIC+_region_based.loom'))
    336 if export_to_UCSC_file is True:

File ~/scenicplus/src/scenicplus/loom.py:174, in export_to_loom(scplus_obj, signature_key, out_fname, eRegulon_metadata_key, auc_key, auc_thr_key, keep_direct_and_extended_if_not_direct, selected_features, selected_cells, cluster_annotation, tree_structure, title, nomenclature)
    170     cv = CountVectorizer(
    171         lowercase=False, token_pattern=r'(?u)\b\w\w+\b:\b\w\w+\b-\b\w\w+\b')
    172 regulon_mat = cv.fit_transform(regulons.values())
    173 regulon_mat = pd.DataFrame(regulon_mat.todense(
--> 174 ), columns=cv.get_feature_names(), index=regulons.keys())
    175 regulon_mat = regulon_mat.reindex(columns=feature_names, fill_value=0).T
    176 if keep_direct_and_extended_if_not_direct is True:

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

jflucier commented 9 months ago

Hi,

I got around this problem (if I remember correctly) using the following Singularity container. Here is the recipe to build it:


# to build: singularity build --force --fakeroot scenicplus.sif scenicplus.def

BootStrap: docker
From: ubuntu:22.04

%setup

%environment
    export PATH=/miniconda3/bin:$PATH
    export PATH=/ucsc.v386:$PATH

%post
    apt-get update && apt-get -y upgrade

    ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime

    # keep apt-get from prompting interactively during installs
    export DEBIAN_FRONTEND=noninteractive
    apt-get -y install \
    build-essential \
    wget \
    git \
    less \
    rsync \
    curl libcurl4 \
    python3 python3-dev python3-pybedtools

    cd /
    wget -c https://repo.anaconda.com/miniconda/Miniconda3-py39_4.11.0-Linux-x86_64.sh
    /bin/bash Miniconda3-py39_4.11.0-Linux-x86_64.sh -bfp /miniconda3
    export PATH=/miniconda3/bin:$PATH

    conda config --file /miniconda3/.condarc --add channels defaults
    conda config --file /miniconda3/.condarc --add channels conda-forge
    conda config --file /miniconda3/.condarc --add channels bioconda
    conda config --file /miniconda3/.condarc --add channels ursky

    echo ". /miniconda3/etc/profile.d/conda.sh" >> $SINGULARITY_ENVIRONMENT
    echo "conda activate scenicplus" >> $SINGULARITY_ENVIRONMENT

    . /miniconda3/etc/profile.d/conda.sh

    conda create --name scenicplus python=3.8
    conda activate scenicplus

    cd /
    mkdir /ucsc.v386
    cd /ucsc.v386
    wget -O bedToBigBed http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed
    chmod a+x /ucsc.v386/*

    cd /
    wget https://github.com/macs3-project/MACS/archive/refs/tags/v2.2.7.1.tar.gz -O MACS.tar.gz
    tar -xvf MACS.tar.gz
    cd MACS-2.2.7.1
    sed -i 's/install_requires = \[f"numpy>={numpy_requires}",\]/install_requires = \[f"numpy{numpy_requires}",\]/' setup.py
    pip install -e .

    conda install --channel conda-forge --channel bioconda bedtools htslib pyrle pybedtools scanpy python-igraph leidenalg

    cd /
    git clone https://github.com/aertslab/scenicplus
    cd scenicplus

    # patch https://github.com/aertslab/scenicplus/commit/821ee7b719afbd1d1e74aadb3ffda9e27165c930
    # match the full call so an already-patched get_feature_names_out() is left alone
    sed -i 's/get_feature_names()/get_feature_names_out()/g' /scenicplus/src/scenicplus/loom.py
    pip install -e .

    conda install --channel conda-forge numpy=1.23.5 --force
    pip install louvain

Hope this helps!
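For a pip or virtualenv install rather than a container build, the same patch can be applied from Python. This is a sketch under assumptions: patch_source and patch_file are hypothetical helper names, and the regex assumes the old method only appears as .get_feature_names( calls, as on loom.py line 174. Unlike a bare s/get_feature_names/get_feature_names_out/ substitution, it never touches get_feature_names_out(, so running it twice is harmless.

```python
import re
from pathlib import Path

# Matches calls to the old method name only; ".get_feature_names_out(" does not
# match because "_out" follows the name instead of "(".
_OLD_CALL = re.compile(r"\.get_feature_names\(")


def patch_source(text):
    """Return (patched_text, n_replacements) for a module's source code."""
    return _OLD_CALL.subn(".get_feature_names_out(", text)


def patch_file(path):
    """Rewrite old-style calls in place; return how many were replaced."""
    path = Path(path)
    patched, n = patch_source(path.read_text())
    if n:
        path.write_text(patched)
    return n
```

To find the installed module without hard-coding a path, importlib.util.find_spec("scenicplus.loom").origin returns the location of loom.py (this imports the scenicplus package). Inside a Singularity image the site-packages directory is normally read-only, which is why the recipe above patches at build time instead.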

Umaarasu commented 3 months ago

> Hi both
>
> You're right. get_feature_names got replaced by get_feature_names_out (see: scikit-learn/scikit-learn#18444). I will update the code.
>
> Best,
>
> Seppe

@SeppeDeWinter Hi, does this mean we just have to update scikit-learn?
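On the upgrade question: the traceback comes from scenicplus calling the old method name, so upgrading scikit-learn by itself should not help. As far as I know, get_feature_names was deprecated in scikit-learn 1.0 and removed in 1.2. The stopgaps are running a scenicplus version whose loom.py already calls get_feature_names_out, or pinning scikit-learn below 1.2 (for example pip install "scikit-learn<1.2"), which may conflict with other pinned dependencies. The hypothetical helper below only encodes that version cutoff; treat the 1.2 boundary as an assumption to confirm against the scikit-learn changelog.

```python
import re


def sklearn_has_get_feature_names(version):
    """Return True if this scikit-learn version should still provide
    CountVectorizer.get_feature_names().

    Assumption: the method was removed in scikit-learn 1.2.
    """
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    return (major, minor) < (1, 2)
```

For example, after import sklearn, call sklearn_has_get_feature_names(sklearn.__version__) inside the environment that runs run_scenicplus.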

aertslab / scenicplus

AttributeError when call run_scenicplus #76