chengl7-lab / scape

A package for estimating alternative polyadenylation events from scRNA-seq data.
MIT License

ERROR when running gen_utr_annotation #2

Closed: wangzhenzZ closed this issue 3 months ago

wangzhenzZ commented 4 months ago

Hi, thanks for developing this great software. I used the gff3.gz file from Ensembl as input for scape gen_utr_annotation. However, I got the following error:

No available database at Sus_scrofa.Sscrofa11.1.109.chr.gff3.merge.db. Creation of new database from GFF3 will takes several hours.
Creating database using gffutils from gff3 finished in 229.2956510252009 min.
No attribute Name for gene.
Traceback (most recent call last):
  File "/miniconda3/envs/scape_env/bin/scape", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/__init__.py", line 5, in main
    cli(prog_name="scape")
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/utils.py", line 56, in gen_utr_annotation
    _gen_utr_annotation(gff_file, output_dir, res_file_name, gff_merge_strategy)
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/utils.py", line 128, in _gen_utr_annotation
    final_utr_lst += search_utr(g, gene_rna_dict[g], annot_db, output_dir)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/utils.py", line 302, in search_utr
    utr_lst += gene_utr_df[["chrom", "start", "end", "strand", "gene_id", "gene_name"]].values.tolist()
               ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/pandas/core/frame.py", line 4108, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6200, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6252, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['strand'] not in index"

I installed scape-apa according to the README.md and SCAPE-toy-example.ipynb. How can I solve this error? Thanks for any reply and suggestions!

ThuyTien1 commented 4 months ago

I cannot reproduce this error using the same GFF3 file from ensembl.org.

Which package versions are in your current conda environment? You can export the list using conda list --export > package-list.txt.

Also, there should be a file named check.bed in your output directory, produced by scape gen_utr_annotation. This file contains the last region processed before the failure. Its content should be in the same format as: Y 41101696 41102402 . 0 +
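
For anyone collecting this information for a report, here is a minimal sketch of reading the last region from check.bed. It assumes the file holds whitespace-separated six-column BED lines like the example above. The names BedRecord, parse_bed6 and last_region are hypothetical helpers for illustration and are not part of scape.

```python
from pathlib import Path
from typing import NamedTuple


class BedRecord(NamedTuple):
    """One six-column BED line (hypothetical helper type, not part of scape)."""
    chrom: str
    start: int
    end: int
    name: str
    score: str
    strand: str


def parse_bed6(line: str) -> BedRecord:
    """Parse a whitespace-separated BED6 line such as 'Y 41101696 41102402 . 0 +'."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 BED fields, got {len(fields)}: {line!r}")
    chrom, start, end, name, score, strand = fields
    return BedRecord(chrom, int(start), int(end), name, score, strand)


def last_region(check_bed: Path) -> BedRecord:
    """Return the last non-empty record of check.bed (assumed to be the failing region)."""
    lines = [ln for ln in check_bed.read_text().splitlines() if ln.strip()]
    if not lines:
        raise ValueError(f"{check_bed} is empty")
    return parse_bed6(lines[-1])
```

Calling last_region(Path("check.bed")) in the output directory returns the region to include in the report. A ValueError here means the line does not have six fields, which is worth mentioning in the report as well.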

msaland commented 3 months ago

Hello,

I'm having a similar issue when running on the human (Hs) Ensembl v112 annotation:

Read existing database at /workspace/data/Homo_sapiens.GRCh38.112.chr.gff3.merge.db
Traceback (most recent call last):
  File "/workspace/software/Miniconda3/envs/scape_env/bin/scape", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/__init__.py", line 5, in main
    cli(prog_name="scape")
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/utils.py", line 56, in gen_utr_annotation
    _gen_utr_annotation(gff_file, output_dir, res_file_name, gff_merge_strategy)
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/utils.py", line 128, in _gen_utr_annotation
    final_utr_lst += search_utr(g, gene_rna_dict[g], annot_db, output_dir)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/utils.py", line 302, in search_utr
    utr_lst += gene_utr_df[["chrom", "start", "end", "strand", "gene_id", "gene_name"]].values.tolist()
               ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/pandas/core/frame.py", line 4108, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6200, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6252, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['strand'] not in index"

I've also attached my current environment: scape-apa.txt

The program starts to generate a check.bed file, but it only has 1 line when it fails:

1 69709 71885 . 0 +

ThuyTien1 commented 3 months ago

@wangzhenzZ @msaland The problem is that you are using bedtools=2.31.1. Could you please try installing the older version of bedtools with the following command: conda install bedtools=2.26.0?
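
To confirm which bedtools version an environment is actually using before and after the downgrade, a small check like the following may help. It is a sketch that assumes bedtools --version prints a string of the form "bedtools v2.31.1". The thread only reports that 2.31.1 fails and 2.26.0 works, so other versions are untested here. The function names are hypothetical.

```python
import re
import subprocess

# Version reported as working in this thread.
KNOWN_GOOD = (2, 26, 0)


def parse_bedtools_version(text: str) -> tuple[int, ...]:
    """Extract a dotted version such as 2.31.1 from text like 'bedtools v2.31.1'."""
    match = re.search(r"v?(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_bedtools_version() -> tuple[int, ...]:
    """Ask the bedtools found on PATH for its version (assumes the --version flag)."""
    result = subprocess.run(
        ["bedtools", "--version"], capture_output=True, text=True, check=True
    )
    return parse_bedtools_version(result.stdout)
```

Inside the scape environment, comparing installed_bedtools_version() with KNOWN_GOOD shows whether the conda install took effect.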

msaland commented 3 months ago

Thank you, it's working for me now. Though, I've noticed something interesting: I ran scape gen_utr_annotation on both GRCh38 and GRCm39 v112, and it's running much quicker for the GRCm39 compared to the GRCh38.

And I'm only getting this warning for the GRCm39 file, even though it's quicker:

/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/gffutils/create.py:770: UserWarning: It appears you have a gene feature in your GTF file. You may want to use the disable_infer_genes=True option to speed up database creation
  warnings.warn(
/workspace/software/Miniconda3/envs/scape_env/lib/python3.11/site-packages/gffutils/create.py:763: UserWarning: It appears you have a transcript feature in your GTF file. You may want to use the disable_infer_transcripts=True option to speed up database creation
  warnings.warn(

Any ideas what could be causing this?

Files I used:
https://ftp.ensembl.org/pub/release-112/gtf/homo_sapiens/Homo_sapiens.GRCh38.112.chr.gtf.gz
https://ftp.ensembl.org/pub/release-112/gtf/mus_musculus/Mus_musculus.GRCm39.112.chr.gtf.gz
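
The warning comes from gffutils rather than from scape. For GTF input, gffutils tries to infer gene and transcript features unless told otherwise, and the message points to the disable_infer_genes and disable_infer_transcripts options of gffutils.create_db. The sketch below shows how those options could be passed when building a database directly with gffutils. Whether scape exposes them is not stated in this thread, and the thread does not confirm that this explains the speed difference. The helper names and the choice of merge_strategy="merge" are assumptions for illustration.

```python
from pathlib import Path


def create_db_options(annotation: str) -> dict:
    """Keyword arguments for gffutils.create_db, chosen by file type (hypothetical helper)."""
    name = Path(annotation).name.lower()
    is_gtf = name.endswith((".gtf", ".gtf.gz"))
    # Assumption: merge duplicate feature IDs instead of raising an error.
    options = {"merge_strategy": "merge"}
    if is_gtf:
        # Ensembl GTF files already contain gene and transcript lines,
        # so the inference step named in the warning can be skipped.
        options["disable_infer_genes"] = True
        options["disable_infer_transcripts"] = True
    return options


def build_db(annotation: str, db_path: str):
    """Create a gffutils database; gffutils is imported lazily so the helper above works without it."""
    import gffutils

    return gffutils.create_db(annotation, dbfn=db_path, **create_db_options(annotation))
```

For example, build_db("Mus_musculus.GRCm39.112.chr.gtf.gz", "GRCm39.db") would create the database with inference disabled, while a GFF3 input gets only the merge strategy.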

wangzhenzZ commented 3 months ago

Thank you for your timely response. I apologize for my delayed feedback, but I wanted to let you know that this method worked for me!

Dongxu-Zheng commented 5 days ago

Hi, did you figure out the "UserWarning"? I got the same warning, and when I ran scape gen_utr_annotation on GRCh38, it took much longer than expected.