dieterich-lab / circtools

circtools: a modular, python-based framework for circRNA-related tools that unifies several functionalities in a single, command line driven software.
http://circ.tools
GNU General Public License v3.0

Enrichment Module Errors #72

Closed udube closed 5 years ago

udube commented 5 years ago

Hello,

I receive the following errors when running the circtools enrichment module using a gencode GTF file.

Input command:

circtools enrich -c circCoordinates -b predictions.hg19.bed-hg38 -a gencode.v26.primary_assembly.annotation.gtf -g sizes.genome -i 10 -p 20 -P 1 -T 1

Warnings and error:

Processed intersections for iteration 6
***** WARNING: File /tmp/pybedtools.a1akbyv9.tmp has inconsistent naming convention for record:
1       11869   14409   DDX11L1 0       +

***** WARNING: File /tmp/pybedtools.a1akbyv9.tmp has inconsistent naming convention for record:
1       11869   14409   DDX11L1 0       +

Processed intersections for iteration 4
Traceback (most recent call last):
  File "/root/.local/bin/circtools", line 18, in <module>
    import circtools
  File "/root/.local/lib/python3.5/site-packages/circtools/__init__.py", line 2, in <module>
    main()
  File "/root/.local/lib/python3.5/site-packages/circtools/circtools.py", line 31, in main
    CircTools()
  File "/root/.local/lib/python3.5/site-packages/circtools/circtools.py", line 76, in __init__
    getattr(self, args.command)()
  File "/root/.local/lib/python3.5/site-packages/circtools/circtools.py", line 204, in enrich
    enrich.run_module()
  File "/root/.local/lib/python3.5/site-packages/circtools/enrichment/enrichment_check.py", line 205, in run_module
    self.process_intersection(self.results[0][1], linear_start=True)
  File "/root/.local/lib/python3.5/site-packages/circtools/enrichment/enrichment_check.py", line 692, in process_intersection
    str(tmp_data["feature_length"]) + "_" +
KeyError: 'feature_length'

Input command:

circtools enrich -c circCoordinates -b predictions.hg19.bed-hg38 -a gencode.v26.primary_assembly.annotation.gtf -g sizes.genome -i 10 -p 20 -P 1 -T 1 -I exon

This command runs through, but the .csv file output only has a header (output_10_2019_01_23__20_43.zip)

Thanks,

tjakobi commented 5 years ago

Dear @udube,

Would you have some more program output for me, i.e. the full log of the run? At the beginning there should be some statistics about the BED input files. Additionally, if possible, the predictions.hg19.bed-hg38 file would be useful (I assume that's where the warnings are coming from). The bedtools warnings may point to problems reading the BED files, which could in turn cause the error you got.

Cheers, Tobias

udube commented 5 years ago

Sure, please find the files you requested attached.

Output.zip

EDIT: Including logs from a run with no output and a run with the error.

Thanks!

tjakobi commented 5 years ago

Thank you for providing the files. I'll take a look at the issue. I suspect that the chromosome names are not being handled correctly: the test data usually had "1" instead of "chr1", so that may be the problem.
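
One quick way to test this suspicion is to list the distinct chromosome names in each input file and compare them. The sketch below assumes tab-separated files with the chromosome in the first column; the demo_* file names and their contents are hypothetical stand-ins, not the files from this issue.

```shell
# Hypothetical stand-ins for the real inputs (tab-separated, chromosome in column 1).
printf '1\t11869\t14409\tDDX11L1\t0\t+\n' > demo_features.bed
printf 'chr1\t248956422\n' > demo_sizes.genome

# Print up to five distinct chromosome names per file so that a
# "1" versus "chr1" mismatch is visible at a glance.
for f in demo_features.bed demo_sizes.genome; do
    printf '%s: ' "$f"
    # Skip comment lines (GTF files start with '#' headers).
    grep -v '^#' "$f" | cut -f1 | sort -u | head -n 5 | tr '\n' ' '
    printf '\n'
done
```

If one file reports names such as "1" and another reports "chr1", the inputs use different naming conventions.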

tjakobi commented 5 years ago

Dear @udube,

I tried to reproduce the issue with mock CircCoordinates files and do not get any errors when using the same GTF and BED files as in your example. Would it be possible to also provide the CircCoordinates file?

Cheers, Tobias

udube commented 5 years ago

Thank you for continuing to look into this issue. Please find the CircCoordinates file as well as all output produced, including a text file with the program's output to screen.

ExampleCircCoordinates.zip

tjakobi commented 5 years ago

Dear @udube,

Thank you for providing the files. I used the NPHP4 file as input and still cannot reproduce the error. However, my output is also empty. Could you please check your Python 3 environment and run pip3 list to see which package versions you have installed right now? The pybedtools version would be especially interesting (I'm running pybedtools 0.8.0).

Cheers, Tobias

udube commented 5 years ago

To clarify, I only receive the error when I do not include "-I exon". When I include "-I exon" I have the empty output.

Please find the information you requested below:

biopython (1.73)
circtools (1.1.0.7)
HTSeq (0.11.2)
numpy (1.16.0)
pandas (0.23.4)
patsy (0.5.1)
Pillow (5.4.1)
pip (9.0.1)
pybedtools (0.8.0)
pysam (0.15.2)
python-dateutil (2.7.5)
pytz (2018.9)
reportlab (3.5.13)
scipy (1.2.0)
setuptools (33.1.1)
six (1.12.0)
statsmodels (0.9.0)
wheel (0.32.3)

tjakobi commented 5 years ago

Dear @udube,

I traced the error back to some old code that I have not used for months. Basically, I changed the default for -I to "gene" mode in the latest commit, meaning that shuffling will take place over all annotated gene regions (which more or less mirrors not specifying -I in earlier versions). Please update circtools to the latest version from the repository to get the updated code.

That should fix your problem. On a related note, the bedtools warning is generated by the genome.sizes file, which contains "chr1" etc. as IDs, whereas circtools internally converts everything to "1". I will add an automatic conversion of the genome.sizes file in a later release to address that issue as well. For now, you can use sed to strip the "chr" prefix from that file and get rid of the warning.
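
A minimal sketch of the sed workaround mentioned above, assuming the genome sizes file is tab-separated with the chromosome name in the first column. The demo_* file names and the two example rows are made up for illustration; substitute your own file. Note that this only strips the prefix, so "chrM" becomes "M"; whether circtools expects "M" or "MT" is not stated in this thread.

```shell
# Hypothetical example data: a two-line genome sizes file with "chr"-prefixed names.
printf 'chr1\t248956422\nchrM\t16569\n' > demo_sizes.genome

# Remove a leading "chr" from the start of each line (i.e. from the
# chromosome column) and write the result to a new file, leaving the
# original untouched.
sed 's/^chr//' demo_sizes.genome > demo_sizes.nochr.genome

# Show the converted file.
cat demo_sizes.nochr.genome
```

The converted file can then be passed to -g in place of the original.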

Cheers, Tobias

udube commented 5 years ago

Thank you. I am able to run the command without any errors. I appreciate your perseverance.