YangLab / CIRCexplorer2

circular RNA analysis toolset
http://circexplorer2.readthedocs.org/

Python issues with parser and fetch_ucsc #49

Closed ryandikdan closed 3 years ago

ryandikdan commented 3 years ago

Hello, I was unable to fetch the appropriate files with fetch_ucsc, so I looked in the Python script and downloaded the ref GTF manually. I assume this is equivalent to using the Table Browser. More importantly, whether I use a GTF from the Table Browser or the one from the fetch_ucsc script, I get this error when running the parser:

  File "/home/ryan/miniconda2/envs/clear/bin/CIRCexplorer2", line 10, in <module>
    sys.exit(main())
  File "/home/ryan/miniconda2/envs/clear/lib/python2.7/site-packages/circ2/command_parse.py", line 51, in main
    command=command_log, name='annotate')
  File "/home/ryan/miniconda2/envs/clear/lib/python2.7/site-packages/circ2/helper.py", line 38, in wrapper
    fn(*args)
  File "/home/ryan/miniconda2/envs/clear/lib/python2.7/site-packages/circ2/annotate.py", line 38, in annotate
    secondary_flag=options['--low-confidence'])
  File "/home/ryan/miniconda2/envs/clear/lib/python2.7/site-packages/circ2/annotate.py", line 50, in annotate_fusion
    genes, novel_genes, gene_info, chrom_info = parse_ref(ref_f, 1)
  File "/home/ryan/miniconda2/envs/clear/lib/python2.7/site-packages/circ2/parser.py", line 78, in parse_ref
    starts = [int(x) for x in line.split()[9].rstrip(',').split(',')]
ValueError: invalid literal for int() with base 10: '"NM_001375803";'

It seems the GTF isn't formatted as expected: the parser wants a number and is getting a RefSeq accession instead. Perhaps the line isn't being split properly because I don't have the right Python packages? I did create a fresh Python 2.7 conda environment and installed via conda, so the dependencies should be fine.

ryandikdan commented 3 years ago

Silly me! I was using the GTF instead of the txt file that the fetch_ucsc.py script generated! That said, I had to use the GENCODE known genes list instead of RefSeq, since the RefSeq annotations wouldn't download through the script. I'll close this issue, but for anyone hitting the same problem: just use the kg or GENCODE annotations! (I think they're more comprehensive anyway, at least from my cursory searching.)
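
For anyone who wants to see why a GTF triggers that ValueError, here is a minimal sketch that reruns the expression from the last traceback frame on two sample lines. Both lines are made up for illustration (only the accession is taken from the error message), and the 11-column layout of the txt file (gene name, isoform name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) is my assumption about what the parser expects. `exon_starts` and `looks_like_gtf` are hypothetical helpers, not part of CIRCexplorer2.

```python
def exon_starts(line):
    # Same expression as the failing line in the traceback:
    # whitespace-split field 9, parsed as comma-separated integers.
    return [int(x) for x in line.split()[9].rstrip(',').split(',')]


def looks_like_gtf(line):
    # Hypothetical pre-flight check: a GTF line has exactly 9 tab-separated
    # columns, and the last one holds the key/value attributes.
    fields = line.rstrip('\n').split('\t')
    return len(fields) == 9 and 'gene_id' in fields[8]


# Illustrative GTF line (coordinates made up). Splitting on whitespace
# breaks the attribute column apart, so field 9 is a quoted ID.
gtf_line = ('chr1\thg38_refGene\texon\t11874\t12227\t0.000000\t+\t.\t'
            'gene_id "NM_001375803"; transcript_id "NM_001375803";')

# Illustrative line in the assumed 11-column txt layout (values made up).
# Here field 9 is the comma-separated exon start list.
ref_line = 'GENE1\tNM_000001\tchr1\t+\t100\t900\t150\t850\t2\t100,600,\t300,900,'

print(exon_starts(ref_line))

try:
    exon_starts(gtf_line)
except ValueError as err:
    print('GTF input fails:', err)

print(looks_like_gtf(gtf_line), looks_like_gtf(ref_line))
```

So the split itself works fine and no package is missing. The input is simply in the wrong format, and a check like `looks_like_gtf` on the first line of the file would catch the mix-up before parsing.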