NUStatBioinfo / DegNorm

Normalizing RNA degradation in RNA-seq data
https://nustatbioinfo.github.io/DegNorm/

'dict' object has no attribute 'as_dict' #31

Closed artagmaz closed 5 years ago

artagmaz commented 5 years ago

Hello, when I run my degnorm script:

#!/bin/bash
#script 
export DATADIR=~/Desktop/Pv/result.degnorm

degnorm --bam-files $DATADIR/bams/PvD0.1.sorted.bam $DATADIR/bams/PvD48.1.sorted.bam --bai-files $DATADIR/bams/PvD0.1.sorted.bam.bai $DATADIR/bams/PvD48.1.sorted.bam.bai -g $DATADIR/stringtie-transdecoder-2.gff3 -o $DATADIR/degnorm_output

I get this traceback:

DegNorm (08/09/2019 10:52:18) ---- DegNorm output directory -- /Users/artagmaz/Desktop/Pv/result.degnorm/degnorm_output/degnorm_08092019_105218
Traceback (most recent call last):
  File "/Users/artagmaz/miniconda2/envs/degnorm/bin/degnorm", line 11, in <module>
    load_entry_point('DegNorm==0.1.4', 'console_scripts', 'degnorm')()
  File "/Users/artagmaz/miniconda2/envs/degnorm/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/__main__.py", line 67, in main
    , index_file=args.bai_files[idx]).header
  File "/Users/artagmaz/miniconda2/envs/degnorm/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/reads.py", line 130, in __init__
    self.get_header()
  File "/Users/artagmaz/miniconda2/envs/degnorm/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/reads.py", line 158, in get_header
    header_dict = bam_file.header.as_dict()['SQ']
AttributeError: 'dict' object has no attribute 'as_dict'

What could this be related to, and how can I fix it?

artagmaz commented 5 years ago

I repeated all the steps from the beginning and now it works. Sorry to have bothered you.
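For readers who hit the same `AttributeError`: the thread does not confirm a cause, but one plausible explanation is a pysam release in which `AlignmentFile.header` is a plain `dict` rather than a header object (assumed here to be releases before 0.14). The sketch below is not DegNorm code. `header_as_dict` is a hypothetical helper that accepts either form, and `FakeHeader` is a stand-in so the example runs without pysam.

```python
def header_as_dict(header):
    """Return a BAM header as a plain dict, whichever form it arrives in.

    Assumption: older pysam releases expose the header as a dict, while
    newer ones expose an object with to_dict() (and a legacy as_dict()).
    """
    if isinstance(header, dict):
        return header
    for method_name in ("to_dict", "as_dict"):
        method = getattr(header, method_name, None)
        if callable(method):
            return dict(method())
    raise TypeError("unsupported header type: %r" % type(header))


class FakeHeader:
    """Stand-in for a header object; only used to exercise the helper."""

    def to_dict(self):
        return {"SQ": [{"SN": "chr_1", "LN": 100}]}


# Both forms give access to the @SQ (reference sequence) entries.
sq = header_as_dict(FakeHeader())["SQ"]
sq_from_dict = header_as_dict({"SQ": [{"SN": "chr_1", "LN": 100}]})["SQ"]
print(sq == sq_from_dict)
```

Rebuilding the environment from scratch, as reported above, would also pull in whichever pysam version the installer pins, which is consistent with the error disappearing.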

artagmaz commented 5 years ago

Okay, now it reports this error:

 File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/scipy/sparse/csr.py", line 443, in check_bounds
    " %d <= %d" % (i0, num, i1, num, i0, i1))
IndexError: index out of bounds: 0 <= 12605 <= 18232, 0 <= 18233 <= 18232, 12605 <= 18233

ffineis commented 5 years ago

Hello, thanks for using DegNorm.

To diagnose what's going on, I'll need more information.

  1. Please post the entire error traceback. The traceback you've sent is only a snippet of the error message, and it isn't directly related to DegNorm, only to a method called from another package. Currently I can't see where in the pipeline the problem occurred.

  2. Please verify that your .bam files are 0-indexed (i.e. that the first position on a chromosome that a read could cover is position 0) and that your .gtf file is 1-indexed (i.e. that a gene's position on a chromosome can start at 1, not 0). It's possible that your reads are 1-indexed or your .gtf file is 0-indexed.

If I can't diagnose from the full traceback and you've verified 2., I'll have to debug using your .bam, .bai, and .gtf files, which can be sent to me in a Dropbox or Box.com directory link.
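The coordinate check in point 2 can be sketched in a few lines. This is an illustration, not part of DegNorm: `find_out_of_range_features` and `chrom_lengths` are hypothetical names, the chromosome lengths are assumed to come from the `@SQ` lines of the .bam header (for example via `samtools view -H`), and the sample data are made up. It flags annotation features whose 1-indexed, inclusive coordinates fall outside the reference sequence.

```python
def find_out_of_range_features(gtf_lines, chrom_lengths):
    """Return (chrom, start, end, reason) for features outside a chromosome.

    Assumes GTF coordinates are 1-indexed and inclusive, so a valid
    feature satisfies 1 <= start <= end <= chromosome length.
    """
    problems = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        length = chrom_lengths.get(chrom)
        if length is None:
            problems.append((chrom, start, end, "chromosome not in .bam header"))
        elif start < 1:
            problems.append((chrom, start, end, "start < 1 (0-indexed annotation?)"))
        elif end > length:
            problems.append((chrom, start, end, "end beyond chromosome length"))
    return problems


# Hypothetical data: a scaffold of length 18232 and a feature ending at 18233.
chrom_lengths = {"scaffold_x": 18232}
gtf_lines = [
    'scaffold_x\ttransdecoder\texon\t12606\t18233\t.\t+\t.\tgene_id "g1"\n',
    'scaffold_x\ttransdecoder\texon\t100\t200\t.\t+\t.\tgene_id "g2"\n',
]
problems = find_out_of_range_features(gtf_lines, chrom_lengths)
for problem in problems:
    print(problem)
```

A feature that ends one position past the chromosome length, as in the made-up first line, is the kind of off-by-one that would produce an out-of-bounds slice on a coverage array.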

artagmaz commented 5 years ago

Is this what you mean by the entire error traceback?

Traceback (most recent call last):
  File "/Users/artagmaz/miniconda2/envs/degnorm2/bin/degnorm", line 11, in <module>
    load_entry_point('DegNorm==0.1.4', 'console_scripts', 'degnorm')()
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/__main__.py", line 163, in main
DegNorm (08/09/2019 01:33:25) ---- CHR HiC_scaffold_6 -- obtained 2 coverage matrices.
DegNorm (08/09/2019 01:33:25) ---- CHR HiC_scaffold_89 -- obtained 1 coverage matrices.
    , verbose=True)
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/reads_coverage_merge.py", line 410, in merge_coverage
    verbose=verbose) for chrom in chroms)
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/joblib/parallel.py", line 994, in __call__
    self.retrieve()
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/joblib/parallel.py", line 897, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 561, in __call__
    return self.func(*args, **kwargs)
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/joblib/parallel.py", line 261, in __call__
    for func, args, kwargs in self.items]
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/joblib/parallel.py", line 261, in <listcomp>
    for func, args, kwargs in self.items]
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/reads_coverage_merge.py", line 301, in merge_chrom_coverage
    cov_vec_sp = sparse.load_npz(npz_file).transpose()[start_pos:end_pos, :]
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/scipy/sparse/csc.py", line 169, in __getitem__
    return self.T[col, row].T
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/scipy/sparse/csr.py", line 304, in __getitem__
    return self._get_submatrix(row, col)
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/scipy/sparse/csr.py", line 448, in _get_submatrix
    check_bounds(j0, j1, N)
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/scipy/sparse/csr.py", line 443, in check_bounds
    " %d <= %d" % (i0, num, i1, num, i0, i1))
IndexError: index out of bounds: 0 <= 12605 <= 18232, 0 <= 18233 <= 18232, 12605 <= 18233

By the way, I checked the input files: the .bam files are 0-indexed and the .gtf file is 1-indexed.

ffineis commented 5 years ago

Thanks for the full traceback.

I've made a small edit I'm hoping does the trick. Please reinstall degnorm from the hotfix/issues branch and re-run degnorm.

Reinstall like this (starting from the DegNorm directory in the virtual environment you've been using):

git fetch
git checkout hotfix/issues

rm -rf DegNorm.egg-info build dist
pip uninstall degnorm
./install

Let me know if you continue running into issues. Getting degnorm to work properly with gene scaffolding has been tricky.

artagmaz commented 5 years ago

There is a new error, but degnorm now creates a folder for each chromosome and scaffold, each containing one .pkl file. Is there a way to check that it worked correctly?

Traceback (most recent call last):
  File "/Users/artagmaz/miniconda2/envs/degnorm2/bin/degnorm", line 11, in <module>
    load_entry_point('DegNorm==0.1.4', 'console_scripts', 'degnorm')()
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/__main__.py", line 185, in main
    read_count_df = read_count_df.loc[genes]
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/pandas/core/indexing.py", line 1500, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/pandas/core/indexing.py", line 1902, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/pandas/core/indexing.py", line 1205, in _getitem_iterable
    raise_missing=False)
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/pandas/core/indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "/Users/artagmaz/miniconda2/envs/degnorm2/lib/python3.6/site-packages/pandas/core/indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: "None of [Index(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10',\n       ...\n       '10436', '10437', '10438', '10439', '10442', '10443', '10444', '10445',\n       '10446', '10447'],\n      dtype='object', name='gene', length=10436)] are in the [index]"

ffineis commented 5 years ago

Unfortunately, the traceback indicates a problem subsetting the gene read counts to the genes of interest (although you're right, the read count and coverage calculations finished successfully). It could be an issue with the gene naming in the .gtf file, but I'm not sure.
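A quick way to narrow down this kind of `KeyError` is to compare the two sets of gene labels directly. The sketch below is a generic diagnostic, not DegNorm code: `compare_gene_ids` is a hypothetical helper, and the identifiers are made up to mirror the pattern in the traceback, where the requested labels are strings such as `'1'` and `'2'`.

```python
def compare_gene_ids(requested, available):
    """Summarise how two collections of gene labels overlap."""
    requested = {str(g) for g in requested}
    available = {str(g) for g in available}
    return {
        "shared": sorted(requested & available),
        "only_requested": sorted(requested - available),
        "only_available": sorted(available - requested),
    }


# Hypothetical labels: numeric-looking names on one side, annotation-style
# identifiers on the other, so nothing overlaps.
requested = ["1", "2", "3"]
available = ["MSTRG.1000", "MSTRG.1001"]
report = compare_gene_ids(requested, available)
print(report["shared"])          # labels present on both sides
print(report["only_requested"])  # labels the lookup would fail on
```

An empty `shared` list means every lookup fails, which matches the "None of [...] are in the [index]" wording of the error.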

Can you share your input .bam, .bai, and .gtf files with me via Dropbox? It can be a subset of the .bam files if you're working with many. This is how I've debugged an issue stemming from gene scaffolding in the past. Contact info is listed at https://nustatbioinfo.github.io/DegNorm/about/contact/ (I'm Frank).

artagmaz commented 5 years ago

Update: I had previously converted my GTF file from GFF3 using a third-party script or tool. Yesterday I converted it with my own script, and with that file degnorm finished successfully! But I have a question: which annotation fields are most important for DegNorm? For example, in the attributes column (column 9), does the order of the values matter, or are they selected by name? My annotation looks like this:

chr_1   transdecoder    gene    5745114 5753672 .   +   .   transcript_id "MSTRG.1000";name "ORF"

chr_1   transdecoder    mRNA    5745114 5753672 .   +   .   transcript_id "MSTRG.1000.1.p1";gene_id "MSTRG.1000";name "ORF"

chr_1   transdecoder    five_prime_UTR  5745114 5745329 .   +   .   transcript_id "MSTRG.1000.1.p1.utr5p1";gene_id "MSTRG.1000.1.p1"

Moreover, I am more interested in reads covering ncRNA regions, and I have an annotation in GFF3 format (of course I will convert it to GTF). Are there any filtering, sorting, or other special requirements on the feature type (column 3) or any other columns?

ffineis commented 5 years ago

Nice! Glad it ran.

I should put more details on the expected format and attributes in the .gtf file. You need a gene_id or gene_name attribute so that we can uniquely identify references to genes. Here is an example of a .gtf file I use in testing.
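To check an annotation for the `gene_id` / `gene_name` requirement, the attributes column can be parsed into a dictionary and queried by key. This is a minimal sketch, not DegNorm's own parser: `parse_gtf_attributes` and `has_gene_identifier` are hypothetical helpers, attributes are assumed to be looked up by name rather than by position, and values containing a semicolon are not handled. The two sample lines reuse the attributes from the annotation quoted earlier in this thread, with tabs written explicitly.

```python
import re

# Matches one key "value" pair, e.g.: gene_id "MSTRG.1000"
_ATTR_RE = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*')


def parse_gtf_attributes(attr_field):
    """Parse column 9 of a GTF line into a dict keyed by attribute name."""
    attrs = {}
    for chunk in attr_field.strip().split(";"):
        match = _ATTR_RE.fullmatch(chunk)
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def has_gene_identifier(gtf_line):
    """True if the line carries a gene_id or gene_name attribute."""
    attrs = parse_gtf_attributes(gtf_line.rstrip("\n").split("\t")[8])
    return "gene_id" in attrs or "gene_name" in attrs


gene_line = (
    "chr_1\ttransdecoder\tgene\t5745114\t5753672\t.\t+\t.\t"
    'transcript_id "MSTRG.1000";name "ORF"'
)
mrna_line = (
    "chr_1\ttransdecoder\tmRNA\t5745114\t5753672\t.\t+\t.\t"
    'transcript_id "MSTRG.1000.1.p1";gene_id "MSTRG.1000";name "ORF"'
)
print(has_gene_identifier(gene_line))  # this line has no gene_id or gene_name
print(has_gene_identifier(mrna_line))  # this line has gene_id
```

Because the lookup is by key, the order of the attributes within column 9 does not affect the result in this sketch.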

I should add this .gtf-related information in a FAQ.

And yes, the third column (feature type) is filtered: only exon features are kept, because DegNorm was developed to find exon coverage in RNA-seq experiments. If you need ncRNA features, by far the easiest way to run them through degnorm is to replace ncRNA with exon in that column (a hack). If computing coverage on features other than exons is a common bioinformatics task, let me know and I can make this filtering a parameter.
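The relabelling hack described above could be scripted roughly as follows. This is a sketch under assumptions, not a DegNorm feature: `relabel_feature` is a hypothetical helper, the feature label `ncRNA` is taken from the wording in this thread and may differ in a given annotation, and the sample lines are made up. Only column 3 is rewritten, so the same word appearing in the attributes column is left untouched.

```python
def relabel_feature(gtf_lines, old_feature="ncRNA", new_feature="exon"):
    """Yield GTF lines with column 3 changed from old_feature to new_feature."""
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        is_feature_line = not line.startswith("#") and len(fields) >= 9
        if is_feature_line and fields[2] == old_feature:
            fields[2] = new_feature
            yield "\t".join(fields) + "\n"
        else:
            yield line  # comments and other feature types pass through


# Hypothetical annotation lines.
lines = [
    'chr_1\tsrc\tncRNA\t10\t50\t.\t+\t.\tgene_id "g1"; note "ncRNA"\n',
    'chr_1\tsrc\tgene\t10\t50\t.\t+\t.\tgene_id "g1"\n',
]
relabelled = list(relabel_feature(lines))
for line in relabelled:
    print(line, end="")
```

Writing the result to a new file rather than overwriting the original keeps the true feature types available for later analysis.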

ffineis commented 5 years ago

I'm going to close this issue out, but feel free to open a feature request or contact me with further questions.