NUStatBioinfo / DegNorm

Normalizing RNA degradation in RNA-seq data
https://nustatbioinfo.github.io/DegNorm/

File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/pandas/core/indexing.py", line 1418, in _has_valid_type (key, self.obj._get_axis_name(axis))) KeyError: 'None of [read_id\ .........Name: pos, dtype: int64] are in the [index]' #25

Closed Yali1989 closed 5 years ago

Yali1989 commented 5 years ago

I am running DegNorm with my .bam and .bai files as input, and I got this error:

DegNorm (04/25/2019 04:49:41) ---- SAMPLE M72.accepted_hits_sorted, CHR 192 -- begin loading reads from ./sort_bam/M72.accepted_hits_sorted.bam
Traceback (most recent call last):
  File "/newlustre/home/clad/miniconda3/envs/degnorm/bin/degnorm", line 11, in <module>
    load_entry_point('DegNorm==0.1.4', 'console_scripts', 'degnorm')()
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/main.py", line 138, in main
    , exon_df=exon_df)
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/reads.py", line 787, in coverage_read_counts
    for chrom in self.chroms)
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/joblib/parallel.py", line 994, in __call__
    self.retrieve()
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/joblib/parallel.py", line 897, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 561, in __call__
    return self.func(*args, **kwargs)
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/joblib/parallel.py", line 261, in __call__
    for func, args, kwargs in self.items]
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/joblib/parallel.py", line 261, in <listcomp>
    for func, args, kwargs in self.items]
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/reads.py", line 697, in chromosome_coverage_read_counts
    reads_df['gene'] = chrom_gene_df.loc[reads_df.pos].gene.values
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/pandas/core/indexing.py", line 1328, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/pandas/core/indexing.py", line 1541, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/pandas/core/indexing.py", line 1081, in _getitem_iterable
    self._has_valid_type(key, axis)
  File "/newlustre/home/clad/miniconda3/envs/degnorm/lib/python3.6/site-packages/pandas/core/indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: 'None of [read_id\n14 1517\n15 1517\n16 1575\n17 1575\n18 1620\n19 1620\n20 1669\n21 1674\n22 1695\n23 1697\n24 1734\n25 1754\n26 1756\n27 1756\n28 1757\n29 1865\n30 1865\n31 1868\n32 1870\n33 1907\n34 1944\n35 1944\n36 1944\n37 2017\n38 2032\n39 2032\n40 2032\nName: pos, dtype: int64] are in the [index]'

I don't know what caused this error. Can anyone help me fix it? I would appreciate it very much!

ffineis commented 5 years ago

Looks like an issue with your data - "CHR 192" in the log is saying you're parsing reads for chromosome 192 (seems fishy, esp if your data is from human samples). Are you using a .gtf file? Can't use .gff files.

Yali1989 commented 5 years ago

Thank you for your reply. My data is from an unpublished plant genome. I am using a .gtf file, which was converted from .gff3 with the gffread command from Cufflinks.

ffineis commented 5 years ago

Hey, apologies for the delay. I've pushed a hotfix in an attempt to remedy the issue. You've found the weakest point in the DegNorm codebase - a very tenuous join between genes and reads. Could you try reinstalling from the hotfix/read_gene_join branch and re-running? If this works I'll merge. I actually don't have any examples of .bam files where the prior merge code was failing. Curious to see your data once published.
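For anyone hitting the same KeyError: the failing line looks up every read position in the gene table with `.loc`, which raises when the requested positions are not in the index. Below is a minimal sketch of a more tolerant lookup using `reindex`. It is not the code on the hotfix branch; the toy `chrom_gene_df` and `reads_df` frames and the `assign_genes` helper are assumptions made up for illustration, and it assumes each position maps to at most one gene (`reindex` needs a unique index).

```python
import pandas as pd


def assign_genes(reads_df, chrom_gene_df):
    """Attach a gene label to each read; reads outside every gene get NaN."""
    out = reads_df.copy()
    # reindex() yields NaN for positions absent from the index,
    # where .loc[...] would raise KeyError.
    out["gene"] = chrom_gene_df["gene"].reindex(out["pos"].values).values
    return out


# Hypothetical gene table: one row per genomic position covered by a gene.
chrom_gene_df = pd.DataFrame(
    {"gene": ["geneA", "geneA", "geneB"]},
    index=pd.Index([100, 101, 250], name="pos"),
)

# Hypothetical reads: two start inside a gene, two do not.
reads_df = pd.DataFrame(
    {"pos": [100, 250, 1517, 1575]},
    index=pd.Index([0, 1, 14, 15], name="read_id"),
)

joined_df = assign_genes(reads_df, chrom_gene_df)
# Keep only the reads that landed in an annotated gene.
matched_df = joined_df.dropna(subset=["gene"])
```

With this approach a chromosome whose reads never overlap an annotated gene simply produces an empty `matched_df` instead of an exception.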

Yali1989 commented 5 years ago

Thank you for your help. I wrote a Python program to convert the gff3 file to a gtf file, and it worked. But now I get another error:

DegNorm (05/10/2019 09:13:20) ---- CHR 10 -- obtained 2833 coverage matrices.
DegNorm (05/10/2019 09:13:20) ---- CHR 51: no chromosome coverage files available.
DegNorm (05/10/2019 09:13:20) ---- CHR 55: no chromosome coverage files available.
DegNorm (05/10/2019 09:13:20) ---- CHR 57: begin coverage matrix processing. Using 1 gene splits for memory efficiency.
Traceback (most recent call last):
  File "/home/wangyali/miniconda3/bin/degnorm", line 11, in <module>
    load_entry_point('DegNorm==0.1.4', 'console_scripts', 'degnorm')()
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/DegNorm-0.1.4-py3.7.egg/degnorm/main.py", line 164, in main
    , verbose=True)
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/DegNorm-0.1.4-py3.7.egg/degnorm/reads_coverage_merge.py", line 398, in merge_coverage
    verbose=verbose) for chrom in chroms)
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/wangyali/miniconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/wangyali/miniconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/DegNorm-0.1.4-py3.7.egg/degnorm/reads_coverage_merge.py", line 299, in merge_chrom_coverage
    cov_vec_sp = sparse.load_npz(npz_file).transpose()[start_pos:end_pos, :]
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/scipy/sparse/_matrix_io.py", line 131, in load_npz
    with np.load(file, **PICKLE_KWARGS) as loaded:
  File "/home/wangyali/miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 422, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'degnorm_output/degnorm_05092019_062706/s-M11-1_sorted/chrom_coverage_s-M11-1_sorted_100.npz'

Indeed, the chrom_coverage_s-M11-1_sorted_100.npz file is missing. Can you help me again?

ffineis commented 5 years ago

Hey @Yali1989 - I suppose this would happen if an entire chromosome had zero reads - there would be no saved chromosome coverage vector in the case a whole chromosome had no fragments. Admittedly DegNorm was developed mostly using human experiments, so apologies for these headaches as we expand to new genomes. Let me push something this weekend that might help.
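To make the idea concrete, here is a rough sketch of checking for a sample's per-chromosome coverage file before trying to load it. This is not the code that was pushed; the `existing_coverage_files` helper is hypothetical, and the file name pattern is only inferred from the path in the traceback above.

```python
import os
import tempfile


def existing_coverage_files(sample_dir, sample_id, chroms):
    """Map each chromosome name to its coverage file path, or None if absent."""
    files = {}
    for chrom in chroms:
        # Pattern inferred from the traceback: chrom_coverage_<sample>_<chrom>.npz
        name = "chrom_coverage_{0}_{1}.npz".format(sample_id, chrom)
        path = os.path.join(sample_dir, name)
        # A chromosome with zero reads never had a coverage file written.
        files[str(chrom)] = path if os.path.isfile(path) else None
    return files


# Demo in a throwaway directory: chromosome 10 has a file, chromosome 100 does not.
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, "chrom_coverage_s-M11-1_sorted_10.npz"), "wb").close()
    result = existing_coverage_files(tmp_dir, "s-M11-1_sorted", [10, 100])
    missing = [chrom for chrom, path in result.items() if path is None]
```

A caller could then skip the chromosomes listed in `missing`, or treat them as all-zero coverage, rather than passing a nonexistent path to `sparse.load_npz`.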

ffineis commented 5 years ago

I think the latest commit should do the trick.

Yali1989 commented 5 years ago

Thank you for your efforts to handle chromosomes with no coverage. The latest commit solved that problem, and that error no longer occurs. Now a new error has appeared:

DegNorm (05/18/2019 09:41:11) ---- CHR 12 -- obtained 4102 coverage matrices.
DegNorm (05/18/2019 09:41:12) ---- Joining overlapping genes' coverage vectors into coverage matrices.
Traceback (most recent call last):
  File "/home/wangyali/miniconda3/envs/degnorm/bin/degnorm", line 11, in <module>
    load_entry_point('DegNorm==0.1.4', 'console_scripts', 'degnorm')()
  File "/home/wangyali/miniconda3/envs/degnorm/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/main.py", line 163, in main
    , verbose=True)
  File "/home/wangyali/miniconda3/envs/degnorm/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/reads_coverage_merge.py", line 440, in merge_coverage
    save_dir = os.path.join(output_dir, chrom)
  File "/home/wangyali/miniconda3/envs/degnorm/lib/python3.6/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/home/wangyali/miniconda3/envs/degnorm/lib/python3.6/genericpath.py", line 149, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'int64'

I am sorry to bother you again.

ffineis commented 5 years ago

Hey @Yali1989 you're not bothering anyone! Happy to help, sorry this has been buggy. This is an easy fix - my apologies, I hadn't safely wrapped non-string chromosome names in str(), had assumed everyone's .gtf file named chromosomes like "chr2", "chrX", etc. Pushing a fix, thanks for your patience.
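For reference, the failure mode and the kind of cast described above can be reproduced with the standard library alone. This is only a sketch: `chrom_save_dir` is a hypothetical helper, not a DegNorm function, and a plain `int` stands in for the NumPy `int64` from the traceback (both are rejected by `os.path.join`).

```python
import os


def chrom_save_dir(output_dir, chrom):
    """Build a per-chromosome directory path for any chromosome name type."""
    # str() makes numeric chromosome names (12) behave like "chr2" or "chrX".
    return os.path.join(output_dir, str(chrom))


# Without the cast, os.path.join rejects a non-string path component.
try:
    os.path.join("degnorm_output", 12)
    raised = False
except TypeError:
    raised = True

numeric_dir = chrom_save_dir("degnorm_output", 12)
named_dir = chrom_save_dir("degnorm_output", "chrX")
```

Casting at the point of use keeps the rest of the pipeline free to carry chromosome names in whatever type the .gtf parser produced.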

Yali1989 commented 5 years ago

Hi @ffineis, thank you for your reply. Our chromosomes are named only by numbers. Can I add "chr" in front of the existing chromosome names and try running again? My gtf file looks like the following:

1 EVM gene 4431 5310 . - . gene_id "evm.TU.1.1";gene_name "EVM%20prediction%201.1";
1 EVM transcript 4431 5310 . - . gene_id "evm.TU.1.1";transcript_id "evm.model.1.1";
1 EVM exon 4936 5310 . - . gene_id "evm.TU.1.1";transcript_id "evm.model.1.1";exon_number "1";
1 EVM exon 4431 4529 . - . gene_id "evm.TU.1.1";transcript_id "evm.model.1.1";exon_number "2";
1 EVM gene 5755 7381 . - . gene_id "evm.TU.1.2";gene_name "EVM%20prediction%201.2";
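A quick way to see which chromosome names an annotation actually contains is to collect the first column of the file. The sketch below uses only the standard library; `gtf_chromosomes` is a hypothetical helper, and the record on chromosome "2" is invented for the example (the other two lines are taken from the excerpt above).

```python
def gtf_chromosomes(lines):
    """Return distinct first-column (chromosome) values, in order of first appearance."""
    seen = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and "#" comment/header lines.
        if not line or line.startswith("#"):
            continue
        chrom = line.split()[0]  # first whitespace-delimited field
        if chrom not in seen:
            seen.append(chrom)
    return seen


sample_lines = [
    '1 EVM gene 4431 5310 . - . gene_id "evm.TU.1.1";gene_name "EVM%20prediction%201.1";',
    '1 EVM exon 4936 5310 . - . gene_id "evm.TU.1.1";transcript_id "evm.model.1.1";exon_number "1";',
    # Hypothetical record on a second chromosome, not from the original file.
    '2 EVM gene 100 900 . + . gene_id "evm.TU.2.1";',
]

chroms = gtf_chromosomes(sample_lines)
all_numeric = all(name.isdigit() for name in chroms)
```

Read this way the names stay strings. A parser that infers column types, such as pandas `read_csv` without `dtype=str`, would turn an all-digit column into integers, which matches the 'int64' in the error above.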

ffineis commented 5 years ago

@Yali1989 no need - this should be fixed in da41df1, just reinstall and rerun

Yali1989 commented 5 years ago

Finally, I finished the normalization on my test bam files. Thank you for your help.