aweimann / traitar

GNU General Public License v3.0

Error during execution #48

Closed (microbiaki closed 8 years ago)

microbiaki commented 8 years ago

The prodigal step finished fine, but soon after I got the message about the Pfam annotation, I got a very long error:

running Pfam annotation with hmmer. This step can take a while. A rough estimate for sequential Pfam annotation of genome samples of ~3 Mbs is 10 min per genome.
Traceback (most recent call last):
  File "/usr/local/bin/hmmer2filtered_best.py", line 64, in <module>
    filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
  File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
    m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in __getitem__
    return self._getitem_tuple(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
    return self._getbool_axis(key, axis=axis)
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
    raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 3162, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)

[The same traceback is printed seven more times; only the FloatBlock size differs: 8 x 3446, 8 x 2808, 8 x 2219, 8 x 2917, 8 x 2156, 8 x 1826, 8 x 2299.]

ls: cannot access /home/maria/Desktop/Traitar_test_20160425/pfam_annotation/*_filtered_best.dat: No such file or directory
running phenotype prediction
Traceback (most recent call last):
  File "/usr/local/bin/predict.py", line 110, in <module>
    annotate_and_predict((pt1, pt2), tarfile.open(args.model_tar, mode = "r:gz"), args.annotation_matrix,args.pfam_pts_mapping_f, args.out_dir, args.voters)
  File "/usr/local/bin/predict.py", line 88, in annotate_and_predict
    aggr_dfs = aggregate(pred_df, k)
  File "/usr/local/bin/predict.py", line 31, in aggregate
    maj_pred_dfs[0].iloc[:,i / k] = pred_df.iloc[:, i: i + k].apply(filter_pred, axis = 1, is_majority = True, k = k)
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 98, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 416, in _setitem_with_indexer
    value = self._align_frame(indexer, value)
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 587, in _align_frame
    raise ValueError('Incompatible indexer with DataFrame')
ValueError: Incompatible indexer with DataFrame

[The predict.py traceback above is printed twice.]

Traceback (most recent call last):
  File "/usr/local/bin/merge_preds.py", line 79, in <module>
    comb_preds(args.phypat_dir, args.phypat_GGL_dir, args.out_dir, args.voters)
  File "/usr/local/bin/merge_preds.py", line 19, in comb_preds
    m1_scores = ps.read_csv("%s/predictions_majority-vote_mean-score.txt"%phypat_dir, index_col = 0, sep = "\t")
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 218, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 502, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 610, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 972, in __init__
    self._reader = _parser.TextReader(src, kwds)
  File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3200)
  File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5559)
IOError: File /home/maria/Desktop/Traitar_test_20160425/phenotype_prediction/phypat/predictions_majority-vote_mean-score.txt does not exist
running feature track generation
Traceback (most recent call last):
  File "/usr/local/bin/traitar", line 329, in <module>
    args.func(args)
  File "/usr/local/bin/traitar", line 19, in phenolyze
    p.run(args.mode)
  File "/usr/local/bin/traitar", line 164, in run
    self.run_feature_track_generation(self.s2f.loc[:,"sample_name"], mode)
  File "/usr/local/bin/traitar", line 249, in run_feature_track_generation
    phypat_preds = ps.read_csv(os.path.join(self.phypat_dir, "predictions_majority-vote.txt"), index_col = 0, sep = "\t")
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 218, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 502, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 610, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 972, in __init__
    self._reader = _parser.TextReader(src, kwds)
  File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3200)
  File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5559)
IOError: File /home/maria/Desktop/Traitar_test_20160425/phenotype_prediction/phypat/predictions_majority-vote.txt does not exist

aweimann commented 8 years ago

Thank you for your interest in our software and for reporting this issue. I've looked into it and found a problem with an underlying Python library in the annotation step; it should be fixed now. The error message is so long because the same error occurs once for each input sample that Traitar annotates. I've made a new release on GitHub, https://github.com/hzi-bifo/traitar/releases/tag/v1.0.4, which you can download and install with

pip install traitar-1.0.4.tar.gz --user -U --no-deps

Alternatively, you can update from PyPI:

pip install traitar --user -U --no-deps
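As an aside on the traceback above: the header parsed from the HMMER table repeats several column labels (accession, score, bias, from, to), and the AssertionError is pandas failing on those duplicate labels. The following is a minimal, standard-library sketch of one way to sidestep that kind of problem, by making the labels unique and filtering by column position. It is not the actual change made in Traitar 1.0.4; `dedupe_columns`, this `apply_thresholds` and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Column labels exactly as they appear in the error message above.
HEADER = ["target name", "accession", "tlen", "query name", "accession", "qlen",
          "E-value", "score", "bias", "#", "of", "c-Evalue", "i-Evalue", "score",
          "bias", "from", "to", "from", "to", "from", "to", "acc",
          "description of target"]


def dedupe_columns(names):
    """Return labels made unique by suffixing repeats with .1, .2, ... (hypothetical helper)."""
    seen = Counter()
    unique = []
    for name in names:
        count = seen[name]
        unique.append(name if count == 0 else "%s.%d" % (name, count))
        seen[name] += 1
    return unique


def apply_thresholds(rows, eval_thresh, bit_score_thresh):
    """Keep rows whose column 12 (i-Evalue) and column 13 (domain score) pass both cut-offs."""
    return [row for row in rows
            if float(row[12]) <= eval_thresh and float(row[13]) >= bit_score_thresh]


def make_row(i_evalue, score):
    """Build a made-up 23-field row; only columns 12 and 13 carry real values."""
    row = ["-"] * len(HEADER)
    row[12] = i_evalue
    row[13] = score
    return row


if __name__ == "__main__":
    columns = dedupe_columns(HEADER)
    print(columns[13], columns[14])  # the repeated "score" and "bias" now carry a suffix
    rows = [make_row("1e-10", "50.0"), make_row("0.5", "3.2")]
    print(len(apply_thresholds(rows, 1e-5, 25.0)))  # number of rows passing both cut-offs
```

Filtering by position mirrors the `iloc[:, 12]` and `iloc[:, 13]` lookups visible in the traceback, while the unique labels avoid ambiguity if the columns are later addressed by name.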

Once updated, you can continue from the Pfam annotation step. If you want to save some time, you can comment out lines 205, 225 and 226 and unindent lines 227 and 228 in the traitar main script, so that they read:

#is_recompute = self.check_dir(a_dir)
#if is_recompute:
#    self.execute_commands(hmmer_commands)
self.execute_commands(fae_commands)
self.execute_commands([domtblout2gene_generic])

Running

which traitar

will give you the path to the script. With these changes Traitar skips the time-intensive hmmsearch command, which should have succeeded in your case; make sure to uncomment and re-indent the lines for future runs. Also make sure that you remove the phenotype directory when prompted. Please let me know whether this solves your problem or whether further problems occur, and again, thanks a lot for reporting this critical bug.
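Before editing the script by line number, it may be worth confirming that those lines hold the statements quoted above, since line numbers can shift between releases. The following is a small sketch that locates the script the same way `which traitar` does and prints the referenced lines; `show_lines` is a hypothetical helper, not part of Traitar.

```python
import shutil


def show_lines(path, line_numbers):
    """Return (number, text) pairs for the requested 1-based line numbers."""
    wanted = set(line_numbers)
    found = []
    with open(path) as handle:
        for number, text in enumerate(handle, start=1):
            if number in wanted:
                found.append((number, text.rstrip("\n")))
    return found


if __name__ == "__main__":
    script = shutil.which("traitar")  # same lookup as `which traitar`
    if script is None:
        print("traitar not found on PATH")
    else:
        for number, text in show_lines(script, [205, 225, 226, 227, 228]):
            print("%d: %s" % (number, text))
```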

microbiaki commented 8 years ago

Thank you for the prompt response.

I get another error this time:

Error: File existence/permissions problem in trying to open HMM file /home/aaron/hmms/Pfam-A.hmm.
HMM file /home/aaron/hmms/Pfam-A.hmm not found (nor an .h3m binary of it)

I do not have an aaron folder in my home directory but I have downloaded Pfam exactly as instructed (traitar pfam '/home/maria/Pfam') and my Pfam-A.hmm is there. Do I have to provide the path for it?

Best regards, Maria

aweimann commented 8 years ago

Dear Maria,

Thanks for following up on this. This should be easy to fix. You need to run

traitar pfam --local /home/maria/Pfam

as your config file got overwritten upon update. I'm sorry I did not anticipate that. This will just reset the path, not download the HMM file again. Unfortunately my own config file was packaged into the Traitar source distribution, which is why it's looking for the aaron folder.

Best regards,

Aaron
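For anyone hitting the same "HMM file not found" message, a quick check that the Pfam directory really contains the model file can rule out a bad download before resetting the path. This is a minimal sketch; `find_pfam_hmm` is a hypothetical helper, and because the layout of the download directory is not described in this thread, it simply searches the directory tree for `Pfam-A.hmm` or an `.h3m` binary of it, the two things the error message mentions.

```python
import os


def find_pfam_hmm(pfam_dir, hmm_name="Pfam-A.hmm"):
    """Return the path of the HMM file (or its .h3m binary) under pfam_dir, else None."""
    for root, _dirs, files in os.walk(pfam_dir):
        for candidate in (hmm_name, hmm_name + ".h3m"):
            if candidate in files:
                return os.path.join(root, candidate)
    return None


if __name__ == "__main__":
    # Directory taken from the thread; replace it with your own Pfam location.
    path = find_pfam_hmm("/home/maria/Pfam")
    print(path if path else "no Pfam-A.hmm found")
```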