microbiaki closed this issue 8 years ago
Thank you for your interest in our software and for reporting this issue. I've looked into it and found a problem with an underlying Python library in the annotation step, which should be fixed now. The error message is so long because the same error occurs once for each input sample that Traitar annotates. I've made a new release on GitHub, https://github.com/hzi-bifo/traitar/releases/tag/v1.0.4, which you can download and install with
pip install traitar-1.0.4.tar.gz --user -U --no-deps
Alternatively, you can update from PyPI:
pip install traitar --user -U --no-deps
Once updated, you can continue from the Pfam annotation step. If you want to save some time, you can comment out lines 205, 225 and 226 and unindent lines 227 and 228 in the traitar main script:
is_recompute = self.check_dir(a_dir)
if is_recompute:
    self.execute_commands(hmmer_commands)
    self.execute_commands(fae_commands)
    self.execute_commands([domtblout2gene_generic])
Running
which traitar
will give you the path to that script. These changes skip the time-intensive hmmsearch command, which should have succeeded in your case, but make sure to uncomment and re-indent the lines for future runs. Also make sure that you remove the phenotype directory when prompted. Please let me know whether this solved your problem or whether further problems occurred, and again, thanks a lot for reporting this critical bug.
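For illustration, the temporary edit described above might look roughly like the sketch below. The class `AnnotationStep`, its `annotate` method and the recording stub for `execute_commands` are hypothetical stand-ins so that the snippet runs on its own. The mapping of line numbers to statements is an assumption based on the quoted snippet, so check it against your installed script before editing.

```python
class AnnotationStep(object):
    """Hypothetical stand-in for the class that contains the quoted snippet."""

    def __init__(self):
        self.executed = []

    def execute_commands(self, commands):
        # The real method runs shell commands; this stub only records them.
        self.executed.extend(commands)

    def annotate(self, hmmer_commands, fae_commands, domtblout2gene_generic):
        # Temporarily commented out (assumed to be lines 205, 225 and 226):
        # is_recompute = self.check_dir(a_dir)
        # if is_recompute:
        #     self.execute_commands(hmmer_commands)
        # Unindented (assumed to be lines 227 and 228), so they always run:
        self.execute_commands(fae_commands)
        self.execute_commands([domtblout2gene_generic])


step = AnnotationStep()
# Placeholder strings, not real Traitar commands.
step.annotate(["hmmsearch placeholder"], ["filter placeholder"], "aggregate placeholder")
print(step.executed)
```

With the edit in place, only the filtering and aggregation commands are passed on; the hmmsearch commands are never executed, so the existing hmmsearch output is reused.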
Thank you for the prompt response.
I get another error this time: Error: File existence/permissions problem in trying to open HMM file /home/aaron/hmms/Pfam-A.hmm. HMM file /home/aaron/hmms/Pfam-A.hmm not found (nor an .h3m binary of it)
I do not have an aaron folder in my home directory but I have downloaded Pfam exactly as instructed (traitar pfam '/home/maria/Pfam') and my Pfam-A.hmm is there. Do I have to provide the path for it?
Best regards, Maria
Dear Maria,
Thanks for following up on this. This should be easy to fix. You need to run
traitar pfam --local /home/maria/Pfam
as your config file got overwritten upon update. I'm sorry I did not anticipate that. This will just reset the path, not download the HMM file again. Unfortunately my own config file was packaged into the Traitar source distribution, which is why it's looking for the aaron folder.
Best regards,
Aaron
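If you are unsure which directory to pass to --local, a small helper along these lines can confirm where Pfam-A.hmm actually lives. This is only a sketch: `find_hmm_files` is a hypothetical helper, not part of Traitar, and the directory layout below /home/maria/Pfam is not known from this thread, which is why the search is recursive.

```python
import os


def find_hmm_files(root, filename="Pfam-A.hmm"):
    """Return the sorted paths below root whose file name equals filename."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


# Prints an empty list if the directory does not exist or holds no such file.
print(find_hmm_files("/home/maria/Pfam"))
```

An empty result would suggest that the download went to a different directory than expected.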
The prodigal step finished fine, but soon after I got the message about the Pfam annotation, I got a very long error:
running Pfam annotation with hmmer. This step can take a while. A rough estimate for sequential Pfam annotation of genome samples of ~3 Mbs is 10 min per genome.
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in <module>
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in __getitem__
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 3162, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in <module>
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in __getitem__
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 3446, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in <module>
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in __getitem__
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2808, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in <module>
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in __getitem__
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2219, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in <module>
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in __getitem__
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2917, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in <module>
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in __getitem__
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2156, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in <module>
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in __getitem__
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 1826, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
Traceback (most recent call last):
File "/usr/local/bin/hmmer2filtered_best.py", line 64, in <module>
filtered_df = apply_thresholds(args.infile_f, args.eval_thresh, args.bit_score_thresh, args.out_filt_f, args.out_excl_f)
File "/usr/local/bin/hmmer2filtered_best.py", line 24, in apply_thresholds
m_eval = m.loc[(m.iloc[:,12] <= eval_threshold) & (m.iloc[:, 13] >= bit_score_thresh), :]
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1026, in __getitem__
return self._getitem_tuple(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 617, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1143, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1040, in _getbool_axis
raise self._exception(detail)
KeyError: AssertionError("Cannot create BlockManager._ref_locs because block [FloatBlock: [E-value, score, bias, c-Evalue, i-Evalue, score, bias, acc], 8 x 2299, dtype: float64] with duplicate items [Index([u'target name', u'accession', u'tlen', u'query name', u'accession', u'qlen', u'E-value', u'score', u'bias', u'#', u'of', u'c-Evalue', u'i-Evalue', u'score', u'bias', u'from', u'to', u'from', u'to', u'from', u'to', u'acc', u'description of target'], dtype='object')] does not have _ref_locs set",)
ls: cannot access /home/maria/Desktop/Traitar_test_20160425/pfam_annotation/*_filtered_best.dat: No such file or directory
running phenotype prediction
Traceback (most recent call last):
File "/usr/local/bin/predict.py", line 110, in <module>
annotate_and_predict((pt1, pt2), tarfile.open(args.model_tar, mode = "r:gz"), args.annotation_matrix,args.pfam_pts_mapping_f, args.out_dir, args.voters)
File "/usr/local/bin/predict.py", line 88, in annotate_and_predict
aggr_dfs = aggregate(pred_df, k)
File "/usr/local/bin/predict.py", line 31, in aggregate
maj_pred_dfs[0].iloc[:,i / k] = pred_df.iloc[:, i: i + k].apply(filter_pred, axis = 1, is_majority = True, k = k)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 98, in __setitem__
self._setitem_with_indexer(indexer, value)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 416, in _setitem_with_indexer
value = self._align_frame(indexer, value)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 587, in _align_frame
raise ValueError('Incompatible indexer with DataFrame')
ValueError: Incompatible indexer with DataFrame
Traceback (most recent call last):
File "/usr/local/bin/predict.py", line 110, in <module>
annotate_and_predict((pt1, pt2), tarfile.open(args.model_tar, mode = "r:gz"), args.annotation_matrix,args.pfam_pts_mapping_f, args.out_dir, args.voters)
File "/usr/local/bin/predict.py", line 88, in annotate_and_predict
aggr_dfs = aggregate(pred_df, k)
File "/usr/local/bin/predict.py", line 31, in aggregate
maj_pred_dfs[0].iloc[:,i / k] = pred_df.iloc[:, i: i + k].apply(filter_pred, axis = 1, is_majority = True, k = k)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 98, in __setitem__
self._setitem_with_indexer(indexer, value)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 416, in _setitem_with_indexer
value = self._align_frame(indexer, value)
File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 587, in _align_frame
raise ValueError('Incompatible indexer with DataFrame')
ValueError: Incompatible indexer with DataFrame
Traceback (most recent call last):
File "/usr/local/bin/merge_preds.py", line 79, in <module>
comb_preds(args.phypat_dir, args.phypat_GGL_dir, args.out_dir, args.voters)
File "/usr/local/bin/merge_preds.py", line 19, in comb_preds
m1_scores = ps.read_csv("%s/predictions_majority-vote_mean-score.txt"%phypat_dir, index_col = 0, sep = "\t")
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 218, in _read
parser = TextFileReader(filepath_or_buffer, kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 502, in __init__
self._make_engine(self.engine)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 610, in _make_engine
self._engine = CParserWrapper(self.f, self.options)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 972, in __init__
self._reader = _parser.TextReader(src, kwds)
File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3200)
File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5559)
IOError: File /home/maria/Desktop/Traitar_test_20160425/phenotype_prediction/phypat/predictions_majority-vote_mean-score.txt does not exist
running feature track generation
Traceback (most recent call last):
File "/usr/local/bin/traitar", line 329, in <module>
args.func(args)
File "/usr/local/bin/traitar", line 19, in phenolyze
p.run(args.mode)
File "/usr/local/bin/traitar", line 164, in run
self.run_feature_track_generation(self.s2f.loc[:,"sample_name"], mode)
File "/usr/local/bin/traitar", line 249, in run_feature_track_generation
phypat_preds = ps.read_csv(os.path.join(self.phypat_dir, "predictions_majority-vote.txt"), index_col = 0, sep = "\t")
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 218, in _read
parser = TextFileReader(filepath_or_buffer, kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 502, in __init__
self._make_engine(self.engine)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 610, in _make_engine
self._engine = CParserWrapper(self.f, self.options)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 972, in __init__
self._reader = _parser.TextReader(src, kwds)
File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3200)
File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5559)
IOError: File /home/maria/Desktop/Traitar_test_20160425/phenotype_prediction/phypat/predictions_majority-vote.txt does not exist
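The KeyError in the log above complains about duplicate column labels ('accession', 'score', 'bias', 'from', 'to') in the table read from the HMMER output. One common way to sidestep this class of problem is to give every column a unique label before filtering. The sketch below shows the idea with made-up values; `make_unique`, `filter_hits`, the two sample rows and the thresholds are illustrative assumptions, and this is not necessarily how the fix in Traitar 1.0.4 was implemented.

```python
import pandas as pd

# Column labels copied from the error message; several of them repeat.
labels = ['target name', 'accession', 'tlen', 'query name', 'accession', 'qlen',
          'E-value', 'score', 'bias', '#', 'of', 'c-Evalue', 'i-Evalue', 'score',
          'bias', 'from', 'to', 'from', 'to', 'from', 'to', 'acc',
          'description of target']


def make_unique(names):
    """Append a running suffix to repeated labels: score, score.1, ..."""
    seen = {}
    unique = []
    for name in names:
        count = seen.get(name, 0)
        unique.append(name if count == 0 else "%s.%d" % (name, count))
        seen[name] = count + 1
    return unique


def filter_hits(df, eval_threshold, bit_score_thresh):
    """Keep rows whose column 12 is small enough and column 13 large enough."""
    keep = (df.iloc[:, 12] <= eval_threshold) & (df.iloc[:, 13] >= bit_score_thresh)
    return df.loc[keep, :]


unique_labels = make_unique(labels)

# Two made-up rows: only columns 0, 12 and 13 carry meaningful values here.
rows = [
    ["geneA"] + ["-"] * 11 + [1e-30, 120.0] + ["-"] * 9,
    ["geneB"] + ["-"] * 11 + [0.5, 8.0] + ["-"] * 9,
]
hits = pd.DataFrame(rows, columns=unique_labels)

# Thresholds are arbitrary illustration values, not Traitar defaults.
filtered = filter_hits(hits, eval_threshold=1e-2, bit_score_thresh=25.0)
print(filtered["target name"].tolist())
```

Selecting the two threshold columns by position, as the original line 24 of hmmer2filtered_best.py does, keeps the filter independent of how the repeated labels end up being renamed.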