aweimann / traitar

GNU General Public License v3.0

AttributeError: 'DataFrame' object has no attribute 'sort_values' #65

Open sminot opened 7 years ago

sminot commented 7 years ago

Using the docker image for aweimann/traitar:release (38eaf28de0a1), I got the following error:

Traceback (most recent call last):
  File "/usr/local/bin/hmmer2filtered_best", line 15, in <module>
    aggregate_domain_hits(filtered_df, args.out_best_f)
  File "/usr/local/lib/python2.7/dist-packages/traitar/hmmer2filtered_best.py", line 52, in aggregate_domain_hits
    filtered_df.sort_values(by = ["target name", "query name"], inplace = True)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1815, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'sort_values'

FIX: My guess is that this is a problem with the pandas version in the image, so I used pip to update pandas in the container to pandas==0.20.3. I also updated numexpr to 2.4.6 for compatibility with that version of pandas.

RESULT: After making those two updates, traitar ran with no errors.
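For context, DataFrame.sort_values was introduced in pandas 0.17.0. Earlier releases only offer DataFrame.sort(columns=...), so an older pandas in the image raises exactly this AttributeError. Where upgrading pandas is not an option, a duck-typed shim like the one below might let the same call run on both old and new versions. This is a minimal sketch; sort_by_columns is a hypothetical helper, not part of traitar.

```python
def sort_by_columns(df, columns):
    """Sort df in place by `columns` on both old and new pandas.

    Hypothetical compatibility helper; not part of traitar.
    """
    if hasattr(df, "sort_values"):
        # pandas >= 0.17.0 provides sort_values(by=...)
        df.sort_values(by=columns, inplace=True)
    else:
        # pandas < 0.17.0 only has the older sort(columns=...)
        df.sort(columns=columns, inplace=True)
    return df


# Usage, mirroring the failing call in hmmer2filtered_best.py:
#     sort_by_columns(filtered_df, ["target name", "query name"])
```

Checking for the method rather than parsing pandas.__version__ keeps the shim independent of version-string formats.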

SUGGESTION: Pin the versions of all Python packages that work (pip freeze > requirements.txt) and install from that list in the Dockerfile (pip install -r requirements.txt), so that whatever versions happen to be newest at build time cannot break the image.
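To make the pinning suggestion enforceable, a build step could check that requirements.txt pins every package to an exact version. The sketch below assumes pip freeze's name==version line format; parse_pins is a hypothetical helper, and apart from pandas==0.20.3 and numexpr==2.4.6 (the versions reported in this thread) the sample entries are made up for illustration.

```python
import re

# Matches "name==version" lines as written by `pip freeze`.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s;#]+)")


def parse_pins(text):
    """Split requirements text into exact pins and everything else.

    Returns (pinned, unpinned): pinned maps lower-cased package names to
    exact versions; unpinned lists lines that do not fix a version.
    """
    pinned, unpinned = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        match = PIN_RE.match(line)
        if match:
            pinned[match.group(1).lower()] = match.group(2)
        else:
            unpinned.append(line)
    return pinned, unpinned


if __name__ == "__main__":
    # Illustrative sample; "numpy>=1.10" is a made-up unpinned entry.
    sample = "pandas==0.20.3\nnumexpr==2.4.6\nnumpy>=1.10\n"
    pins, loose = parse_pins(sample)
    print(pins)   # {'pandas': '0.20.3', 'numexpr': '2.4.6'}
    print(loose)  # ['numpy>=1.10']
```

A Dockerfile step could run this against requirements.txt and fail the build whenever the unpinned list is non-empty.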

abremges commented 7 years ago

Thank you, @sminot, greatly appreciated! 👍

aweimann commented 7 years ago

@sminot, sorry for my late reply; I was on holiday. Thank you for the great suggestions! 👍

lhor commented 7 years ago

After following the installation instructions, I'm having the exact same issue reported by @sminot. Updating pandas and numexpr didn't solve the problem.

lhor commented 7 years ago

@sminot, would you please provide the pip freeze output from your working image? Thanks!

palomo11 commented 6 years ago

Hi, any news about this?

After installing traitar in a virtualenv and installing an older version of pandas:

virtualenv traitar-env 
source /home/name/traitar-env/bin/activate

PATH=$PATH:/home/name/traitar-env/bin/

source ~/.bashrc
pip install pandas==0.19
(traitar-env) python2.7
Python 2.7.14 

>>> import pandas
>>> print('The pandas version is {}.'.format(pandas.__version__))
The pandas version is 0.19.0.

I got this error:

/home/name/scripts/traitar-env/bin/hmmer2filtered_best.py:50: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  filtered_df.sort(columns = ["target name", "query name"], inplace = True)

/bin/sh: -c: line 0: syntax error near unexpected token `('

/bin/sh: -c: line 0: `domtblout2gene_generic.py traitar_bins/annotation/pfam/summary.dat  <(ls traitar_bins/annotation/pfam/*_filtered_best.dat) /home/name/traitar-env/lib/python2.7/site-packages/traitar/data/models/phypat.tar.gz'

Traceback (most recent call last):
  File "/home/name/traitar-env/bin/predict.py", line 92, in <module>
    annotate_and_predict(pt_models, args.annotation_matrix,  args.out_dir, args.voters) 
  File "/home/name/traitar-env/bin/predict.py", line 67, in annotate_and_predict
    m = ps.read_csv(summary_f, sep="\t", index_col = 0)
  File "/home/name/traitar-env/lib/python2.7/site-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/name/traitar-env/lib/python2.7/site-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/name/traitar-env/lib/python2.7/site-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/home/name/traitar-env/lib/python2.7/site-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/name/traitar-env/lib/python2.7/site-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4019)
  File "pandas/parser.pyx", line 665, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:7967)
IOError: File traitar_bins/annotation/pfam/summary.dat does not exist

Any idea what is going wrong or how to solve it?

aweimann commented 6 years ago

Thank you for the reminder. Your error indicates that bash process substitution, the <(...) syntax, is not working. This happens because the command is run by /bin/sh instead of /bin/bash, even though /bin/bash is hard-coded. I'm not really sure why this would happen. Can you please make sure /bin/bash points to a standard bash?

Many thanks,

Aaron
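On POSIX systems, Python's subprocess module runs shell=True commands with /bin/sh unless the executable argument names a different shell, and <(...) is a bash feature that a shell started as /bin/sh may reject. How traitar launches this command internally is not shown here, so the following is only a sketch of one way to force bash, assuming a bash binary at /bin/bash; run_in_bash is a hypothetical name.

```python
import subprocess


def run_in_bash(command):
    """Run a shell command string with bash instead of the default /bin/sh.

    Hypothetical helper; assumes bash is installed at /bin/bash.
    """
    # With shell=True, `executable` replaces the default /bin/sh on POSIX,
    # so the command string itself can stay unchanged.
    return subprocess.check_output(
        command, shell=True, executable="/bin/bash", universal_newlines=True
    )


if __name__ == "__main__":
    # Process substitution works here because bash executes the command.
    print(run_in_bash("cat <(echo hello)"), end="")
```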

fungs commented 6 years ago

@aweimann: To avoid the bash process-substitution syntax (in case you want to drop the bash requirement), you could read the file list via stdin.
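The stdin approach could look roughly like this in Python: build the file list with glob instead of <(ls ...), and have the consuming script read one path per line from standard input. This assumes the summary script could be changed to accept its input list on stdin; list_filtered_best and read_paths are hypothetical names, and the glob pattern mirrors the ls call in the error above.

```python
import glob
import os


def list_filtered_best(pfam_dir):
    """Return the sorted *_filtered_best.dat files in pfam_dir.

    Stands in for `<(ls pfam_dir/*_filtered_best.dat)`; note that sorted()
    orders by code point, which may differ from ls under some locales.
    """
    return sorted(glob.glob(os.path.join(pfam_dir, "*_filtered_best.dat")))


def read_paths(stream):
    """Read one file path per line from a text stream, skipping blank lines."""
    return [line.strip() for line in stream if line.strip()]


# Producer side (sketch): pass the list to the consumer's stdin, e.g.
#     proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, universal_newlines=True)
#     proc.communicate("\n".join(list_filtered_best(pfam_dir)) + "\n")
# Consumer side: paths = read_paths(sys.stdin)
```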

aweimann commented 6 years ago

Thanks @fungs, I will look into that. @palomo11, sorry this is taking a bit longer, and thanks for your continued interest!

palomo11 commented 6 years ago

Hi @aweimann, it finally worked! I followed your suggestions, and after switching to /bin/bash everything ran fine.

nick-youngblut commented 5 years ago

I'm not sure if this repo is maintained anymore, but pandas==0.20.3 is very old, and numexpr==2.4.6 isn't even available on conda-forge anymore. Are there any plans to update traitar?

Should we consider this package no longer maintained?

I could fork this package and try to update the pandas code (and maybe add some unit tests too), but @aweimann, would you accept the pull request?

aweimann commented 5 years ago

Sorry, I haven't had much time to look after the repo, but I will take a look in the next few days.

nick-youngblut commented 5 years ago

Thanks for the quick response! Let me know if I can help. Traitar still seems to be the state of the art, given that the code from Farrell et al., 2018 no longer seems to be available (and, as far as I can tell, the paper was never published in a journal).