chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

doc.to_bag_of_terms returns empty items in Docker #208

Open jeronimo13 opened 6 years ago

Hi, I have this function:

import textacy

def bag_of_term(text):
  doc = textacy.Doc(text)
  # Count 2- to 4-grams and named entities, keyed by their string form
  bot = doc.to_bag_of_terms(ngrams=(2, 3, 4), named_entities=True, weighting='count', as_strings=True)
  print(bot.items())
  # Most frequent terms first
  return sorted(bot.items(), key=lambda x: x[1], reverse=True)

It works as expected on my machine and returns n-grams.

I use this function behind a Flask REST API.

But when I run it inside a Docker image and call it through the REST endpoint, I get dict_items([('', 520)]), which I consider empty.
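To make "which I consider empty" concrete, here is a minimal sketch of the check I have in mind. The helper name `is_effectively_empty` is hypothetical and not part of textacy; it assumes the bag is a plain dict with string keys, as returned with as_strings=True.

```python
def is_effectively_empty(bag):
    """Return True if no key in the bag contains non-whitespace text."""
    return not any(str(term).strip() for term in bag)


# The result seen inside Docker: a single blank key carrying the whole count.
docker_result = {"": 520}
print(is_effectively_empty(docker_result))
```

A bag with at least one real term, such as {"new york": 2}, would not count as empty under this check.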

This is what my Dockerfile looks like:

FROM floydhub/textacy

## Install dependencies
RUN pip --no-cache-dir install \
        flask

COPY . /app
WORKDIR /app
ENTRYPOINT ["python"]
EXPOSE 8000
CMD ["app.py"]

I've tried several Dockerfile setups (including writing my own) and ended up with the one above, but all of them show the same problem: doc.to_bag_of_terms comes back empty.
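One way to compare the two environments is to print the same report on the host and inside the container. The sketch below uses only the standard library (Python 3.8+ for importlib.metadata). The function name `environment_report` and the package list are my own choices. It does not identify the cause; it only rests on the assumption that a version or encoding mismatch might be relevant.

```python
import locale
import sys
from importlib import metadata


def environment_report(packages=("textacy", "spacy", "flask")):
    """Collect version and encoding details that may differ between host and container."""
    report = {
        "python": sys.version.split()[0],
        "preferred_encoding": locale.getpreferredencoding(False),
        "default_encoding": sys.getdefaultencoding(),
    }
    for name in packages:
        try:
            # Installed distribution version, as recorded by pip
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it once locally and once inside the container (for example with docker run and an overridden command) gives two reports that can be compared line by line.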

Any ideas on how to run textacy inside Docker?