ageitgey / node-unfluff

Automatically extract body content (and other cool stuff) from an html document
Apache License 2.0

ENOENT exception trying to open stopwords-en.txt when running as lambda #99

Open iaincollins opened 5 years ago

iaincollins commented 5 years ago

Hi, thanks for this module, it's great!

I've been migrating more and more things to serverless, and ran into an issue with it.

When I call unfluff() from a lambda, it fails and an exception is thrown:

{
  "errno": -2,
  "code": "ENOENT",
  "syscall": "open",
  "path": "/var/task/user/api/content/data/stopwords/stopwords-en.txt"
}

I don't have additional information right now but thought I'd log it as an issue.

This is running on the now.sh platform, and it's possible it's a weird artefact of their build process.

If anyone is using this library on lambda in AWS, I'd appreciate knowing that so I can close this off and raise it over there instead.
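
One way to narrow this down is to check, from inside the deployed function, whether the data file actually made it into the bundle. The snippet below is only a rough sketch: `findMissingFiles` is a hypothetical helper (not part of unfluff), and the base directory is simply copied from the error above, so it may differ on other deployments.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not part of unfluff): returns the entries of
// `relativePaths` that do not exist under `baseDir`.
function findMissingFiles(baseDir, relativePaths) {
  return relativePaths.filter(
    (rel) => !fs.existsSync(path.join(baseDir, rel))
  );
}

// Assumption: the base directory is taken from the "path" field in the
// error report; adjust it to wherever the module resolves its data files.
const missing = findMissingFiles("/var/task/user/api/content", [
  "data/stopwords/stopwords-en.txt",
]);
console.log("Missing files:", missing);
```

If the file is reported as missing, the build step is most likely not copying non-JavaScript assets into the bundle, which would point at the platform rather than at the module.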

hamedb89 commented 5 years ago

Yeah, I have the same issue. My code is also running on now.sh. If you find a solution, I would also be quite interested. 👍

devarpi-zz commented 5 years ago

There is a hack you can apply for this issue. Feed stopwords-en.txt from S3 as an additional parameter and it will work like a charm.
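
The exact shape of that extra parameter is not shown in this thread, so the following is only a sketch of the idea, not unfluff's actual API: the stopword list is passed in as text (fetched however you like, for example from S3) instead of being read from disk. Both `parseStopwords` and `countStopwords` are hypothetical names used for illustration.

```javascript
// Hypothetical helper: turn the raw contents of a stopwords file
// (one word per line) into a Set, skipping blank lines.
function parseStopwords(text) {
  return new Set(
    text
      .split(/\r?\n/)
      .map((line) => line.trim().toLowerCase())
      .filter((line) => line !== "")
  );
}

// Hypothetical helper: count how many whitespace-separated words in
// `content` appear in the supplied stopword Set. No file access needed.
function countStopwords(content, stopwords) {
  return content
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => stopwords.has(word)).length;
}

// Usage: `stopwordsText` stands in for the file body downloaded elsewhere.
const stopwordsText = "the\nand\nof\n";
const stopwords = parseStopwords(stopwordsText);
console.log(countStopwords("The cat and the hat", stopwords));
```

Because the word list arrives as a plain string, nothing here depends on the file being present under /var/task, which is the part that fails in the error above.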