BradCoffield closed this issue 4 years ago.
I've looked into it a bit more, and I'm pretty sure that, if it's possible at all, it will require Docker.
Digging...
Currently working on a Dockerfile for grab-site. It builds successfully but still needs a few enhancements and documentation (and, of course, testing).
I know some PRs for this already exist, but they are old and seem to have been abandoned by their authors.
The Dockerfile should be reviewed at least once every few years: it builds from the latest commit, but it pins both the Python version and the Alpine version. It's a WIP for now :slightly_smiling_face:
I've successfully used this Docker image before: https://hub.docker.com/r/slang800/grab-site/
@brandongalbraith yes, it's partly based on that one.
@raspher Looking forward to your updated Dockerfile 😄
Python 3.8 cannot be used: passing a URL fails with the error `TypeError: required field "posonlyargs" missing from arguments` (this happens on Python 3.8 only).
Beta version of the Dockerfile:
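For context on that error: Python 3.8 added a `posonlyargs` field to `ast.arguments` to support positional-only parameters (PEP 570), so code that builds `ast.arguments` nodes by hand with only the pre-3.8 fields fails to compile on 3.8 with this message. The thread does not say which grab-site dependency builds such nodes, so the sketch below is only a standalone reproduction of the error class, and `compile_legacy_lambda` is a hypothetical helper. On Python 3.13 and later, omitted list fields on AST nodes default to empty lists, so the same tree compiles again there.

```python
import ast
import sys


def compile_legacy_lambda():
    """Compile `lambda: 1` from an AST built with only the pre-3.8 ast.arguments fields.

    Returns the lambda's result, or the TypeError message if compilation fails.
    """
    # posonlyargs is deliberately omitted, as code written for Python <= 3.7 would do.
    args = ast.arguments(
        args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]
    )
    tree = ast.Expression(body=ast.Lambda(args=args, body=ast.Constant(value=1)))
    ast.fix_missing_locations(tree)  # fill in lineno/col_offset required by compile()
    try:
        code = compile(tree, "<legacy-ast>", "eval")
    except TypeError as exc:
        return str(exc)
    return eval(code)()


if __name__ == "__main__":
    print(sys.version_info[:2], compile_legacy_lambda())
```

Run directly, it prints the interpreter version followed by either the `TypeError` message (Python 3.8 through 3.12) or `1`.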
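```dockerfile
# Python 3.7 is pinned because grab-site fails on 3.8 with the posonlyargs TypeError
FROM python:3.7-alpine3.12
WORKDIR /app
# Install build dependencies, build grab-site from the latest master commit,
# then remove build-only packages and pip's cache to keep the image small
RUN apk add --no-cache --update build-base libffi-dev libxml2-dev libxslt-dev re2-dev pkgconfig git libressl-dev musl-dev && \
    git clone --depth=1 --branch=master https://github.com/ArchiveTeam/grab-site.git && \
    cd grab-site && \
    pip3 install --upgrade pip setuptools && \
    pip3 install --no-binary lxml --upgrade ./ && \
    apk del --purge build-base libffi-dev pkgconfig git musl-dev && \
    rm -R /root/.cache
# /data is the working directory, so crawls started here end up in the volume
VOLUME ["/data"]
WORKDIR /data
# gs-server dashboard port
EXPOSE 29000/tcp
CMD ["python", "/app/grab-site/gs-server"]
```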
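One possible way to build and run the image, written as a small Python helper that only assembles the `docker` command lines without executing them. The image tag `grab-site:beta`, the container name `grab-site`, and the host `crawls` directory are hypothetical choices. The crawl command assumes a `grab-site` script sits next to `gs-server` in the cloned repository (mirroring how the `CMD` line invokes `gs-server`), and the port mapping assumes gs-server is reachable on a non-loopback interface inside the container.

```python
import shlex
from pathlib import Path

# Hypothetical names; pick your own.
IMAGE_TAG = "grab-site:beta"
CONTAINER_NAME = "grab-site"


def build_cmd(context_dir="."):
    """Command to build the image; run it from the directory holding the Dockerfile."""
    return ["docker", "build", "-t", IMAGE_TAG, context_dir]


def run_cmd(host_data_dir):
    """Command to start gs-server in the background with /data backed by a host directory."""
    data_dir = Path(host_data_dir).resolve()  # bind mounts need an absolute host path
    return [
        "docker", "run", "-d",
        "--name", CONTAINER_NAME,
        # Publish the EXPOSEd dashboard port (assumes gs-server is reachable
        # on a non-loopback interface inside the container).
        "-p", "29000:29000",
        "-v", f"{data_dir}:/data",
        IMAGE_TAG,
    ]


def crawl_cmd(url):
    """Command to start a crawl in the running container; output lands in /data (WORKDIR)."""
    # Assumes a grab-site script next to gs-server in the clone, mirroring CMD.
    return ["docker", "exec", CONTAINER_NAME, "python", "/app/grab-site/grab-site", url]


if __name__ == "__main__":
    for argv in (build_cmd(), run_cmd("crawls"), crawl_cmd("https://example.com/")):
        print(shlex.join(argv))
```

Keeping gs-server and the crawls in one container means a crawl started with `docker exec` reports to the dashboard on the same host without extra networking.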
This is beautiful. Can't wait to try it all out!
Does anyone have any insight into the feasibility of setting up an instance of grab-site on a service like Heroku? I'd like to do so to automate scrapes that need to run weekly and monthly. I'd also like to use cloud functions to pick up the output and save the files to AWS.
I'm still researching, but it seems that installing grab-site on Heroku may not be possible, and I was hoping to get some input before potentially wasting a lot more time. Thanks!