ArchiveTeam / warrior-dockerfile

A Dockerfile for the ArchiveTeam Warrior

Ability to choose workdir for temporary storage #83

Open raybooysen opened 9 months ago

raybooysen commented 9 months ago

A running ArchiveTeam Warrior writes gigabytes of data before uploading. I would love a way to specify, via an environment variable, where that temporary storage lives, so I can point it at a file system or storage device of my choosing.

TheTechRobo commented 9 months ago

Assuming you're using Docker, you can use its -v parameter to bind-mount the container's working directory to wherever you want on the host system.

raybooysen commented 9 months ago

I've been trying a few variants: /home/warrior, /home/warrior/data, and /home/warrior/projects. All of them cause errors of one kind or another.

I'm unsure which one I should be using.

viniciushsantana commented 9 months ago

I've been considering this issue as well. It seems that the working directories /home/warrior/projects and /home/warrior/data are used in a way that prevents exposing them as volumes: mounting a volume over them hides files that the image expects to find there.

For example, within /home/warrior/data, the binaries wget-at and wget-at-gnutls should be present:

https://github.com/ArchiveTeam/warrior-dockerfile/blob/86f9433661b11031c30a0031ea561140728c21c8/Dockerfile#L16-L17

I've discovered a potential workaround for this issue by altering Docker's data-root setting. However, this approach is not ideal and could lead to other complications.

Properly managing working directories and enabling the exposure of volumes would also facilitate the use of tmpfs mounts. This is particularly beneficial for users with ample RAM available who wish to conserve some SSD IOPS.

raybooysen commented 9 months ago

This was my primary use case. My warriors run for long periods on machines with spare RAM, so a tmpfs is a good fit here: it avoids the SSD completely for temporary data.
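
For anyone who wants to experiment with the tmpfs idea, here is a rough dry-run sketch using Docker's --tmpfs flag. It only prints the command so it can be reviewed before running. The image name, the published port, the container path and the 8g size cap are all assumptions, not something confirmed by the maintainers in this thread; adjust them to your setup.

```shell
# Dry-run sketch: print a `docker run` command that puts the Warrior's
# scratch directory on a RAM-backed tmpfs instead of the SSD.
# Assumptions: image name, port 8001, container path, and the size cap.
IMAGE="atdr.meo.ws/archiveteam/warrior-dockerfile"
TMPFS_PATH="/home/warrior/data/projects"   # assumed scratch location
TMPFS_SIZE="8g"                            # hypothetical cap; size to your RAM

warrior_tmpfs_cmd() {
  # --tmpfs takes <container path>:<mount options>
  printf '%s\n' "docker run --detach --name archiveteam-warrior --publish 8001:8001 --tmpfs ${TMPFS_PATH}:rw,size=${TMPFS_SIZE} ${IMAGE}"
}

warrior_tmpfs_cmd
```

Anything on a tmpfs is lost when the container stops, so this only suits data that is safe to discard.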

budde96 commented 1 month ago

I'm using Podman, and mounting /home/warrior/projects and /home/warrior/data/projects works fine. From what I can tell from my tests, that covers both the files you want to persist and all the big temporary files.
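
A minimal sketch of what those two mounts could look like, written as a dry run that prints the command instead of executing it. The two container paths are the ones mentioned above; the image name, the published port and the host directory /mnt/scratch/warrior are assumptions for illustration only.

```shell
# Dry-run sketch: print a run command that bind-mounts the two
# directories reported to work onto a host location of your choice.
ENGINE="podman"                                    # or "docker"
IMAGE="atdr.meo.ws/archiveteam/warrior-dockerfile" # assumed image name
SCRATCH="/mnt/scratch/warrior"                     # hypothetical host directory

warrior_mount_cmd() {
  # --volume takes <host path>:<container path>
  printf '%s\n' "${ENGINE} run --detach --name archiveteam-warrior --publish 8001:8001 --volume ${SCRATCH}/projects:/home/warrior/projects --volume ${SCRATCH}/data-projects:/home/warrior/data/projects ${IMAGE}"
}

warrior_mount_cmd
```

The host subdirectories need to exist and be writable by the container's user before the printed command is run.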