ArchiveTeam / warrior-dockerfile

A Dockerfile for the ArchiveTeam Warrior

ARM support #56

Open TomGlass opened 3 years ago

TomGlass commented 3 years ago

Currently the ArchiveTeam Warrior does not support ARM devices.

We're investigating this and looking into building a new image. This requires a lot of work, as wget-at (formerly wget-lua) is a complicated setup. We need to validate it and ensure it produces valid content across multiple devices, including but not limited to the various Raspberry Pis, the Apple M1 architecture, and other ARM SBCs such as Orange Pis.

For now, we ask that anyone working on this please do NOT test anything on live production projects; the risk of garbage data is not one we wish to take. We hope to begin testing next week.

The devices we currently have access to are sufficient for testing, but we may require more later on.

billsargent commented 3 years ago

I recommend removing the Raspberry Pi Dockerfiles as well as modifying the README until ARM support is ready.

TomGlass commented 3 years ago

I recommend removing the Raspberry Pi Dockerfiles as well as modifying the README until ARM support is ready.

As it still works for URLTeam, to my knowledge, I'll leave it but update the README.

Mechazawa commented 3 years ago

What hurdles are there to get it running on ARM64?

jhollowe commented 3 years ago

The main thing is compiling dependencies for arm64, mainly wget-lua and any python packages that do not have wheels built for arm64 (zstandard). Once all software is compiled for arm64, the main functionality of the warrior should be the same as any other platform.
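To make the architecture question concrete, here is a minimal sketch that maps the machine name reported by Python's `platform.machine()` to Docker-style architecture names and checks whether an image built for a given architecture would run natively. The alias table and the function names `docker_arch` and `runs_natively` are assumptions made for this example; they are not part of the Warrior code.

```python
import platform

# Hypothetical alias table: machine names as reported by the OS,
# mapped to the architecture names Docker uses for platforms.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "armv7l": "arm/v7",
    "armv6l": "arm/v6",
}


def docker_arch(machine=None):
    """Return the Docker architecture name for a machine string, or None if unknown."""
    if machine is None:
        machine = platform.machine()
    return _ARCH_ALIASES.get(machine.lower())


def runs_natively(image_arch, machine=None):
    """Return True if an image built for image_arch matches this machine."""
    return docker_arch(machine) == image_arch


if __name__ == "__main__":
    print("host architecture:", docker_arch())
    print("amd64 image runs natively:", runs_natively("amd64"))
```

On a Raspberry Pi running a 64-bit OS, `platform.machine()` typically reports `aarch64`, so an amd64-only image would not run natively there.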

billsargent commented 3 years ago

Can it not be forced to compile what's needed for arm32 until a fix is in place? Can we even pass gcc flags to it?

TomGlass commented 3 years ago

The main thing is compiling dependencies for arm64, mainly wget-lua and any python packages that do not have wheels built for arm64 (zstandard). Once all software is compiled for arm64, the main functionality of the warrior should be the same as any other platform.

It's sadly not that simple. We need to validate that wget-at is stable and produces data comparable to amd64. We are already running two builds of wget-at due to an issue with the Reddit project. Just today I finally got an RPi 4 up and running, and I will begin testing next week.

Can it not be forced to compile what's needed for arm32 until a fix is in place? Can we even pass gcc flags to it?

Sadly not.

What hurdles are there to get it running on ARM64?

Highlighted just above.

Veehxia commented 3 years ago

Any news on this?

HeroCC commented 3 years ago

Just wanted to give my 2 cents here -- it would be incredibly helpful to have a merged manifest with all of the architectures under one tag. This is possible with GitHub Actions / docker buildx, but I'm not sure how to do it with Drone's docker plugin. I run a Kubernetes cluster with a mix of armv7, arm64, and amd64 boxes, and it would be much better to have one tag that each box could pull from.
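As a rough illustration of the single-tag idea, the sketch below assembles the argument list for a multi-platform `docker buildx build` call. It only builds and prints the command; it does not run Docker. The helper name `buildx_command` and the tag `example/warrior:latest` are placeholders invented for this example, and nothing here reflects how the project's CI is actually set up.

```python
import shlex


def buildx_command(tag, platforms, context=".", push=True):
    """Build the argv for a multi-platform `docker buildx build` invocation."""
    if not platforms:
        raise ValueError("at least one platform is required")
    argv = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # one manifest list covering every platform
        "--tag", tag,
    ]
    if push:
        # Pushing lets the registry store the per-architecture images under one tag.
        argv.append("--push")
    argv.append(context)
    return argv


if __name__ == "__main__":
    cmd = buildx_command(
        "example/warrior:latest",  # hypothetical tag
        ["linux/amd64", "linux/arm64", "linux/arm/v7"],
    )
    print(shlex.join(cmd))
```

With a manifest list behind the tag, each node pulls the variant that matches its own architecture, which is what a mixed cluster needs.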

If there's anything you'd like the community to help with please let us know, I'd be happy to help get this up and running :)

dieser-niko commented 2 years ago

Any updates yet? Would be really nice if this would work before YouTube completely removes dislikes

billsargent commented 2 years ago

Is this being worked on? Currently the Dockerfile is completely outdated for Raspberry Pi OS. The "python" package is no longer available for 2.7; it needs to be python2, and python-wheel is for Python 3 now. python-pip is no longer available for 2.7 either, so it would need to be installed manually during image creation.

I've been trying to work past this but now when I run the docker container, I get this

Traceback (most recent call last):
  File "/usr/local/bin/run-warrior3", line 4, in <module>
    from seesaw.script.run_warrior import main
  File "/usr/local/lib/python3.10/dist-packages/seesaw/script/run_warrior.py", line 13, in <module>
    from seesaw.warrior import Warrior
  File "/usr/local/lib/python3.10/dist-packages/seesaw/warrior.py", line 20, in <module>
    from tornado.httpclient import AsyncHTTPClient
  File "/usr/local/lib/python3.10/dist-packages/tornado/httpclient.py", line 49, in <module>
    from tornado import httputil, stack_context
  File "/usr/local/lib/python3.10/dist-packages/tornado/httputil.py", line 106, in <module>
    class HTTPHeaders(collections.MutableMapping):
AttributeError: module 'collections' has no attribute 'MutableMapping'

And I have no idea how to fix that.
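For context on that traceback: abstract base classes such as `MutableMapping` live in `collections.abc`, and the old aliases on the `collections` module itself were removed in Python 3.10. Code that still refers to `collections.MutableMapping`, as the Tornado version in the traceback does, therefore fails at import time on Python 3.10. The sketch below shows the spelling that works, using a small case-insensitive header container. The class `CaseInsensitiveHeaders` is a hypothetical stand-in, not Tornado's `HTTPHeaders`, and whether the right fix for the Warrior is a newer Tornado or an older Python is not settled in this thread.

```python
import collections.abc


class CaseInsensitiveHeaders(collections.abc.MutableMapping):
    """Hypothetical header container; subclasses the ABC from collections.abc."""

    def __init__(self):
        # Maps lower-cased header name -> (original name, value).
        self._store = {}

    def __getitem__(self, name):
        return self._store[name.lower()][1]

    def __setitem__(self, name, value):
        self._store[name.lower()] = (name, value)

    def __delitem__(self, name):
        del self._store[name.lower()]

    def __iter__(self):
        # Iterate over header names in their original spelling.
        return (original for original, _value in self._store.values())

    def __len__(self):
        return len(self._store)


if __name__ == "__main__":
    headers = CaseInsensitiveHeaders()
    headers["Content-Type"] = "text/html"
    print(headers["content-type"])
```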

billsargent commented 2 years ago

I have created a Dockerfile for this that works. The only project that doesn't work is the ua one.

https://pastebin.com/1gpvEZ7E

I upgraded Debian to bullseye. It needed a lot more packages, and it needed a specific branch of wget-at.

I commented out all cleanup. Someone with more experience can decide what works and what doesn't. But this will get you going on Raspberry Pi.

dieser-niko commented 2 years ago

I have created a Dockerfile for this that works. The only project that doesn't work is the ua one.

https://pastebin.com/1gpvEZ7E

I upgraded Debian to bullseye. It needed a lot more packages, and it needed a specific branch of wget-at.

I commented out all cleanup. Someone with more experience can decide what works and what doesn't. But this will get you going on Raspberry Pi.

Doesn't seem to work. Could you provide instructions?

billsargent commented 2 years ago

I have created a Dockerfile for this that works. The only project that doesn't work is the ua one. https://pastebin.com/1gpvEZ7E I upgraded Debian to bullseye. It needed a lot more packages, and it needed a specific branch of wget-at. I commented out all cleanup. Someone with more experience can decide what works and what doesn't. But this will get you going on Raspberry Pi.

Doesn't seem to work. Could you provide instructions?

Can you tell me what doesn't work? I've built this on a Cubietruck board and three RPis, even a Raspberry Pi Model 1. If you use a current Raspberry Pi OS Lite, it should work straight out of the box following the same instructions from this repo. You can also use mine: https://github.com/billsargent/at-warrior-rpi.git

dieser-niko commented 2 years ago

Can you tell me what doesn't work?

After building the container, I tried to run it with sudo using the command provided in the README. This error was returned:

Traceback (most recent call last):
  File "/usr/local/bin/run-warrior3", line 13, in <module>
    main()
  File "/usr/local/lib/python3.9/dist-packages/seesaw/script/run_warrior.py", line 75, in main
    setup_logging(args.data_dir)
  File "/usr/local/lib/python3.9/dist-packages/seesaw/script/run_warrior.py", line 24, in setup_logging
    handler = logging.handlers.TimedRotatingFileHandler(
  File "/usr/lib/python3.9/logging/handlers.py", line 208, in __init__
    BaseRotatingHandler.__init__(self, filename, 'a', encoding=encoding,
  File "/usr/lib/python3.9/logging/handlers.py", line 58, in __init__
    logging.FileHandler.__init__(self, filename, mode=mode,
  File "/usr/lib/python3.9/logging/__init__.py", line 1142, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python3.9/logging/__init__.py", line 1171, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding,
PermissionError: [Errno 13] Permission denied: '/data/data/warrior.log'

Also, should we move this issue to your repo?

billsargent commented 2 years ago

The last line of your log shows the problem. The data location is not writable. Change its permissions to allow it to be written.
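One way to confirm this before starting the container is to test whether a file can actually be created in the directory that ends up mounted as the data location. The sketch below is a minimal check under that assumption; the function name `is_writable_dir` is made up for this example, and the `/data/data` path in the demo is simply the one shown in the traceback. Which host directory it corresponds to depends on how the container was started.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a file can actually be created inside path."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real temporary file exercises the same permission
        # check that opening warrior.log for appending would hit.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:  # PermissionError is a subclass of OSError
        return False
    return True


if __name__ == "__main__":
    data_dir = "/data/data"  # path taken from the traceback above
    print(data_dir, "writable:", is_writable_dir(data_dir))
```

Run it as the same user the Warrior process runs as; a check performed as root will succeed even when the unprivileged user cannot write.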

You can if you want, but I am not actively working on this project. I simply put it up for my own needs. I was simply fixing some of the dependencies that were broken. It has built and run even on non-Raspberry Pi devices for me. I'm currently running it on a Cubietruck SoC with Armbian.

As a side note, only a couple of projects work. Telegram and Reddit both work.

JustAnotherArchivist commented 1 year ago

I didn't see these comments at the time, but @billsargent @dieser-niko it's fine if you want to experiment with ARM support, but don't run it against the official tracker. wget-at has not been verified to work correctly on ARM, and so you might be polluting our work with bad data. So please set up your own tracker instance and projects for testing purposes.

billsargent commented 1 year ago

I didn't see these comments at the time, but @billsargent @dieser-niko it's fine if you want to experiment with ARM support, but don't run it against the official tracker. wget-at has not been verified to work correctly on ARM, and so you might be polluting our work with bad data. So please set up your own tracker instance and projects for testing purposes.

I stopped working on this a long time ago. I published my work on how to get it working but then abandoned it. It's too much work, and it seems nobody is really as interested in getting this working on ARM as they should be.

JustAnotherArchivist commented 1 year ago

More accurately: very few people know how to properly verify wget-at, and they (myself included) lack the time to work on maintenance tasks such as this (as opposed to working on urgent archival projects or fixing stuff that breaks). As you mention, it's not a small task to get this right.

billsargent commented 1 year ago

It would be great to get it going though because many people have tons of low power arm devices that could contribute. But it's a big hassle right now. Maybe some day though :)

I have a fork of this and I'm gonna pull the plug on it. I don't want others corrupting things with what I did as an experiment.

JustAnotherArchivist commented 1 year ago

Oh yeah, it would absolutely be great to have it! There are also some interesting providers with very powerful ARM-based servers, e.g. Hetzner's RX line.

berezovskyi commented 1 year ago

Just a side note: I can contribute a warrior instance by spinning up a container on an x86 cloud VM, but if AT wants a warrior on a residential IP address, I can only spare a Raspberry Pi for that. Hope this adds a bit of motivation to this issue apart from requesting more architectures for fun.

mikeakers commented 1 year ago

Adding another vote for this issue, I have an ARM64 NAS server in my home that I would love to be able to use to contribute to this project.

In the meantime, what are the devs' thoughts on using QEMU to emulate amd64 as described here?

penyuan commented 1 year ago

I'm not much of a developer (sorry!), but am happy to help test this should ARM support be eventually developed.

ArchiveTeam is such a great cause, IMHO it'll be a huge win to bring onboard the enormous compute power of all the ARM devices that are out there! Therefore ARM support is probably worth the strategic effort!

billsargent commented 1 year ago

I'm not much of a developer (sorry!), but am happy to help test this should ARM support be eventually developed.

ArchiveTeam is such a great cause, IMHO it'll be a huge win to bring onboard the enormous compute power of all the ARM devices that are out there! Therefore ARM support is probably worth the strategic effort!

I've let it go. I had a somewhat functioning container for ARM, but if it's sending out bad data and all that, I'd rather wait for the people who actually know what's going on. wget-at seems to be the big issue. I got tired of trying to get this working because it seems nobody cares enough. It's a huge missed opportunity, seeing as ARM is low-powered and, if you're like me, you may have a dozen SoCs sitting around idle and plenty of bandwidth. But I'm not leaving a desktop or laptop machine on 24/7.

JustAnotherArchivist commented 1 year ago

Please don't insinuate that we don't care. We do. As I explained above, we just don't have the time to properly audit this currently. Remember we all do this voluntarily in our free time, and stuff's being put on fire constantly (cf. Imgur, Reddit), which usually keeps us more than busy already. That said, there is a rough internal plan for how this might be able to move forward. It depends on several other steps that need to happen first though. I wouldn't recommend holding your breath, but we do want to fix this eventually. And, like you, we'd like to get it done ASAP, but it unfortunately isn't at the top of our urgency list due to the above.

billsargent commented 1 year ago

Please don't insinuate that we don't care. We do. As I explained above, we just don't have the time to properly audit this currently. Remember we all do this voluntarily in our free time, and stuff's being put on fire constantly (cf. Imgur, Reddit), which usually keeps us more than busy already. That said, there is a rough internal plan for how this might be able to move forward. It depends on several other steps that need to happen first though. I wouldn't recommend holding your breath, but we do want to fix this eventually. And, like you, we'd like to get it done ASAP, but it unfortunately isn't at the top of our urgency list due to the above.

I worded that wrong. It wasn't a dig at you or the devs. I meant people in general who actually know more than I do about it. If people want this to work, more people with experience AND TIME need to join. But that doesn't appear to be happening. And that is what I meant by not caring. Because to be honest, there should be a huge number of people with extensive experience offering their help. And there aren't. So I apologize for making it sound like this was a dig at the people currently working. I know you're busy and I know this is volunteer work, but I actually would expect that with a feature like this there'd be a lot more volunteers willing to dig into it and supply patches and fixes.

JustAnotherArchivist commented 1 year ago

I guess this was never properly explained, so let me give you some perspective on that.

I would expect the core of wget to be fine on ARM, though I haven't checked whether they actually test it; there's a reasonable test suite though, so that's easy enough to run and double-check. The potentially problematic parts are WARC writing (poorly tested upstream, if at all) and our customisations. There are few people on this planet who know the WARC specification well enough to validate it, and even fewer familiar with the specification for zstd-compressed WARCs. You can probably count the latter on one hand. To my knowledge, we (AT) are still the only ones using that. So there are two or three people from AT, someone at IA, and two people who worked on the spec who could theoretically verify the WARC stuff; practically, it has to be one of the 2-3 from AT. For the customisations, it's only those 2-3 anyway, or maybe even less (I'm familiar with WARCs but not overly with the custom stuff in wget-at, for example). Add to this the various undocumented intentional quirks of our code.

So yeah, there are literally only a couple people in the world with the necessary knowledge and experience to do this. That's why it's bottlenecked so hard.
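To give a feel for the simplest slice of such a comparison, the sketch below reads the record headers of two uncompressed WARC files and reports which (record type, target URI) pairs carry different `WARC-Payload-Digest` values. This is only an illustration under stated assumptions and is not ArchiveTeam's validation procedure: it handles neither gzip nor zstd compression, ignores folded header lines, and says nothing about whether a file is structurally valid or whether wget-at's customisations behave correctly. The function names are invented for this example.

```python
import io


def iter_warc_headers(stream):
    """Yield a dict of lower-cased header names for each record in an uncompressed WARC."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line in (b"\r\n", b"\n"):
            continue  # blank lines that terminate the previous record
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the header block
            name, _sep, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the content block; its size is given by Content-Length.
        stream.read(int(headers["content-length"]))
        yield headers


def payload_digests(stream):
    """Map (record type, target URI) -> set of payload digests seen for it."""
    result = {}
    for headers in iter_warc_headers(stream):
        digest = headers.get("warc-payload-digest")
        if digest is None:
            continue  # e.g. warcinfo records carry no payload digest
        key = (headers.get("warc-type"), headers.get("warc-target-uri"))
        result.setdefault(key, set()).add(digest)
    return result


def diff_digests(reference, candidate):
    """Return the keys whose digest sets differ or that exist on one side only."""
    ref = payload_digests(reference)
    cand = payload_digests(candidate)
    return {key for key in ref.keys() | cand.keys() if ref.get(key) != cand.get(key)}
```

Usage would be along the lines of `diff_digests(open("amd64.warc", "rb"), open("arm64.warc", "rb"))`, with both file names being placeholders. Even an empty result only says the payload digests agree; pages that change between fetches will differ regardless of architecture, so a meaningful test needs a controlled, static target.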

billsargent commented 1 year ago

ive got the hardware

Are you incapable of writing your thoughts in one reply?

ramazansancar commented 10 months ago

I have the same problem on a Raspberry Pi 4. Is there a solution for this?

JustAnotherArchivist commented 10 months ago

No, that's why the issue isn't closed.

sufehmi commented 3 months ago

Subscribed with hope to be notified when I can run this on my raspi machines, thanks.