ArchiveBox / archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
https://chromewebstore.google.com/detail/archivebox-exporter/habonpimjphpdnmcfkaockjnffodikoj
MIT License

Support: How to point browser extension to correct ArchiveBox server endpoint #20

Closed p6002 closed 10 months ago

p6002 commented 1 year ago

I tried adding to the Edge browser extension:

serverip:port

serverip:port/admin

serverip:port/admin/core

archive.mydomain.com

archive.mydomain.com:port

archive.mydomain.com/admin

archive.mydomain.com/admin/core

With and without http/https - no success. I can't manually or automatically add pages to download.

I activated the cli option to use without login as it says in the documentation.

When I add the page manually through the site, it downloads correctly. The problem is only with the browser addon.

pirate commented 1 year ago

Can you share the output of archivebox --version and your ./data/logs/*.log log files around the time when you're clicking archive from the extension?

p6002 commented 1 year ago

Version: https://pastebin.com/MtNx0p56
Error.log (only file in log directory): https://pastebin.com/y17RWHp6

pirate commented 1 year ago

Can you try with the dev version instead of master? Set the image to archivebox/archivebox:dev and then run docker-compose pull.

p6002 commented 1 year ago

No changes: https://pastebin.com/9EvFhspY

pirate commented 1 year ago

Can you try running archivebox config --set SAVE_MERCURY=False and then try again? It seems there might be an issue with the Mercury article text extractor.

p6002 commented 1 year ago

Still nothing, here is output:

root@f7704c11dd33:/data# su archivebox
$ archivebox config --set SAVE_MERCURY=False
find: '/.config/chromium/Crash Reports/pending/': No such file or directory
[i] [2023-02-21 19:23:08] ArchiveBox v0.6.3: archivebox config --set SAVE_MERCURY=False
/data

find: '/.config/chromium/Crash Reports/pending/': No such file or directory
find: '/.config/chromium/Crash Reports/pending/': No such file or directory
find: '/.config/chromium/Crash Reports/pending/': No such file or directory
SAVE_MERCURY=False

[i] Note: This change also affected these other options that depended on it:
USE_MERCURY=False
$

pirate commented 1 year ago

Can you screenshot the extension config options from the extension popup in your browser? I specifically want to see how you configured the server URL in the end (I know you tried all the options specified in your original post). I don't have any good solutions or ideas yet, so I'm grasping at straws a bit, but maybe there's something weird I can see in a screenshot.

For reference it should be http://archive.mydomain.com, http://archive.mydomain.com:port, or https://archive.mydomain.com like so:

image
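As a rough illustration of the URL forms listed above, here is a minimal Python sketch that checks whether a value looks like scheme://host or scheme://host:port. The function name normalize_base_url is hypothetical, and rejecting paths such as /admin is an assumption drawn from the three examples above, not something taken from the extension's source.

```python
from urllib.parse import urlsplit


def normalize_base_url(raw: str) -> str:
    """Return scheme://host[:port], or raise ValueError for other shapes.

    Hypothetical helper: the accepted shapes mirror the three example
    URLs in this thread, not the extension's actual validation.
    """
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"missing or unsupported scheme in {raw!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {raw!r}")
    # Accessing .port raises ValueError if the port is not a valid number.
    _ = parts.port
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        raise ValueError(f"unexpected path, query, or fragment in {raw!r}")
    # Drop a trailing slash so the result is just the server root.
    return f"{parts.scheme}://{parts.netloc}"


if __name__ == "__main__":
    for candidate in (
        "https://archive.mydomain.com",
        "http://archive.mydomain.com:8505/",
        "archive.mydomain.com/admin/core",
    ):
        try:
            print(candidate, "->", normalize_base_url(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

Under these assumptions, the bare-host and /admin variants from the original post are rejected, while the scheme-prefixed root URLs pass.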

Thanks

p6002 commented 1 year ago

Here is screenshot: https://i.ibb.co/2NLWJpD/2023-02-25-002708.png

Of course, my real domain is there in place of "domain". I can still manually add the page in the app. This is what the compose file I run the container from looks like: https://pastebin.com/QdCqDwFw

pirate commented 1 year ago

Can you try with 8000 just to test temporarily instead of 8505, maybe it's a port mapping issue? I've seen issues with non-default ports in the past.

Or set all the ports to 8505 like so:

        command: server --quick-init 0.0.0.0:8505
        ports:
            - 8505:8505

p6002 commented 1 year ago

I changed to 1209 and still the same thing. I have the firewall on the Synology turned off. Overall I have about 40 containers on different configurations and everything works. Archivebox also works in the browser, only the plugin does not connect.

I'm using the Edge browser on Windows, but I also tested in Firefox on Ubuntu and there is the same problem.

When I just add https://mydomain.com/ the task number on the addon icon just blinks.

If I add https://mydomain.com:1209/ then the number 1 on the icon lights up for a few seconds, but it doesn't change anything.

Yesterday I was still using Nginx proxy manager, but I changed to Synology's built-in reverse proxy and it didn't help at all.

Maybe some other browser plugin or setting is blocking this connection?

-

I have now watched what happens in the log when I try to use the addon.

When I use Firefox, where I am not logged into ArchiveBox, clicking "Archive current page" logs:

"POST /add/ HTTP/1.1" 302 0
"GET /accounts/login/?next=/add/ HTTP/1.1" 302 0
"GET /admin/login/ HTTP/1.1" 200 11143

In Edge, where I am logged in, it shows:

"POST /add/ HTTP/1.1" 200 7049

Nevertheless, nothing is added to the page.
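The two log patterns above can be told apart mechanically. The following is a minimal Python sketch, assuming the access-log format shown in this thread; the function name diagnose and the returned labels are hypothetical. Note that a 200 only shows the request reached the /add/ view, not that a snapshot was created.

```python
import re

# Matches access-log fragments like: "POST /add/ HTTP/1.1" 302 0
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) (?P<size>\d+)'
)


def diagnose(lines):
    """Classify the first POST to /add/ found in the given log lines.

    Hypothetical helper; the labels are illustrative only.
    """
    entries = [m.groupdict() for m in map(LOG_LINE.search, lines) if m]
    for i, entry in enumerate(entries):
        if entry["method"] != "POST" or not entry["path"].startswith("/add/"):
            continue
        if entry["status"] == "302":
            # A redirect followed by a login page suggests the request
            # was treated as unauthenticated.
            nxt = entries[i + 1] if i + 1 < len(entries) else None
            if nxt is not None and "/login/" in nxt["path"]:
                return "redirected-to-login"
            return "redirected"
        if entry["status"] == "200":
            return "reached-add-view"
        return "unexpected-status-" + entry["status"]
    return "no-add-request"


if __name__ == "__main__":
    firefox_log = [
        '"POST /add/ HTTP/1.1" 302 0',
        '"GET /accounts/login/?next=/add/ HTTP/1.1" 302 0',
        '"GET /admin/login/ HTTP/1.1" 200 11143',
    ]
    edge_log = ['"POST /add/ HTTP/1.1" 200 7049']
    print(diagnose(firefox_log))
    print(diagnose(edge_log))
```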

pirate commented 1 year ago

That's a great sign, it's getting the /add/ submission at least. Are you sure you're adding unique URLs that aren't already archived? ArchiveBox only archives each URL once; it doesn't re-snapshot if you already have the URL.

p6002 commented 1 year ago

Do you have some working docker-compose file for this project?

pirate commented 1 year ago

The default one in the repo works.

p6002 commented 1 year ago

It works well on its own, but not with the browser addon.

p6002 commented 1 year ago

I tested on firefox and the extension works fine. The problem exists only on Edge.

What do you need to help you fix it?

unmenschlich commented 1 year ago

It doesn't work here either.

Server:
try 1: docker bridge + port 8000 -> 8040
try 2: docker host + port 8000
Image: archivebox/archivebox:dev
Error log: https://pastebin.com/raw/XutpN9Kf

Firefox: I added the base URL and clicked "add current domain to list", but nothing happens. I can connect to both setups normally with Firefox.

p6002 commented 1 year ago

This project hasn't been updated for 2 years, which is probably why it is no longer supported by browsers.

unmenschlich commented 1 year ago

oh, thank you for info

pirate commented 1 year ago

@p6002 The extension hasn't had a major release because I have an ArchiveBox refactor that's been slow-moving and touches a lot of pieces and will add a new REST API. The extension developer is likely waiting for that new API to land before continuing work on this extension.

In the meantime ArchiveBox development and bugfixes have been ongoing in the dev branch. AFAIK the dev branch works with this extension in browsers for some people, I am using it right now without issues, but because I don't know much about the extension, I'm not exactly sure what might be breaking for the other people reporting issues in this thread.

gerroon commented 10 months ago

This does not seem to work for me on Brave either. Can anyone please confirm whether this is supposed to work with the latest dev branch of ArchiveBox? Do I have to set up something special in docker-compose.yml to make it work?

pirate commented 10 months ago

I can confirm it's been working for a while and is currently working for me.

Can you post your docker-compose.yml and the full output of docker compose run archivebox version?

gerroon commented 10 months ago

@pirate Thanks for the reply.

docker compose run archivebox version

chown: cannot access '/browsers/*': No such file or directory
0.7.1+editable

ArchiveBox v0.7.1+editable Cpython Linux Linux-5.14.0-4-amd64-x86_64-with-glibc2.36 x86_64
DEBUG=False IN_DOCKER=True IN_QEMU=False IS_TTY=True TZ=UTC FS_ATOMIC=True FS_REMOTE=True FS_USER=911:911 FS_PERMS=644 SEARCH_BACKEND=ripgrep

When I add a domain then add the page via the brave addon this is what I get in the running docker terminal

archivebox_1  | "POST /add/ HTTP/1.1" 302 0
archivebox_1  | "GET /accounts/login/?next=/add/ HTTP/1.1" 302 0
archivebox_1  | "GET /admin/login/ HTTP/1.1" 200 11143
archivebox_1  | "POST /add/ HTTP/1.1" 302 0
archivebox_1  | "GET /accounts/login/?next=/add/ HTTP/1.1" 302 0
archivebox_1  | "GET /admin/login/ HTTP/1.1" 200 11143

pirate commented 10 months ago

Did you set docker compose run archivebox config --set PUBLIC_ADD_VIEW=True?

It's required to allow the extension to submit URLs without authenticating.

gerroon commented 10 months ago

I did not do that, the extension page did not mention it. I just ran that command and restarted the container. I still have the same issue.

I can see it posts something but nothing happens on the server. I can add urls manually in the server's own page but that is so much friction to have it open and accessible all the time.

archivebox_1  | "POST /add/ HTTP/1.1" 302 0
archivebox_1  | "GET /accounts/login/?next=/add/ HTTP/1.1" 302 0
archivebox_1  | "GET /admin/login/ HTTP/1.1" 200 11143

Is the extension using a REST API or doing some special server talk? I will try to look into it, although I am not a web dev. I can maybe make it work.

pirate commented 10 months ago

Can you post the full verbatim output of:

docker compose pull
docker compose run archivebox version
docker compose run archivebox config --set PUBLIC_ADD_VIEW=True
docker compose run archivebox config
docker compose down
docker compose down  # yes, really, run it twice
docker compose up
docker compose logs

(please don't redact anything, just copy paste the exact commands you typed in and the full output as they appear in your terminal)

Edit: fixed typo PUBLIC_ADD_PAGE -> PUBLIC_ADD_VIEW

mamema commented 10 months ago

isn't the parameter PUBLIC_ADD_VIEW=True ?

BTW: I might have found something. As long as there are no entries in the domain area, or (I'm not sure about this part yet) no regex entries or only wrong regex entries, the extension reports the URL and ArchiveBox saves the entries.

pirate commented 10 months ago

Ah right sorry, I misremembered. (edited to fix it above)

Here it is in the docs about the extension for future reference: https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#browser-extension-usage.

If anyone still needs help, please open a separate issue! 😁