ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Fix ludios_wpull to support SQLAlchemy 1.4 #198

Closed: ivan closed this issue 8 months ago

ivan commented 3 years ago

Please help: I want a working grab-site in nixpkgs master, but nixpkgs has SQLAlchemy 1.4 and wpull does not support it (https://github.com/ArchiveTeam/wpull/issues/463). I would like the same issue fixed in ludios_wpull, but I haven't looked into it.

TheTechRobo commented 2 years ago

On a somewhat off-topic note, what is the status of merging ludios_wpull into wpull?

ivan commented 2 years ago

I would certainly be happy if someone took on that project, as I have no time to maintain grab-site at the moment.

I forked to ludios_wpull when it looked like wpull would not be maintained, and I changed and removed stuff without coordination with JAA or ArchiveTeam. I haven't looked at it in a while and don't know what the result of a merger should look like.

matthewcen commented 2 years ago

I added basic SQLAlchemy 1.4 support (amongst other upgrades) in my fork here: https://github.com/matthewcen/ludios_wpull/commits/master

The specific commit that added the compatibility is here: https://github.com/matthewcen/ludios_wpull/commit/5c3b167c00ee37cfabe4866d32da7fa9dd6f2946

Please note I have only done very basic testing (i.e., I changed enough code to get SQLAlchemy 1.4 to stop throwing errors for the various crawls I was performing).

ivan commented 1 year ago

@matthewcen Thanks for your work there. Do you think https://github.com/matthewcen/ludios_wpull is ready to merge into ludios_wpull? I am about to begin testing it because Python 3.8 is running out of time.

ivan commented 1 year ago

https://github.com/matthewcen/ludios_wpull/commit/10854d200dc4d4171b802664f976ba8f7292d2b5 removed the wpull version from the WARCs, but we need the version to be able to identify possibly malformed WARCs in the future if they are caused by specific wpull versions.
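To illustrate why a recorded version matters, here is a minimal sketch, assuming the crawler version is written as a `software` field (in `name/version` form) in the WARC file's `warcinfo` record. The WARC 1.1 specification lists `software` as a warcinfo field, but this thread does not say exactly where wpull stores its version. The helper names `build_warcinfo_fields`, `software_version`, and `is_suspect` are hypothetical, and the version strings are placeholders, not real buggy releases.

```python
from __future__ import annotations


def build_warcinfo_fields(software: str, version: str, extra: dict[str, str] | None = None) -> bytes:
    """Serialize warcinfo fields as 'name: value' lines, each ending in CRLF."""
    fields = {"software": f"{software}/{version}"}
    if extra:
        fields.update(extra)
    return "".join(f"{name}: {value}\r\n" for name, value in fields.items()).encode("utf-8")


def parse_warcinfo_fields(payload: bytes) -> dict[str, str]:
    """Parse 'name: value' lines; field names are lowercased for lookup."""
    fields: dict[str, str] = {}
    for line in payload.decode("utf-8").split("\r\n"):
        if not line.strip():
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return fields


def software_version(payload: bytes) -> tuple[str, str] | None:
    """Return (name, version) from a 'software: name/version [comment]' field."""
    software = parse_warcinfo_fields(payload).get("software")
    if not software or "/" not in software:
        # No version recorded: the situation the issue warns against.
        return None
    name, _, rest = software.partition("/")
    parts = rest.split()  # drop any trailing comment such as a project URL
    if not parts:
        return None
    return name, parts[0]


def is_suspect(payload: bytes, suspect: set[tuple[str, str]]) -> bool:
    """True if the recorded (name, version) pair is in a caller-supplied suspect set."""
    found = software_version(payload)
    return found is not None and found in suspect


# "0.0.0" is a placeholder version, not a real ludios_wpull release.
payload = build_warcinfo_fields("ludios_wpull", "0.0.0", {"format": "WARC File Format 1.1"})
print(software_version(payload))  # ('ludios_wpull', '0.0.0')
print(is_suspect(payload, {("ludios_wpull", "0.0.0")}))  # True
```

If the version is stripped from the WARC, `software_version` returns `None` and there is no way to match a file against a list of versions later found to produce malformed output.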

matthewcen commented 1 year ago


Hi @ivan, I will probably need to do some cleanup and touch-ups, but it should be ready for testing.


I will need to review the commit to determine why the wpull version was removed, but it can be re-added easily.

HeliosLHC commented 8 months ago

Fixed upstream in ludios_wpull version 5, which supports SQLAlchemy 2.0+.