CDX a b a m s k r M V g u
http://tillystranstuesdays.com/ 20240113012144 http://tillystranstuesdays.com/ text/html 200 AFQB6VVCWSKWEIAEJADJZAFMXOEGHO57 - - 1358 tillystranstuesdays.warc.gz <urn:uuid:4489ae0e-2e7d-482d-bff6-e86b02a3d719>
#0 __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:76
#1 0x00005555555c0c1a in xstrdup (string=0x0) at xmalloc.c:338
#2 0x00005555555a270b in store_warc_record (uri=0x5555556126b0 "http://tillystranstuesdays.com/", date=0x0, uuid=0x555555612710 "<urn:uuid:4489ae0e-2e7d-482d-bff6-e86b02a3d719>",
digest=0x7fffffffe4b0 "\001`\037V\242\264\225b \004H\006\234\200\254\273\210c\273\277\377\177") at warc.c:1415
#3 0x00005555555a2a7c in warc_process_cdx_line (lineptr=0x5555556125b0 "http://tillystranstuesdays.com/", field_num_original_url=0x2, field_num_checksum=0x5, field_num_record_id=0xa) at warc.c:1520
#4 0x00005555555a2c9e in warc_load_cdx_dedup_file () at warc.c:1591
#5 0x00005555555a2e70 in warc_init () at warc.c:1658
#6 0x0000555555591d4d in main (argc=0x4, argv=0x7fffffffe488) at main.c:2088
Steps to reproduce:
Save the two CDX lines at the top of this report as crashpoc.cdx, then run: wget-at --warc-dedup=crashpoc.cdx --warc-file=test https://example.com
Expected behavior: wget-at to download https://example.com into test.warc.gz
Actual behavior: wget-at segfaults during startup, while loading the CDX dedup file.
Additional information:
Backtrace from GDB (frames #0 through #6, listed above):
store_warc_record is called with a null pointer as its second parameter (date=0x0 in frame #2): https://github.com/ArchiveTeam/wget-lua/blob/c1fe6093eda544fc7a933f7646225bec1ff4bd8d/src/warc.c#L1520
store_warc_record doesn't check for null pointers and passes the value on to xstrdup, hence the segfault: https://github.com/ArchiveTeam/wget-lua/blob/c1fe6093eda544fc7a933f7646225bec1ff4bd8d/src/warc.c#L1405-L1422
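The failure mode above (a string-duplication call reaching strlen with a null pointer) can be illustrated with a small sketch. This is not the wget-lua code: dup_or_null, struct record, and store_record are hypothetical names, and the sketch assumes that a missing date is acceptable and should be stored as NULL rather than causing the line to be rejected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of wget-lua: duplicate a string,
   but pass NULL through instead of calling strlen() on it. */
static char *dup_or_null(const char *s)
{
    if (s == NULL)
        return NULL;
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Hypothetical stand-in for the record that gets stored. */
struct record {
    char *uri;
    char *date; /* assumption: NULL means "no date available" */
};

/* Returns 1 on success, 0 if uri is missing or allocation fails.
   rec is left untouched when uri is NULL. */
static int store_record(struct record *rec, const char *uri, const char *date)
{
    assert(rec != NULL);
    if (uri == NULL)
        return 0;

    char *uri_copy = dup_or_null(uri);
    char *date_copy = dup_or_null(date);
    if (uri_copy == NULL || (date != NULL && date_copy == NULL)) {
        free(uri_copy);
        free(date_copy);
        return 0;
    }
    rec->uri = uri_copy;
    rec->date = date_copy;
    return 1;
}
```

With this guard, calling store_record with the URL from the CDX line and a NULL date leaves rec.date as NULL instead of crashing. Whether wget-lua should accept a missing date or reject the CDX line outright is a separate decision for the maintainers.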
When I was initially diagnosing this issue, I got a segfault from another area:
https://github.com/ArchiveTeam/wget-lua/blob/c1fe6093eda544fc7a933f7646225bec1ff4bd8d/src/warc.c#L1511-L1519
digest is uninitialised when it is written to, causing a segfault and/or potential memory corruption. (In my case, digest was 0x0, but recompiling with -ggdb -O0 made it become some random writable pointer.)
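One way to rule out writes through an unset pointer is to decode into a buffer the caller owns. The following is a sketch rather than a patch: decode_sha1_base32 and b32_value are hypothetical names, and it assumes the CDX checksum field is a 32-character, unpadded, upper-case base32 SHA-1, as in the example line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_LEN 20

/* Map one RFC 4648 base32 character to its 5-bit value, or -1. */
static int b32_value(char c)
{
    if (c >= 'A' && c <= 'Z')
        return c - 'A';
    if (c >= '2' && c <= '7')
        return c - '2' + 26;
    return -1;
}

/* Decode a 32-character base32 SHA-1 into out[SHA1_LEN].
   Returns 1 on success, 0 on malformed input (out may then be
   partially written). The caller supplies the storage, so the
   decoder never writes through a pointer it did not receive. */
static int decode_sha1_base32(const char *in, unsigned char out[SHA1_LEN])
{
    assert(out != NULL);
    if (in == NULL || strlen(in) != 32)
        return 0;

    unsigned int buffer = 0; /* bit accumulator */
    int bits = 0;            /* number of pending bits in buffer */
    size_t n = 0;

    for (size_t i = 0; i < 32; i++) {
        int v = b32_value(in[i]);
        if (v < 0)
            return 0;
        buffer = (buffer << 5) | (unsigned int)v;
        bits += 5;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (unsigned char)((buffer >> bits) & 0xFF);
        }
    }
    return n == SHA1_LEN;
}
```

Declaring unsigned char digest[SHA1_LEN] on the stack and passing it in keeps the storage valid regardless of compiler flags. Decoding AFQB6VVCWSKWEIAEJADJZAFMXOEGHO57 this way yields bytes beginning 0x01 0x60 0x1f 0x56, which matches the start of the digest argument shown in frame #2 of the backtrace.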