Closed BubuAnabelas closed 5 years ago
Yes indeed it does. Thanks for opening the issue! Looks like we're using the warcinfo's id this._rid and not uuid(). Will be fixed shortly.
@BubuAnabelas it is fixed
You can verify it by running the command below (rg is ripgrep)
rg "WARC-Record-ID:" node-warc-generated-warc.warc | cut -c17- | uniq -c | sort -nr | less
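If ripgrep is not at hand, the same check can be sketched in Node. This is only a rough illustration: countRecordIds and duplicatedRecordIds are hypothetical helpers, not part of node-warc, and they assume the WARC text fits in memory as a single string.

```javascript
// Hypothetical helpers (not part of node-warc): tally WARC-Record-ID
// header values found in the text of a WARC file.
function countRecordIds(warcText) {
  const counts = new Map();
  for (const line of warcText.split(/\r?\n/)) {
    // Match header lines such as "WARC-Record-ID: <urn:uuid:...>"
    const match = /^WARC-Record-ID:\s*(\S+)/.exec(line);
    if (match) {
      counts.set(match[1], (counts.get(match[1]) || 0) + 1);
    }
  }
  return counts;
}

// Return only the ids that appear more than once.
function duplicatedRecordIds(warcText) {
  return [...countRecordIds(warcText)]
    .filter(([, count]) => count > 1)
    .map(([id]) => id);
}
```

For example, duplicatedRecordIds(require('fs').readFileSync('node-warc-generated-warc.warc', 'latin1')) should come back empty once every record gets its own id. Like the rg command, it also counts any payload line that happens to start with WARC-Record-ID:.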
Now the WARC-Concurrent-To field is always <urn:uuid:null>, which should be the response's WARC-Record-ID
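To make the expected relationship concrete, here is a minimal sketch of how the two records' headers could be paired. It is not node-warc's actual implementation: newRecordId and pairedHeaders are hypothetical names, and it assumes a Node version that provides crypto.randomUUID.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: a fresh record id in the <urn:uuid:...> form.
function newRecordId() {
  return `<urn:uuid:${randomUUID()}>`;
}

// Hypothetical helper: build headers for a request/response pair so that
// the request's WARC-Concurrent-To points at the response's WARC-Record-ID.
function pairedHeaders(targetURI) {
  const responseId = newRecordId();
  const requestId = newRecordId();
  return {
    response: {
      'WARC-Type': 'response',
      'WARC-Target-URI': targetURI,
      'WARC-Record-ID': responseId
    },
    request: {
      'WARC-Type': 'request',
      'WARC-Target-URI': targetURI,
      'WARC-Record-ID': requestId,
      'WARC-Concurrent-To': responseId
    }
  };
}
```

The point is only that each record gets its own id, generated before either record is written, so the reference can never end up as null.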
I was testing the new features of the library, especially the Puppeteer Request Capturer and the WARC Generator, along with headless-chrome-crawler, with the following script:
It creates the WARC file without any errors, but when you look into it all the WARC-Record-ID fields are the same. Because of this, all the WARC-Concurrent-To fields are the same too.
One way to fix it is to create the generator, init it, write the request and close it for each request like this:
But that is 100% inefficient.