Closed BubuAnabelas closed 5 years ago
@BubuAnabelas thanks for opening the issue. I did miss your comment in #18 I believe I have fixed this and the changes are on the dev branch.
Also per your example code in #18 you do not need to do
capture = new PuppeteerCapturer(page)
....
page.on('request', request => {
capture.requestWillBeSent(request)
request.continue()
})
This will cause double entries per request, the capturer does all that for you so below will suffice :)
capture = new PuppeteerCapturer(page)
If that does not fix your issue please let me know
That one seems fixed, now I've noticed it adds the WARC-Warcinfo-ID
field but there isn't a warcinfo record. In addition, the request header adds a newline just above the WARC-Warcinfo-ID
which breaks the header:
Also, for some reason, the request
header misses a newline (beneath Content-Length
) which leaves the record and the HTTP headers all together.
Both the WARC-Warcinfo-ID
nor the missing newline in the request where happening in the previous commit.
@BubuAnabelas yup should be fixed with https://github.com/N0taN3rd/node-warc/commit/52ca586f554a2cf5254ce7b3f4b241c4c389818a
Seems better now, but the WARC-Concurrent-To
field is missing.
I'll do a PR in the following days to help you with this if you don't mind, to help you with this issue.
Besides this, the library works like a charm, very good job!
Everything works fine for now, closing this issue.
I don't know if you've seen my latest comment in the #18 issue so I open this new one just in case.
When fixing #18 it broke the
WARC-Concurrent-To
field, make in it<urn:uuid:null>
. I used the same code to test this as in #18. This issue may be related to #4.According to ISO 28500:
So it should be the same if the field is either in the 'request' or the 'response' record.