Hi,
I'm trying to read a ClueWeb09 WARC file, but no data is emitted and no error is reported.
It seems that ClueWeb09's record separator differs from the one used in standard WARC files. I forked this repository to change it, but headers are sometimes still not parsed correctly.
I was wondering if you could help me out, thank you.