Closed qurikuduo closed 1 year ago
Maybe a simple solution is to set up your own WebConnectionWrapper and intercept the request URLs. For the large ones, don't call super and simply return a static response.
See https://www.htmlunit.org/faq.html#HowToModifyRequestOrResponse as a starting point.
I will try to write a more detailed description ....
Sounds like an option.
What about public static DownloadedContent downloadContent(): could the size be checked while reading, in a loop like
while ((readCount = is.read(buffer)) != -1) { //... }
Would that be a solution?
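To make the idea of checking the size inside the read loop concrete, here is a minimal JDK-only sketch. It is not HtmlUnit's downloadContent(); the class BoundedRead, the method readAtMost and the parameter maxBytes are hypothetical names used only for illustration. Note that read() signals the end of the stream with -1, not 0.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

// Hypothetical helper, not part of HtmlUnit.
final class BoundedRead {

    /**
     * Reads the stream fully, but gives up as soon as more than maxBytes
     * have been read. Returns an empty Optional in that case.
     */
    static Optional<byte[]> readAtMost(final InputStream is, final long maxBytes) throws IOException {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        final byte[] buffer = new byte[1024];
        long total = 0;
        int readCount;
        // read() returns -1 at the end of the stream
        while ((readCount = is.read(buffer)) != -1) {
            total += readCount;
            if (total > maxBytes) {
                // too large: stop reading; the caller still has to close or abort the connection
                return Optional.empty();
            }
            out.write(buffer, 0, readCount);
        }
        return Optional.of(out.toByteArray());
    }
}
```

A limit applied this way also covers responses that do not announce their size up front, at the cost of having already transferred up to maxBytes before giving up.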
Thanks. After trying a few small tricks, I achieved the functionality I wanted. Here is what I did:
Specify my own web connection, copied from HttpWebConnection and placed in the package org.htmlunit: public class MyxxHttpWebConnection extends HttpWebConnection. Override public WebResponse getResponse(final WebRequest webRequest), read the content length via httpResponse.getFirstHeader(ContentLength).getValue(), and determine whether the response is too large:
    if (contentLengthLong > maxContentLength) {
        System.out.println("Content is too big. url=" + webRequest.getUrl().toString()
                + " contentLength = " + contentLengthLong + ", maxContentLength = " + maxContentLength);
        httpMethod.abort();
        httpResponse.setEntity(null);
    }
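The check above depends on the Content-Length header being present and numeric. A response may omit it (for example with chunked transfer encoding), and in that case getFirstHeader would typically return null. The following JDK-only sketch shows one way to parse the header value defensively; ContentLengthCheck, parseContentLength and isTooLarge are hypothetical names, not HtmlUnit or HttpClient API.

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of HtmlUnit or Apache HttpClient.
final class ContentLengthCheck {

    /** Parses a Content-Length header value; empty if it is missing, negative or not a number. */
    static OptionalLong parseContentLength(final String headerValue) {
        if (headerValue == null) {
            return OptionalLong.empty();
        }
        try {
            final long value = Long.parseLong(headerValue.trim());
            return value >= 0 ? OptionalLong.of(value) : OptionalLong.empty();
        } catch (final NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /** True only if the size is known and exceeds the limit; unknown sizes are not blocked here. */
    static boolean isTooLarge(final String headerValue, final long maxContentLength) {
        final OptionalLong length = parseContentLength(headerValue);
        return length.isPresent() && length.getAsLong() > maxContentLength;
    }
}
```

In this sketch a response of unknown size is let through; whether to block such responses instead is a policy decision that depends on the use case.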
Specify my own AttachmentHandler: public class MyxxAttachmentHandler implements AttachmentHandler
    @Override
    public void handleAttachment(final Page page) {
        // do not download attachments larger than maxAttachmentSize (100KB)
        if (page.getWebResponse().getContentLength() > maxAttachmentSize) {
            System.out.println("Attachment is too big. url=" + page.getUrl()
                    + " contentLength = " + page.getWebResponse().getContentLength()
                    + ", maxAttachmentSize = " + maxAttachmentSize);
            try {
                page.getEnclosingWindow().getWebClient().getWebConnection().close();
            } catch (Exception e) {
                logger.error("Error when closing attachment download.", e);
            }
            // no return inside a finally block, so that errors are not silently swallowed
            try {
                page.getWebResponse().cleanUp(); // new AbstractPage(page.getWebResponse(), page.getEnclosingWindow()));
                // replace the page with an empty one that keeps the original status and content type
                page.getEnclosingWindow().setEnclosedPage(new HtmlPage(
                        createWebResponse(
                                new WebRequest(page.getUrl(), page.getWebResponse().getWebRequest().getHttpMethod()),
                                "", page.getWebResponse().getContentType(),
                                page.getWebResponse().getStatusCode(), page.getWebResponse().getStatusMessage()),
                        page.getEnclosingWindow()));
            } catch (Exception e) {
                logger.error("Error when closing attachment download.", e);
            }
            return;
        }
        // otherwise collect the attachment as usual
        collectedAttachments_.add(new Attachment(page));
    }
Create the new instances before calling getPage(url):
    webClient.setAttachmentHandler(new MyxxAttachmentHandler(attachmentList));
    new WebConnectionWrapper(webClient) {
        @Override
        public WebResponse getResponse(WebRequest request) throws IOException {
            MyxxHttpWebConnection webConnection = new MyxxHttpWebConnection(webClient);
            return webConnection.getResponse(request);
        }
    };
    page = webClient.getPage(url);
    for (Attachment attachment : attachmentList) {
        // download attachment
        long contentLength = attachment.getPage().getWebResponse().getContentLength();
        if (contentLength == 0 || contentLength > MyxxAttachmentHandler.maxAttachmentSize) {
            System.out.println("attachment too large, will not save to disk. contentLength = " + contentLength);
            continue;
        }
        // save attachment to file
    }
It works for me now.
Hi @qurikuduo,
slowly I am getting an idea of what you would like to do. I made some small changes, and now I can do something like this:
    @Test
    public void contentBlocking() throws Exception {
        final byte[] content = new byte[] {77, 44};
        final List<NameValuePair> headers = new ArrayList<>();
        headers.add(new NameValuePair("Content-Encoding", "gzip"));
        headers.add(new NameValuePair(HttpHeader.CONTENT_LENGTH, String.valueOf(content.length)));

        final MockWebConnection conn = getMockWebConnection();
        conn.setResponse(URL_FIRST, content, 200, "OK", MimeType.APPLICATION_JSON, headers);
        startWebServer(getMockWebConnection());

        final WebClient client = getWebClient();
        client.setWebConnection(new HttpWebConnection(client) {
            @Override
            protected WebResponse downloadResponse(final HttpUriRequest httpMethod,
                    final WebRequest webRequest, final HttpResponse httpResponse,
                    final long startTime) {
                // check the header here if you like
                // call return super.downloadResponse() in case you are happy with the headers
                httpMethod.abort();

                // create empty response and mark as blocked for later
                final DownloadedContent downloaded = new DownloadedContent.InMemory(null);
                final long endTime = System.currentTimeMillis();
                final WebResponse response = makeWebResponse(httpResponse, webRequest, downloaded, endTime - startTime);
                response.markAsBlocked("test blocking");
                return response;
            }
        });

        final UnexpectedPage page = client.getPage(URL_FIRST);
        assertTrue(page.getWebResponse().wasBlocked());
        assertEquals("test blocking", page.getWebResponse().getBlockReason());
    }
Will this help to simplify your code? Do you need any other changes for your case?
@qurikuduo I just made a new snapshot build, please try
3.4.0-SNAPSHOT
I have updated the documentation a bit: https://www.htmlunit.org/details.html. Hope that helps.
I will close this; I hope the changes and the documentation are sufficient.
Thank you very much.
Hi there,
Some URLs return a response with the header "Content-Type: application/octet-stream". Should I process it as an attachment? After some digging, I found that attachment handling only applies to specific responses, as defined in RFC 2183:
attachmentHandler_.isAttachment(webResponse)
returns false when we have "application/octet-stream". Instead, I found that org.htmlunit.HttpWebConnection.downloadContent() is called:
public static DownloadedContent downloadContent(final InputStream is, final int maxInMemory)
It downloads the response content. If I DON'T want HtmlUnit to download big content (e.g. https://dg.10000gd.tech:12348/shmfile/100), what should I do? I want to block the download if a resource is larger than 20MB, to save bandwidth. Thanks a lot.
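For context on the attachment question: RFC 2183 defines the Content-Disposition header, so attachment detection based on it looks at the disposition type rather than at the Content-Type. The JDK-only sketch below assumes the check amounts to "the Content-Disposition value starts with attachment"; the exact HtmlUnit logic is not shown in this thread, and AttachmentCheck and looksLikeAttachment are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper illustrating an RFC 2183 style check; not HtmlUnit API.
final class AttachmentCheck {

    /** True if the Content-Disposition header value declares the disposition type "attachment". */
    static boolean looksLikeAttachment(final String contentDisposition) {
        if (contentDisposition == null) {
            // no Content-Disposition header at all, e.g. a plain application/octet-stream response
            return false;
        }
        return contentDisposition.trim().toLowerCase(Locale.ROOT).startsWith("attachment");
    }
}
```

Under that assumption, a response that only carries "Content-Type: application/octet-stream" and no Content-Disposition header is not an attachment, which matches the behavior described above; such responses therefore need to be limited at the connection level.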