WebParser bug with Download=1

Originally posted to the Rainmeter Forum 
jsmorley » June 15th, 2009, 9:46 am 

Original forum thread can be found here:
http://www.rainmeter.net/forum/viewtopic.php?f=20&t=390#p2241

Open post:
I have mentioned in the past that WebParser, with the "Download=1"
parameter, has trouble downloading images which are referenced in the site
source by a "relative" path.

The Problem

Using the Rainmeter "logo" from the forums here as an example:

<img src="http://rainmeter.net/forum/styles/saphic/imageset/site_logo.png">

If you use RegExp to capture
"http://rainmeter.net/forum/styles/saphic/imageset/site_logo.png" and set
Download=1, the image is downloaded fine and can be displayed in a meter.

If however it is:

<img src="./styles/saphic/imageset/site_logo.png">

Which is what it actually is here, and by far the more common result on a
decently designed website, it won't work at all. The download fails whenever
the image is given as a "relative" reference to a directory on the server
instead of a full URL.

I have suggested in the past that the entire concept of "relative"
directories for downloading images must be missing in WebParser. In looking
at the WebParser.cpp code, I find that in fact the capability IS there,
just broken...

Where the problem lies

If we look at the code in WebParser.cpp:

Starting at line 567
-----------------------------------------------------------------
    {
        EnterCriticalSection(&g_CriticalSection);
        url = urlData->resultString;
        LeaveCriticalSection(&g_CriticalSection);

        size_t pos = url.find(':');
        if (pos == -1 && !url.empty())   // No protocol
        {
            // Add the base url to the string
            if (url[0] == '/')
            {
                // Absolute path
                pos = urlData->url.find('/', 7);   // Assume "http://" (=7)
                if (pos != -1)
                {
                    std::wstring path(urlData->url.substr(0, pos));
                    url = path + url;
                }
            }
            else
            {
                // Relative path

                pos = urlData->url.rfind('/');
                if (pos != -1)
                {
                    std::wstring path(urlData->url.substr(0, pos + 1));
                    url = path + url;
                }
            }
        }
    }
-------------------------------------------------------------------------
We find that the code IS first trying to download using the result of the
RegExp alone, and then trying again with the site "URL" stuck on the front
of the result of the RegExp.

What should happen is that it would first try
-------------------------------------------------------------------
    ./styles/saphic/imageset/site_logo.png
-------------------------------------------------------------------

which will fail, and then try 
--------------------------------------------------------------------
    http://rainmeter.net/forum/./styles/saphic/imageset/site_logo.png
---------------------------------------------------------------------

which will succeed.
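
As a rough sketch of the fallback described above, the following
standalone C++ shows one way a captured path could be joined onto the page
URL. ResolveAgainstPage is a hypothetical helper written for this
illustration, not a function in WebParser.cpp, and it is deliberately
simplified: it does not collapse "./" or "../" segments, and it looks for
"://" instead of assuming a 7-character "http://" prefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only; not part of WebParser.cpp.
// Joins a path captured from a page onto the URL of that page.
std::wstring ResolveAgainstPage(const std::wstring& pageUrl, const std::wstring& found)
{
    const std::wstring::size_type npos = std::wstring::npos;

    // Like the quoted code, treat any ':' as "already has a protocol".
    if (found.empty() || found.find(L':') != npos)
    {
        return found;
    }

    // Drop the query string and fragment so that a '/' inside them is not
    // mistaken for a path separator.
    const std::wstring page = pageUrl.substr(0, pageUrl.find_first_of(L"?#"));

    // The path begins at the first '/' after "scheme://".
    const std::wstring::size_type schemeEnd = page.find(L"://");
    const std::wstring::size_type hostStart = (schemeEnd == npos) ? 0 : schemeEnd + 3;
    const std::wstring::size_type pathStart = page.find(L'/', hostStart);

    if (pathStart == npos)
    {
        // The page URL has no path, so everything hangs off the root.
        return (found[0] == L'/') ? page + found : page + L"/" + found;
    }

    if (found[0] == L'/')
    {
        // Server-absolute path: keep only "scheme://host".
        return page.substr(0, pathStart) + found;
    }

    // Relative path: keep everything up to and including the last '/'.
    const std::wstring::size_type lastSlash = page.rfind(L'/');
    assert(lastSlash != npos && lastSlash >= pathStart); // a '/' exists at pathStart
    return page.substr(0, lastSlash + 1) + found;
}
```

Called with the URL of this thread and the captured
"./styles/saphic/imageset/site_logo.png", the helper returns the second URL
shown above. Comparing against std::wstring::npos rather than -1 gives the
same result as the quoted code but states the intent more directly.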

However

What I find in Rainmeter.log is that there is an error in the code. On the
attempt to "build" the full URL to the image it isn't appending the result
of the RegExp to the site URL, but rather appending THE ENTIRE LINE that
the result was found on to the URL.

So in the log we see this:

DEBUG: (00:02:35.187) WebParser: Downloading url
./styles/saphic/imageset/site_logo.png to
C:\Users\JEFFRE~1\AppData\Local\Temp\Rainmeter-Cache\site_logo.png
DEBUG: (00:02:35.187) WebParser: Downloading url http://rainmeter.net/<a
href="./index.php" title="Board index" id="logo"><img
src="./styles/saphic/imageset/site_logo.png"/></a> to
C:\Users\JEFFRE~1\AppData\Local\Temp\Rainmeter-Cache\a>

DEBUG: (00:02:35.187) WebParser: Download failed:
./styles/saphic/imageset/site_logo.png
DEBUG: (00:02:35.250) WebParser: Download failed: http://rainmeter.net/<a
href="./index.php" title="Board index" id="logo"><img
src="./styles/saphic/imageset/site_logo.png"/></a>

Clearly it is trying to work, and would work great if the line(s) in the
code that build the full URL from a combination of the site URL and the
result of the RegExp weren't just slightly broken.
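
To make the difference between "the result of the RegExp" and the markup
around it concrete, here is a minimal sketch using std::wregex from the C++
standard library. This is an assumption-laden illustration: WebParser does
not use std::wregex, CaptureOrEmpty is a hypothetical helper, and nothing in
the log proves that a mixed-up capture index is the actual cause. It only
shows that the captured group, and not the text around it, is what should
be appended to the site URL.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper for illustration only; not part of WebParser.cpp.
// Returns capture group `index` of the first match of `pattern` in `html`,
// or an empty string when there is no match or no such group.
// Index 0 is the whole matched text; index 1 is the first parenthesised group.
std::wstring CaptureOrEmpty(const std::wstring& html,
                            const std::wstring& pattern,
                            std::size_t index)
{
    const std::wregex re(pattern);
    std::wsmatch m;
    if (!std::regex_search(html, m, re))
    {
        return std::wstring();
    }

    assert(!m.empty()); // a successful search always yields group 0
    if (index >= m.size())
    {
        return std::wstring();
    }
    return m[index].str();
}
```

With the pattern `<img src="(.*?)"` applied to the logo line from the log,
index 1 yields only the relative path, while index 0 also carries the
surrounding `<img src="` markup, which can never form a valid URL.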

I would be forever grateful if one of the devs could look into this. It may
be a VERY simple fix and would make WebParser.dll just orders of magnitude
more useful.
Last edited by jsmorley on June 15th, 2009, 10:03 am, edited 3 times in total. 

End of Post
Original issue reported on code.google.com by evmckay on 22 Jun 2009 at 3:31

WebParser bug with Download=1 #68