GCuser99 / SeleniumVBA

A comprehensive Selenium wrapper for browser automation developed for MS Office VBA running in Windows
MIT License
83 stars 17 forks source link

Issue for test code of "Sub test_PageToHTMLMethods()" #76

Closed GHRyunosuke closed 1 year ago

GHRyunosuke commented 1 year ago

New to SeleniumVBA, found it an awsome Job! Now trying out the "Sub test_PageToHTMLMethods()" test code and having the follwoing error (saying getWindowThreadProcessId not found in DLL user32):

image

sees to be following declaration does not have "Alias"?

Private Declare PtrSafe Function getWindowThreadProcessId Lib "user32" (ByVal hWnd As LongPtr, lpdwProcessId As Long) As Long

then i modifed it into:

Private Declare PtrSafe Function getWindowThreadProcessId Lib "user32" Alias "GetWindowThreadProcessId" (ByVal hWnd As LongPtr, lpdwProcessId As Long) As Long

Then the error becomes the following and still get stuck, any ideas?

image

6DiegoDiego9 commented 1 year ago

Hi @GHRyunosuke and welcome!

According to MS, Alias is optional. Can you please try by removing the Alias and using the capital letter directly in "PtrSafe Function GetWindowThreadProcessId"?

On the second problem, it seems that the caption of your VBE window doesn't match this regex that I wrote to get the name of your acttive VBA Project file name: "^Microsoft Visual Basic[^-] - ([^[]) [[^]]+] - [[^(]+([^)]+)]$"

Could you please upload a screenshot of your VBA editor upper left region, like the following, so that I can adjust the regex to include your different format?

image
GHRyunosuke commented 1 year ago

Thanks for the reply.

1st issue: Can you please try by removing the Alias and using the capital letter directly in "PtrSafe Function GetWindowThreadProcessId"? --> works perfectly.

2nd issue: Here it goes the screenshot

image

My office version is office 365 (Japanese version). It seems the Japanese have caused the unmatch? Looking forward to the next update with improvements for the above.

6DiegoDiego9 commented 1 year ago

Yes it's that " [....]" you have after the file name. I didn't expect that.

Replace the line: oregex.Pattern = "^Microsoft Visual Basic[^-]* - ([^[]*) \[[^]]+\] - \[[^(]+\([^)]+\)\]$"

with: oregex.Pattern = "^Microsoft Visual Basic[^-]* - (.*?)(?:| \[[^]]*\]) - \[[^]]*\]$"

This new regular expression should work for your version too.

Please let me know. If it works, I'll make the two fixes on the fly ;)

GHRyunosuke commented 1 year ago

Replace the line: oregex.Pattern = "^Microsoft Visual Basic[^-] - ([^[]) [[^]]+] - [[^(]+([^)]+)]$"

This caused the error "Error: unable to extract file name from VBE Windows caption. Check the extraction regex." --> BTW, in the Japanese windows system (mine is windows 11), the "\" mark would display as the "money mark" as you can see in the screenshot below.

Then I tried to debug a little, hope it helps: I changed the regex line into oregex.Pattern = "^Microsoft Visual Basic*" to make it for sure to match the VBE window

However, then it got stuck in the following line saying "Runtime error 5: Calling of procedure, or parameter is incorrect" image

Here is the screenshot of "regexRes" right before the line is executed, which will trigger the error above.

image

GCuser99 commented 1 year ago

@GHRyunosuke, while you are waiting for this to get corrected, you can continue your testing if you wish by inserting the following line to avoid the section of code where the error occurs:

    Set driver = SeleniumVBA.New_WebDriver

    driver.DefaultIOFolder = ThisWorkbook.Path  '<-- this line sets the base path from which all relative paths are calculated

Thanks again for reporting this!

GHRyunosuke commented 1 year ago

Hello, @GCuser99. Problem solved with "driver.DefaultIOFolder = ThisWorkbook.Path"! Thanks a lot for the advice. Looking forward that SeleniumVBA being the standard for MS office VBA web scraping.

6DiegoDiego9 commented 1 year ago

Hi @GHRyunosuke

That money mark is my main suspect.

The regex is not limited to match the whole VBE caption, like your regex does. It must also extract the file name contained in it, and this is done by the "()" characters in the regex.

You should be able to reproduce this with VBA:

image

Here is the code for you to copy and paste:

Sub test()
    Const testString$ = "Microsoft Visual Basic for Applications - TestVBA.xlsm [abcd] - [Module1 (abc)]"

    Dim regex
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "^Microsoft Visual Basic[^-]* - (.*?)(?:| \[[^]]*\]) - \[[^]]*\]$"

    If regex.test(testString) Then
        Set match = regex.Execute(testString)(0)
        'Extract the capturing group
        result = match.SubMatches(0)
        MsgBox "Captured: """ & result & """"
    Else
        MsgBox "No match found."
    End If
End Sub

You can also test the same regex on https://regex101.com/:

image

Please let us know how you will be able to achieve these same results on your PC.

Although @GCuser99 gave you a temporary solution that works for your current script, it's important to fix the original solution so that it will work with all your future scripts too.

GHRyunosuke commented 1 year ago

Hi @6DiegoDiego9,

Thanks for the reply. Here is the result. Seems the Japanese Windows 11 OS still automatically turns the "\" into the money mark. However, the test result shows that the money mark functions as good as the "\", right?

image

Just in case, let me attach again the the screen shot of this issue:

image

GCuser99 commented 1 year ago

So it does not seem to be a problem with regex escaping - interesting!

@GHRyunosuke, can you please copy-paste the value cell of the "caption" variable from the locals window like you show above in your post? Same way that I have done below from my locals window:

"Microsoft Visual Basic for Applications - SeleniumVBA.xlam [running] - [test_PageToMethods (Code)]"

That way @6DiegoDiego9 has a way to test his regex pattern against your actual caption string. Thanks for your help and patience!

GCuser99 commented 1 year ago

BTW @6DiegoDiego9, in @GHRyunosuke's last post, the locals window shows caption length is 120 characters. But I count 115 excluding the terminal null char.

If I'm right about that, then:

Set regexRes = oRegex.execute(Left$(caption, result))

will not strip off the trailing null character of the caption, and your regex pattern will fail I think because of the terminating $ instruction:

'this will fail if caption string includes a terminal null char
oRegex.Pattern = "^Microsoft Visual Basic[^-]* - (.*?)(?:| \[[^]]*\]) - \[[^]]*\]$" 

'but this may work without any other changes just to see if I'm right (note absence of terminal $):
oRegex.Pattern = "^Microsoft Visual Basic[^-]* - (.*?)(?:| \[[^]]*\]) - \[[^]]*\]"
GHRyunosuke commented 1 year ago

Hello @GCuser99,

Here it goes the "caption" variable from the locals window

1. copy&paste here in github comment: "Microsoft Visual Basic for Applications - SVBA3.6_AllInOne_Template1.3_Now.xlsm [実行中] - [test_PageToMethods (コード)]"

Just in case here is the screenshot of how it looks like in VBE and Notepad++

  1. Screenshot in VBE: image

  2. Screenshot in Notepad++ after copy&paste: image

GCuser99 commented 1 year ago

I found this this morning from here:

Under certain conditions, the GetWindowTextLength function may return a value that is larger than the actual length of the text. This occurs with certain mixtures of ANSI and Unicode, and is due to the system allowing for the possible existence of double-byte character set (DBCS) characters within the text. The return value, however, will always be at least as large as the actual length of the text; you can thus always use it to guide buffer allocation.

The existence of those 6 non-ascii characters probably explains why the buffer is larger than len(caption & vbNullChar). So probably have to do something like this...

caption = Left$(caption, Instr(caption, vbNullChar) - 1)
6DiegoDiego9 commented 1 year ago

Perfect!

Interestingly, I've got that same solution from an AI, without even mentioning the problem: image

@GHRyunosuke let us know if it works for you. I can't think of a better solution than that.

GCuser99 commented 1 year ago

Perfect!

Interestingly, I've got that same solution from an AI, without even mentioning the problem: image

@GHRyunosuke let us know if it works for you. I can't think of a better solution than that.

Thanks for finding that @6DiegoDiego9 - the generative AI is a game changer, and kudos to you for embracing early. I'm gonna have to follow your example and use it more....

GCuser99 commented 1 year ago

I went ahead and made the commit because I wanted to get those fatal errors out of the code - thanks to you both for getting this sorted out...

@GHRyunosuke, in case you just want to plug the changes into your current project, here is the relevant code that should get you started up again. Please let @6DiegoDiego9 and I know how it goes and keep providing feedback - it's very helpful!

Private Function activeVBAProjectFolderPath() As String
    'This returns the calling code project's parent document path. So if caller is from a project that references the SeleniumVBA Add-in
    'then this returns the path to the caller, not the Add-in (unless they are the same).
    'But be aware that if qc'ing this routine in Debug mode, the path to this SeleniumVBA project will be returned, which
    'may not be the caller's intended target if it resides in a different project.
    Dim fso As FileSystemObject
    Dim oApp As Object

    Set oApp = Application 'late bound needed for cross-app compatibility

    Select Case oApp.Name
    Case "Microsoft Excel"
        Dim sRespType As String
        sRespType = TypeName(oApp.Caller)
        If sRespType <> "Error" Then 'eg. if launched by a formula or a shape button in a worksheet
            activeVBAProjectFolderPath = oApp.ActiveWorkbook.Path
        Else 'if launched in the VBE
            If vbaIsTrusted Then
                'below will return an error if active project's host doc has not yet been saved, even if access trusted
                Set fso = New FileSystemObject
                On Error Resume Next
                activeVBAProjectFolderPath = fso.GetParentFolderName(oApp.VBE.ActiveVBProject.fileName)
                On Error GoTo 0
            Else 'if Excel security setting "Trust access to the VBA project object model" is not enabled
                Dim ThisAppProcessID As Long
                GetWindowThreadProcessId oApp.hWnd, ThisAppProcessID
                Do 'search for this VBE window
                    Dim hWnd As LongPtr
                    hWnd = FindWindowEx(0, hWnd, "wndclass_desked_gsk", vbNullString)
                    If hWnd > 0 Then
                        Dim WndProcessID As Long
                        GetWindowThreadProcessId hWnd, WndProcessID
                        If ThisAppProcessID = WndProcessID Then
                            'get its caption
                            Dim bufferLen As Long, caption As String, result As Long
                            bufferLen = GetWindowTextLength(hWnd)
                            caption = Space$(bufferLen + 1)
                            result = GetWindowText(hWnd, caption, bufferLen + 1)
                            'extract filename from the caption
                            Dim oRegex As New RegExp
                            oRegex.Pattern = "^Microsoft Visual Basic[^-]* - (.*?)(?:| \[[^]]*\]) - \[[^]]*\]$"
                            Dim regexRes As MatchCollection
                            caption = Left$(caption, InStr(caption, vbNullChar) - 1)
                            Set regexRes = oRegex.execute(caption)
                            If regexRes.Count = 1 Then
                                Dim sFilename As String
                                sFilename = regexRes.Item(0).SubMatches(0)
                                'this returns vbNullString if workbook has not been saved yet
                                activeVBAProjectFolderPath = oApp.Workbooks(sFilename).Path
                            Else
                                Err.raise 1, , "Error: unable to extract filename from VBE window caption. Check the extraction regex."
                            End If
                        End If
                    End If
                Loop Until hWnd = 0
            End If
        End If
        If activeVBAProjectFolderPath = vbNullString Then Err.raise 1, , "Error: unable to get the active VBProject path - make sure the parent document has been saved."
    Case "Microsoft Access"
        Dim strPath As String

        strPath = vbNullString

        'if the parent document holding the active vba project has not yet been saved, then Application.VBE.ActiveVBProject.Filename
        'will throw an error so trap and report below...

        On Error Resume Next
        strPath = oApp.VBE.ActiveVBProject.fileName
        On Error GoTo 0

        If strPath <> vbNullString Then
            Set fso = New FileSystemObject
            strPath = fso.GetParentFolderName(strPath)
            activeVBAProjectFolderPath = strPath
        Else
            Err.raise 1, "WebShared", "Error: Attempting to reference a folder/file path relative to the parent document location of this active code project - save the parent document first."
        End If
    Case Else
        Err.raise 1, "WebShared", "Error: Only MS Access and MS Excel supported."
    End Select
End Function
GCuser99 commented 1 year ago

I'll close this now but can resurrect if problems persist...

GHRyunosuke commented 1 year ago

Hello @GCuser99, @6DiegoDiego9, with the code provided above ("activeVBAProjectFolderPath"), I confirmed this issue is fixed! Thanks a lot for your effort.

GHRyunosuke commented 1 year ago

Hello @6DiegoDiego9, do you think you could share the link of the "AI" you used? It seems interesting and I would like to try it.

6DiegoDiego9 commented 1 year ago

Hi @GHRyunosuke, that one is www.phind.com, that is likely powered by GPT 3.5 with a pre-prompt tailored for developers.

There are also currently 4-5 other AIs that I’m using with GPT 4.0 that are significantly better at coding but 2-3 times slower and not free of charge.

GHRyunosuke commented 1 year ago

Hi @6DiegoDiego9,

Haven't used any GPT, very interesting. Thanks a lot for the information.