Open takingurstuff opened 3 days ago
Before digging into this problem, I have 3 questions:
[...] await scrollAndClick(...) called in your code?
[...] contextId property to Runtime.evaluate? This would ensure the code is running in the main frame only. See for example: https://github.com/gildas-lormeau/single-file-cli/blob/4847a3d7da5d4e554e0e36626f05b29cf0ffdde3/lib/cdp-client.js#L175-L180.

The code was not shown in the browser, as I called it in the CDP client script, not the API script. I also edited the options inside the API script. I could provide the edited CDP client code if that would help. The screenshots show what opened by itself during the last few pages. I will try to add the context ID to my function, but I am guessing that a helper function that controls scrolling and clicking will cause the browser to open a specific local URL. Thank you for the suggestion.
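For readers following the contextId suggestion, here is a minimal sketch of one way to choose an execution context that belongs to the main frame. It is not taken from single-file-cli; `pickMainFrameContextId` is a hypothetical helper. It assumes `frameTree` has the shape returned by `Page.getFrameTree`, and that `contexts` holds the descriptions received from `Runtime.executionContextCreated` events, with `auxData` carrying `frameId` and `isDefault` as Chrome currently reports them (the protocol documents `auxData` as embedder-specific). The sample values are made up.

```javascript
// Hypothetical helper: return the id of the default execution context
// of the main frame, or undefined when none has been reported yet.
function pickMainFrameContextId (frameTree, contexts) {
  const mainFrameId = frameTree.frame.id
  const match = contexts.find(context =>
    context.auxData &&
    context.auxData.frameId === mainFrameId &&
    context.auxData.isDefault === true
  )
  return match ? match.id : undefined
}

// Sample data shaped like CDP payloads (ids are invented for illustration)
const sampleFrameTree = {
  frame: { id: 'MAIN' },
  childFrames: [{ frame: { id: 'CHILD' } }]
}
const sampleContexts = [
  { id: 3, auxData: { frameId: 'CHILD', isDefault: true } }, // iframe context
  { id: 5, auxData: { frameId: 'MAIN', isDefault: false } }, // isolated world
  { id: 7, auxData: { frameId: 'MAIN', isDefault: true } } // main frame, default world
]
const mainContextId = pickMainFrameContextId(sampleFrameTree, sampleContexts)
```

The resulting id would then be passed as the `contextId` property of each `Runtime.evaluate` call, so that expressions are not evaluated in an iframe or an isolated world.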
Don't hesitate to share your code (ideally a repository that I could clone via git) if possible. This would be the easiest way for me to help you debug this problem.
It might also help to state that I am making this edit on the latest version rather than the previous Puppeteer-based version.
I created a fork with the edited files: https://github.com/takingurstuff/single-file-cli
Thank you, I've cloned the repository on my machine and formatted the code to compare it with mine.
Before debugging it, could you confirm that the presence of const LOGIN_PAGE_URL = "https://bbs.quantclass.cn";
and await CDP.createTarget(LOGIN_PAGE_URL);
is intentional? I was not expecting to find this code and I'm not sure it's working as intended.
It is intentional. The code is there so I can open a new page that is not used for downloading at all; all the downloading happens on new empty targets. It is working, and I have not done any runtime evaluation on that page.
I just tested with contextId, and it opened the script straight away instead of opening it at the end of the download sequence. The scroll and click function:
async function scrollAndClick (
  Page,
  Runtime,
  primaryTargetSelector,
  secondaryTargetSelector,
  clickSelector,
  buttonSelector,
  scrollPause = 2000,
  maxAttempts = 10000000,
  contextId
) {
  let attempts = 0
  while (attempts < maxAttempts) {
    attempts++
    try {
      const clickResult2 = await Runtime.evaluate({
        expression: `!!document.querySelector('${buttonSelector}')`,
        contextId
      })
      if (clickResult2.result.value) {
        console.log('preparing to click')
        click(buttonSelector, contextId)
      }
      // Check if the primary target element is present
      const primaryTargetResult = await Runtime.evaluate({
        expression: `!!document.querySelector('${primaryTargetSelector}')`,
        contextId
      })
      if (primaryTargetResult.result.value) {
        console.log('Primary target element found!')
        return { found: true, target: 'primary' }
      }
      // Check if the secondary target element is present
      if (secondaryTargetSelector) {
        const secondaryTargetResult = await Runtime.evaluate({
          expression: `!!document.querySelector('${secondaryTargetSelector}')`,
          contextId
        })
        if (secondaryTargetResult.result.value) {
          console.log('Secondary target element found!')
          return { found: true, target: 'secondary' }
        }
      }
      // Check if the click element is present
      const clickResult = await Runtime.evaluate({
        expression: `!!document.querySelector('${clickSelector}')`,
        contextId
      })
      if (clickResult.result.value) {
        console.log('Click element found, clicking it!')
        await Runtime.evaluate({
          expression: `document.querySelector('${clickSelector}').click()`,
          contextId
        })
        // Pause after clicking
        await new Promise(resolve => setTimeout(resolve, scrollPause))
      }
      // Scroll down
      await Runtime.evaluate({
        expression: 'window.scrollTo(0, document.body.scrollHeight)',
        contextId
      })
      // Pause after scrolling
      await new Promise(resolve => setTimeout(resolve, scrollPause))
    } catch (error) {
      console.error('Error during scroll and click:', error)
      // Wait a bit before retrying
      await new Promise(resolve => setTimeout(resolve, 1000))
    }
  }
  console.log('Max attempts reached without finding either target element')
  return { found: false }
}

async function click (button, contextId) {
  try {
    await Runtime.evaluate({
      expression: `!!document.querySelector('${button}').click()`,
      contextId
    })
    console.log('there is a button to click')
  } catch (error) {
    console.log('there are no buttons to click')
  }
}
And the call to the function:
if (options.scrollAndClickTarget && options.scrollAndClickButton) {
  await scrollAndClick(
    Page,
    Runtime,
    options.scrollAndClickTarget,
    options.secondaryScrollAndClickTarget,
    options.scrollAndClickButton,
    options.nonScrollButtonSelector,
    options.scrollPause || 2000,
    options.scrollMaxAttempts || 100,
    contextId
  )
}
All the changes have been committed to the fork.
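A side note on the expressions built above: the selectors are interpolated between single quotes, so a selector that itself contains a single quote (for example an attribute selector) produces an expression that does not parse. The sketch below shows one possible hardening using `JSON.stringify` to quote the selector. `existsExpression` and `clickExpression` are hypothetical helper names, not part of single-file-cli, and this is not claimed to be the cause of the halting described in this issue.

```javascript
// Hypothetical helpers: build expression strings for Runtime.evaluate
// with the selector safely quoted as a JavaScript string literal.
function existsExpression (selector) {
  return `!!document.querySelector(${JSON.stringify(selector)})`
}

function clickExpression (selector) {
  // Optional chaining avoids a TypeError when the element is absent
  return `document.querySelector(${JSON.stringify(selector)})?.click()`
}

// A selector containing single quotes, which would break naive interpolation
const sampleSelector = "a[title='next']"
const sampleExists = existsExpression(sampleSelector)
const sampleClick = clickExpression(sampleSelector)
```

These strings would be passed as the `expression` property of `Runtime.evaluate`, together with `contextId`, in place of the template literals used in the function above.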
I was recently crawling a simple site with this software when I ran into an issue: the software halts unexpectedly on the last few pages. It happens at random; sometimes everything downloads properly, and other times it halts completely. I was not using the built binary, as I was making modifications:
To recreate:
First, create a URLs file containing an odd number of links and place it in the repo:
Then run the command in the cloned repo:
This issue does not occur in the compiled binaries. The only modification made to the source code before this issue appeared is the addition of smart scrolling:
This function is only called after the page has fully loaded.