Closed JuroOravec closed 10 months ago
Bun is not fully compatible with Node. See https://bun.sh/docs/runtime/nodejs-apis#node-net, where they write:
If you run into any bugs with a particular package, please open an issue. Opening issues for compatibility bugs helps us prioritize what to work on next.
So I'd recommend doing that, we can't fix it here...
@jancurn Please don't judge so fast and have a look at the error I posted.
The error said that undefined is not an object (evaluating 'net_1.default.Socket.prototype.write').
But in my test, the (new Socket()).write function was defined in Bun. So it didn't seem to be an issue on the Bun side, implying that the issue is in proxy-chain.
What's more, I think I just found the issue, and it's here, on line 5:
const asyncWrite = promisify(net.Socket.prototype.write);
Which is then called here, on line 14:
await asyncWrite.call(socket, 'HTTP/1.1 200 Connection Established\r\n\r\n');
For some reason, net.Socket.prototype is undefined in Bun, so net.Socket.prototype.write throws the error.
However, (new net.Socket()).write is defined, and the following:
new Socket().write('HTTP/1.1 200 Connection Established\r\n\r\n')
returns true.
So that's what I think the issue is. However, I haven't worked with sockets before, and I'm not 100% sure what the purpose of that file is, so I don't know if the behaviour of new net.Socket().write in Bun is the same as net.Socket.prototype.write in Node. But common sense suggests that it should be.
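The difference between the two lookups can be checked directly in whichever runtime is under test. What follows is only a sketch: describeSocketWrite is a hypothetical helper name, not part of proxy-chain, Node, or Bun, and what it reports will depend on the runtime and its version.

```javascript
const net = require('node:net');

// Hypothetical helper: reports whether `write` is reachable through
// net.Socket.prototype and through a fresh instance, without throwing
// when the prototype itself is undefined.
function describeSocketWrite(netModule = net) {
  const proto = netModule.Socket.prototype;
  const socket = new netModule.Socket(); // never connected, nothing is sent
  const report = {
    prototypeWrite: typeof (proto && proto.write),
    instanceWrite: typeof socket.write,
  };
  socket.destroy();
  return report;
}

console.log(describeSocketWrite());
```

Running the same file under both runtimes and comparing the two fields should show whether only the prototype lookup is affected.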
Further updates:
Locally, I've replaced net.Socket.prototype.write with new net.Socket().write, and proxy-chain wasn't causing errors anymore.
Next up, there was an error in node_modules/@crawlee/browser-pool/proxy-server.js on the line
server.server.unref();
I looked into it. The unref should refer to http.Server.unref. For some reason, this isn't defined in Bun, and this seems to be a genuine error on their side (it's not even reported in their docs).
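Instead of commenting the call out, a guard could skip it only where the method is missing. This is a sketch, not what @crawlee/browser-pool actually does; unrefIfSupported is a hypothetical helper name.

```javascript
const http = require('node:http');

// Hypothetical helper: calls unref() only when the runtime provides it.
// Returns true if unref() was called, false if it was skipped.
function unrefIfSupported(server) {
  if (typeof server.unref === 'function') {
    server.unref();
    return true;
  }
  return false;
}

const server = http.createServer();
console.log(unrefIfSupported(server));
```

Where the call is skipped, a listening server may keep the process alive until it is closed explicitly, so this only hides the missing method rather than replacing its effect.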
Out of curiosity, I just commented out that line to see if I could get the crawler to work. It printed the initial log with system info:
INFO System info
{"apifyVersion":"3.1.4","apifyClientVersion":"2.7.1","crawleeVersion":"3.3.1","osType":"Darwin","nodeVersion":"v18.15.0"}
However, the run still ended in an error. Here, promises_1.opendir refers to fs.promises.opendir (node:fs). Unfortunately, none of the opendir functions are currently defined in Bun (fs.opendirSync, fs.opendir, fs.promises.opendir).
ERROR (0, promises_1.opendir) is not a function. (In '(0, promises_1.opendir)(keyValueStoreDir)', '(0, promises_1.opendir)' is undefined)
TypeError: (0, promises_1.opendir) is not a function. (In '(0, promises_1.opendir)(keyValueStoreDir)', '(0, promises_1.opendir)' is undefined)
at <anonymous> (/Users/presenter/repos/apify-actor-facebook/node_modules/@crawlee/memory-storage/cache-helpers.js:110:25)
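One possible way around a missing opendir is to fall back to readdir with the withFileTypes option, which also returns Dirent objects. The sketch below assumes that fallback is acceptable for the caller; iterateDirents and its fsPromises parameter are hypothetical names, not part of @crawlee/memory-storage.

```javascript
const fs = require('node:fs');

// Hypothetical helper: yields Dirent objects for `dirName`.
// Uses opendir when the runtime provides it (entries are streamed),
// otherwise readdir with withFileTypes (entries arrive all at once).
async function* iterateDirents(dirName, fsPromises = fs.promises) {
  if (typeof fsPromises.opendir === 'function') {
    const dir = await fsPromises.opendir(dirName);
    // The async iterator closes the directory handle when it finishes.
    for await (const dirent of dir) {
      yield dirent;
    }
    return;
  }
  const dirents = await fsPromises.readdir(dirName, { withFileTypes: true });
  yield* dirents;
}
```

A caller can consume either branch the same way, with for await (const dirent of iterateDirents(someDir)). The readdir branch resolves only after the whole directory has been read, so it may be slower to produce the first entry on large directories.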
So to sum up:
Replace net.Socket.prototype.write with new net.Socket().write.
Sorry, you're right. I think we just need to get rid of the problematic line and change the code of the customConnect function in https://github.com/apify/proxy-chain/blob/master/src/custom_connect.ts to something like this:
const asyncWrite = util.promisify(socket.write).bind(socket);
await asyncWrite.call(socket, 'HTTP/1.1 200 Connection Established\r\n\r\n');
Would you care to create a pull request?
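The suggestion above can be tried in isolation with a small sketch. It is not the code that landed in proxy-chain; writeConnectEstablished and CONNECT_ESTABLISHED are hypothetical names, and the only assumption is that the argument is a net.Socket or another Node Writable stream.

```javascript
const { promisify } = require('node:util');

const CONNECT_ESTABLISHED = 'HTTP/1.1 200 Connection Established\r\n\r\n';

// Hypothetical helper: takes `write` from the instance rather than from
// net.Socket.prototype, so the prototype is never touched.
// Resolves once the stream invokes the write callback.
async function writeConnectEstablished(socket) {
  const asyncWrite = promisify(socket.write).bind(socket);
  await asyncWrite(CONNECT_ESTABLISHED);
}
```

Because bind already fixes the receiver, the extra .call(socket, ...) in the snippet above is harmless but not required.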
I made a PR for the socket one (https://github.com/apify/proxy-chain/pull/522), since I'm already in the flow. I couldn't verify the tests. I'll leave it up to you to decide whether it should go in or not. Have a nice evening!
I couldn't resist testing further, so just summarizing what I learnt:
1. I managed to start a Playwright crawler in Bun with the following changes to the Apify packages:
   - Comment out server.server.unref(); in @crawlee/browser-pool/proxy-server.js
   - Replace fs.promises.opendir(dirName) with fs.promises.readdir(dirName, { withFileTypes: true }) in @crawlee/memory-storage/cache-helpers.js
     With the withFileTypes: true option, both opendir and readdir resolve to an iterable of Dirent. The bad thing is that, from my understanding, opendir yields the entries one by one as they are found, whereas readdir resolves only once all items have been found. So replacing opendir with readdir might add extra waiting time.
2. With the changes in step 1, I managed to start a Playwright crawler, up to the point where the Playwright command was executed. After that, there is an issue on the Playwright side with child_process.spawn. You can find more about that issue here:
Many thanks for the analysis! Please can you post this to https://github.com/apify/crawlee/issues instead? Otherwise the Crawlee team will not look into it...
Closing this issue here for now
Hi, I have a web scraper built on top of the Crawlee framework. I wanted to run it with Bun instead of Node. However, it failed, and the stack trace (at the bottom) led to proxy-chain.
I don't think the issue is with Bun. I tried running the following script with Bun, and both new Socket() and socket.write were defined:
Error stack trace:
ENV: macOS 13.2.1 node: v16.13.0
Dependencies: "proxy-chain@2.3.0" "apify": "^3.1.4", "apify-client": "^2.7.1", "cheerio": "^1.0.0-rc.12", "crawlee": "^3.3.1",