Closed metalwarrior665 closed 1 year ago
This comes from here:
https://github.com/apify/apify-shared-js/blob/master/packages/utilities/src/utilities.client.ts#L85
Not sure why we even have a custom URL parser instead of using new URL
, I guess it's just too old code we haven't refactored yet. Not sure if it makes more sense to try to fix the regexps or instead try to reimplement with new URL
? cc @mtrunkat @mnmkng
edit: looks like using new URL
(at least for the normalization) is trivial, working on a PR
There is one thing in the tests that won't be working exactly the same as before if we use new URL
.
it('should trim all parts of URL', () => {
expect(normalizeUrl(' http :// test # fragment ')).toEqual('http://test');
expect(normalizeUrl(' http :// test # fragment ', true)).toEqual('http://test#fragment');
});
Not sure why we had this in the tests, http :// test # fragment
is not a valid URL, the color character is part of the protocol:
new URL('http :// test # fragment'); // fails
new URL('http: // test # fragment'); // works
The code is 5 years old, and I guess it is rather testing the old normalizeUrl
implementation, so it should be safe to change that.
edit: hmm new URL('http: // test # fragment')
works in the browser, but not in nodejs, at least not on node 18 :/
I would be probably in favor of removing this test as a whole. We validate URLs internally in crawlee on many places via new URL
, so why should we care about anything that is not passing this validation here...
PR here: https://github.com/apify/apify-shared-js/pull/365
I stripped all the spaces from the URL before parsing to get around this, I guess that should do it.
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/core
Issue description
The Request constructor messes up the
uniqueKey
creation. It converts'https://www.google.com/maps/search/restaurant/@39.102972537998426,39.54835927707177,4z'
into'https://39.102972537998426,39.54835927707177,4z'
The first might be not up to HTTP specs but we should work around it. The same logic might exist on platform as well so we might need to fix it there as well
Code sample
Package version
latest
Node.js version
16
Operating system
No response
Apify platform
I have tested this on the
next
releaseNo response
Other context
No response