Closed metalwarrior665 closed 2 years ago
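Something like this would fix it:
let url = originalUrl;
// The share URL is inlined here as an example; in practice you would match against originalUrl.
const match = 'https://docs.google.com/spreadsheets/d/11UGSBOSXy5Ov2WEP9nr4kSIxQJmH18zh-5onKtBsovU/edit?usp=sharing'
    .match(/^(https:\/\/docs\.google\.com\/spreadsheets\/d\/(?:\w|-)+)\/edit/);
if (match) {
    // Swap the /edit suffix for the CSV export endpoint.
    url = `${match[1]}/gviz/tq?tqx=out:csv`;
}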
You can pass a custom RegExp into the function. I don't feel like hardcoding Google Docs-specific overrides into the function. Am I missing something?
Well, in Scrapers and other generic actors that use this internally, the user can pass anything there (usually into the Start URLs input schema component), so creating a custom regex doesn't make sense.
I will keep this issue open and see whether more people run into the same problem. If they do, we should at least improve the description/warning for the Start URLs file upload.
Oh, so the trouble is actually with the automatic parsing in RequestList. Yeah, well, that would deserve some update.
Hey, it takes the input from the URL at the moment, but it also picks up a lot of unrelated Google URLs, see here: https://console.apify.com/admin/users/xRGg9iAfJSymqartk/tasks/eaUCBXOfaYgzwAcDB#/runs/Lo4IEhIFzEpNfBOtS . @B4nan
Reproduce with:
In the wrong case, it will print a lot of internal Google URLs. Actually, the wrong URL is what you get if you click on the Share button in your spreadsheet.
I think we could probably just convert the URL without touching the parsing.
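For illustration, here is a minimal sketch of what that conversion could look like as a standalone function, based on the regex from the first comment. The name `toCsvExportUrl` is hypothetical and not part of the Apify SDK. It assumes the share link always has the `/edit` form, and it ignores any `gid` sheet selector in the original URL.

```javascript
// Hypothetical helper (not an Apify SDK function): rewrites a Google Sheets
// "Share" link to the gviz CSV export URL and leaves every other URL untouched.
function toCsvExportUrl(originalUrl) {
    // Capture everything up to and including the spreadsheet ID, then require "/edit".
    const match = originalUrl.match(
        /^(https:\/\/docs\.google\.com\/spreadsheets\/d\/[\w-]+)\/edit/,
    );
    // No match means this is not a share link, so return the input unchanged.
    return match ? `${match[1]}/gviz/tq?tqx=out:csv` : originalUrl;
}

const shareUrl = 'https://docs.google.com/spreadsheets/d/11UGSBOSXy5Ov2WEP9nr4kSIxQJmH18zh-5onKtBsovU/edit?usp=sharing';
console.log(toCsvExportUrl(shareUrl));
```

Because non-matching URLs pass through unchanged, a function like this could run before the download step without affecting users who already supply a CSV or plain-text URL.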