Closed by shtse8 2 years ago.
This would still require multiple requests, but it can be done without an initialization request for every string.
Okay, implemented in v10.3.0.
It seems the return type is not correct when the input is an array: it is still typed as a string.
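For reference, a return type that follows the shape of the input is usually expressed in TypeScript with function overloads. A minimal sketch, assuming a simplified result type; `TranslationResult`, `TranslateOptions`, and `translateSketch` are hypothetical stand-ins, not the library's actual declarations:

```typescript
// Hypothetical, simplified result type (not the library's real declaration).
interface TranslationResult {
  text: string
}

interface TranslateOptions {
  to: string
}

// Overloads: a string in gives one result, an array in gives an array of results.
function translateSketch(input: string, opts: TranslateOptions): Promise<TranslationResult>
function translateSketch(input: string[], opts: TranslateOptions): Promise<TranslationResult[]>
// The implementation signature is not visible to callers; only the overloads are.
async function translateSketch(
  input: string | string[],
  opts: TranslateOptions,
): Promise<TranslationResult | TranslationResult[]> {
  // Placeholder "translation" that just tags the text with the target language.
  const fake = (text: string): TranslationResult => ({ text: `[${opts.to}] ${text}` })
  return Array.isArray(input) ? input.map(fake) : fake(input)
}
```

With these overloads, `await translateSketch(['a', 'b'], { to: 'de' })` is typed as `TranslationResult[]`, while a single string still yields a single `TranslationResult`.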
The requests are sent concurrently without any limit: if I batch-translate 100 texts, 100 requests are sent at the same time. It would be better to limit this to about 5 requests at a time, using async.
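A concurrency limit like the one suggested can be sketched without any extra dependency. This is a minimal worker-pool version, not what the library itself does; `mapWithLimit` is a hypothetical helper name. At most `limit` calls are in flight at once, and results keep the input order:

```typescript
// Map over items with at most `limit` calls to `fn` in flight at once.
// Results are returned in input order regardless of completion order.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const results = new Array<R>(items.length)
  let next = 0 // index of the next item no worker has claimed yet

  // Each worker repeatedly claims the next unclaimed index until none remain.
  // Claiming is synchronous (next++), so two workers never take the same index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++
      results[index] = await fn(items[index], index)
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker())
  await Promise.all(workers)
  return results
}
```

For example, `mapWithLimit(texts, 5, text => translateOne(text))`, where `translateOne` stands in for whatever single-text call you use. Each worker starts the next request as soon as its previous one finishes, so one slow request does not hold back a whole group of five.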
One suggestion: give the user more control by building the initialization request once, and then translating each text one by one.
const translator = new GoogleTranslator()
for (const text of textArray) {
// the initialization request is sent on the first translate() call and cached on the instance
const translatedText = await translator.translate(text, { to: 'zh-TW' })
// ...
}
That way we can control the translation logic and speed more flexibly.
I made my own version for use in my project, not for publishing. I hope it helps.
import axios from 'axios'
export class GoogleTranslator {
private data: {} | undefined
private host = 'https://translate.google.com'
private rpcids = 'MkEWBc'
async getData() {
if (this.data)
return this.data
const response = await axios.get(this.host)
this.data = {
'rpcids': this.rpcids,
'source-path': '/',
'f.sid': this.extract('FdrFJe', response.data),
'bl': this.extract('cfb2h', response.data),
'hl': 'en-US',
'soc-app': 1,
'soc-platform': 1,
'soc-device': 1,
'_reqid': Math.floor(1000 + (Math.random() * 9000)),
'rt': 'c',
}
return this.data
}
extract(key: string, res: string) {
const re = new RegExp(`"${key}":".*?"`)
const result = re.exec(res)
if (result !== null)
return result[0].replace(`"${key}":"`, '').slice(0, -1)
return ''
}
async translate(text: string, to: string) {
const data = await this.getData()
// console.log('data', data)
const queryParams = new URLSearchParams(data)
const from = 'auto'
const autoCorrect = false
const url = `${this.host}/_/TranslateWebserverUi/data/batchexecute?${queryParams.toString()}`
const freq = [[[this.rpcids, JSON.stringify([[text, from, to, autoCorrect], [null]]), null, 'generic']]]
const body = `f.req=${encodeURIComponent(JSON.stringify(freq))}&`
// console.log('url', url)
// console.log('body', body)
const response = await axios.post(url, body, {
headers: {
'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
},
})
// console.log('response.data', response.data)
// remove ")]}'\n\n"
let json = response.data.slice(6)
// console.log('json', json)
const result = {
text: '',
pronunciation: '',
from: {
language: {
didYouMean: false,
iso: '',
},
text: {
autoCorrected: false,
value: '',
didYouMean: false,
},
},
raw: '',
}
try {
const lengthStr = /^\d+/.exec(json)?.[0]
if (!lengthStr)
throw new Error('Missing length.')
json = JSON.parse(json.slice(lengthStr.length, lengthStr.length + parseInt(lengthStr, 10)))
json = JSON.parse(json[0][2])
result.raw = json
}
catch (e) {
return result
}
if (json[1][0][0][5] === undefined || json[1][0][0][5] === null) {
// translation not found, could be a hyperlink or gender-specific translation?
result.text = json[1][0][0][0]
}
else {
result.text = json[1][0][0][5]
.map((obj: [string]) => {
return obj[0]
})
.filter(Boolean)
// Google api seems to split text per sentences by <dot><space>
// So we join text back with spaces.
// See: https://github.com/vitalets/google-translate-api/issues/73
.join(' ')
}
result.pronunciation = json[1][0][0][1]
// From language
if (json[0] && json[0][1] && json[0][1][1]) {
result.from.language.didYouMean = true
result.from.language.iso = json[0][1][1][0]
}
else if (json[1][3] === 'auto') {
result.from.language.iso = json[2]
}
else {
result.from.language.iso = json[1][3]
}
// Did you mean & autocorrect
if (json[0] && json[0][1] && json[0][1][0]) {
let str = json[0][1][0][0][1]
str = str.replace(/<b>(<i>)?/g, '[')
str = str.replace(/(<\/i>)?<\/b>/g, ']')
result.from.text.value = str
if (json[0][1][0][2] === 1)
result.from.text.autoCorrected = true
else
result.from.text.didYouMean = true
}
return result
}
}
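One caveat with `getData()` above: `this.data` is only assigned after the `await`, so several `translate()` calls started at the same time can each send their own initialization request. A common fix is to cache the promise instead of the resolved value. A minimal sketch; `LazyInit` and the injected `load` function are hypothetical, with `load` standing in for the `axios.get(this.host)` plus `extract(...)` step:

```typescript
// Caches the in-flight promise so concurrent callers share one initialization.
class LazyInit<T> {
  private pending: Promise<T> | undefined
  private readonly load: () => Promise<T>

  constructor(load: () => Promise<T>) {
    this.load = load
  }

  get(): Promise<T> {
    if (!this.pending) {
      this.pending = this.load().catch((err) => {
        // Forget a failed attempt so the next call can retry.
        this.pending = undefined
        throw err
      })
    }
    return this.pending
  }
}
```

Inside the class above, `getData()` would then simply return `this.init.get()`, so the first caller triggers the request and every concurrent caller awaits the same promise.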
I found that Google's endpoint does support batch translation without sending multiple requests one by one.
Change
const freq = [[[this.rpcids, JSON.stringify([[text, from, to, autoCorrect]]), null, 'generic']]]
to
const freq = [[
  [this.rpcids, JSON.stringify([[text1, from, to, autoCorrect]]), null, 'generic'],
  [this.rpcids, JSON.stringify([[text2, from, to, autoCorrect]]), null, 'generic'],
]]
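Generalizing the two-text example to any number of texts just means emitting one entry per text inside the same outer array. A minimal sketch that follows the `freq` layout from the comment above; `buildBatchBody` is a hypothetical helper, and since the endpoint is undocumented, the format may change without notice:

```typescript
// Builds the form-encoded body for a batchexecute call carrying one RPC
// entry per input text, following the freq layout shown above.
function buildBatchBody(
  rpcId: string,
  texts: readonly string[],
  from: string,
  to: string,
  autoCorrect: boolean,
): string {
  // One [rpcId, payload, null, 'generic'] entry per text.
  const entries = texts.map((text) => [
    rpcId,
    JSON.stringify([[text, from, to, autoCorrect]]),
    null,
    'generic',
  ])
  // All entries go into a single outer array, as in the two-text example.
  const freq = [entries]
  return `f.req=${encodeURIComponent(JSON.stringify(freq))}&`
}
```

The resulting string is what gets POSTed as the body, in place of the single-text `body` built in `translate()`.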
Interesting, I hadn't experimented with that before, but I will look into it now and implement it in the project.
It also gives me compilation errors (I excluded node_modules in tsconfig).
node_modules/google-translate-api-x/index.d.ts:6:20 - error TS2304: Cannot find name 'AxiosRequestConfig'.
6 requestOptions?: AxiosRequestConfig
node_modules/google-translate-api-x/index.d.ts:17:23 - error TS2552: Cannot find name 'fuction'. Did you mean 'Function'?
17 requestFunction?: fuction | string;
node_modules/google-translate-api-x/index.d.ts:42:5 - error TS2300: Duplicate identifier '"auto"'.
42 "auto" = "Automatic",
node_modules/google-translate-api-x/index.d.ts:43:5 - error TS2300: Duplicate identifier '"auto"'.
43 "auto" = "Detect language",
Found 4 errors in the same file, starting at: node_modules/google-translate-api-x/index.d.ts:6
This would still require multiple requests, but it can be done without an initialization request for every string.
Why not join the input strings with PUA (Unicode Private Use Area) characters and then split the response back? That way only one request would be necessary.
@Zombobot1 That was a separate issue: I never actually tested the type definitions and am not experienced with TypeScript. It should be fixed in 10.3.4.
Why not join the input strings with PUA (Unicode Private Use Area) characters and then split the response back? That way only one request would be necessary.
Because, as far as I know, there is no guarantee or indication that Google Translate would treat each PUA-separated input string as independent.
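For illustration, the PUA idea could be sketched as below, with a safety check for exactly the concern raised above: if the translated text does not come back with the same number of pieces, the caller falls back to separate requests. U+E000 is an arbitrary choice of Private Use Area separator, and whether Google Translate preserves it is an untested assumption; `joinWithSeparator` and `splitTranslated` are hypothetical names:

```typescript
// U+E000 is the first code point of the Unicode Private Use Area.
const SEPARATOR = '\uE000'

// Joins texts into one string; rejects inputs that already contain the separator,
// since they could not be split back unambiguously.
function joinWithSeparator(texts: readonly string[]): string {
  for (const t of texts) {
    if (t.includes(SEPARATOR)) throw new Error('input already contains the separator')
  }
  return texts.join(SEPARATOR)
}

// Splits a translated string back into pieces. Returns null when the piece count
// does not match, signalling that the caller should fall back to one request per text.
function splitTranslated(translated: string, expectedCount: number): string[] | null {
  const parts = translated.split(SEPARATOR).map((p) => p.trim())
  return parts.length === expectedCount ? parts : null
}
```

Even when the count matches, this cannot detect context bleeding between neighbouring strings, which is the deeper issue with treating the joined text as independent inputs.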
I have a working version with batch translation that has been running for a week. Do you need it?
This version supports multiple translation targets with better performance: https://github.com/lyan-ap/google-translate-api-next
@AidanWelch I updated to this version and now I have this error:
Type 'typeof import("/functions/node_modules/google-translate-api-x/index")' has no call signatures
Code:
import translate from 'google-translate-api-x'
const translation = await translate(texts, { from: 'en', to: lang, autoCorrect: true })
From time to time I translate German text from PDFs into English using Google Translate. When I paste a paragraph from a PDF, it contains '\n' at the end of every line. That alone is enough to break the translation, as if I had pasted it line by line. When I remove all these newlines, I get a proper translation.
I mean that for some applications it is acceptable to translate all strings at once, even if some of them may affect the others. However, I would like to avoid being banned by the Google API for making too many requests. I don't know how many requests we can make with this library, or how difficult it would be to work around this with proxies, so this is just a suggestion 🙂
I have a working version with batch translation that has been running for a week. Do you need it?
I would definitely appreciate it if you submitted a PR!
I don't have time to submit a PR, as I am working on an urgent project, but I can post my code here as a reference for how batch translation works with the Google API.
// googleTranslator.ts
import consola from 'consola'
import axios from 'axios'
import { mapBatchAsync } from '~/kit/array'
import { arrayToObject } from '~/kit/type'
export interface TranslateResult {
text: string
pronunciation: string
from: {
language: {
didYouMean: boolean
iso: string
}
text: {
autoCorrected: boolean
value: string
didYouMean: boolean
}
}
}
export class GoogleTranslator {
private data: {} | undefined
private host = 'https://translate.google.com'
private rpcId = 'MkEWBc'
async getData() {
if (this.data)
return this.data
try {
const response = await axios.get(this.host, {
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.124 Safari/537.36 Edg/102.0.1245.44',
},
})
this.data = {
'rpcids': this.rpcId,
'source-path': '/',
'f.sid': this.extract('FdrFJe', response.data),
'bl': this.extract('cfb2h', response.data),
'hl': 'en-US',
'soc-app': 1,
'soc-platform': 1,
'soc-device': 1,
'_reqid': Math.floor(1000 + (Math.random() * 9000)),
'rt': 'c',
}
return this.data
}
catch (e) {
consola.error(e)
throw e
}
}
extract(key: string, res: string) {
const re = new RegExp(`"${key}":".*?"`)
const result = re.exec(res)
if (result !== null)
return result[0].replace(`"${key}":"`, '').slice(0, -1)
return ''
}
parseResult(json: any): TranslateResult {
const result: TranslateResult = {
text: '',
pronunciation: '',
from: {
language: {
didYouMean: false,
iso: '',
},
text: {
autoCorrected: false,
value: '',
didYouMean: false,
},
},
}
if (!json)
throw new Error('json is empty.')
// console.log(inspect(json, { showHidden: false, depth: null, colors: true }))
if (json[1][0][0][5] === undefined || json[1][0][0][5] === null) {
// translation not found, could be a hyperlink or gender-specific translation?
result.text = json[1][0][0][0]
}
else {
result.text = json[1][0][0][5]
.map((obj: [string]) => {
return obj[0]
})
.filter(Boolean)
// Google api seems to split text per sentences by <dot><space>
// So we join text back with spaces.
// See: https://github.com/vitalets/google-translate-api/issues/73
.join(' ')
}
result.pronunciation = json[1][0][0][1]
// From language
if (json[0] && json[0][1] && json[0][1][1]) {
result.from.language.didYouMean = true
result.from.language.iso = json[0][1][1][0]
}
else if (json[1][3] === 'auto') {
result.from.language.iso = json[2]
}
else {
result.from.language.iso = json[1][3]
}
result.from.text.value = json[1][4][0]
return result
}
async translate(textArray: string[], to: string) {
return await mapBatchAsync(textArray, x => !!x.trim(), async (filterTextArray): Promise<TranslateResult[]> => {
const data = await this.getData()
// console.log('data', data)
const queryParams = new URLSearchParams(data)
const from = 'auto'
const autoCorrect = false
const url = `${this.host}/_/TranslateWebserverUi/data/batchexecute?${queryParams.toString()}`
const freq = [
filterTextArray.map((x, i) => [this.rpcId, JSON.stringify([[x, from, to, autoCorrect, i]]), null, 'generic']),
]
// console.log('freq', freq)
const body = `f.req=${encodeURIComponent(JSON.stringify(freq))}&`
// console.log('url', url)
// console.log('body', body)
const response = await axios.post(url, body, {
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.124 Safari/537.36 Edg/102.0.1245.44',
'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
},
})
// console.log(textArray)
// console.log(response.data)
// remove ")]}'\n\n"
let responseData = response.data.slice(6)
const rpcResponses = []
while (true) {
// console.log('json', json)
const lengthStr = /^\d+/.exec(responseData)?.[0]
if (!lengthStr)
break
responseData = responseData.slice(lengthStr.length)
const contentStr = responseData.slice(0, parseInt(lengthStr, 10))
responseData = responseData.slice(contentStr.length)
// x[1]: rpc id
// x[2]: payload
// x[5]: error, [3] probably translating empty string
rpcResponses.push(...JSON.parse(contentStr).filter((x: any) => x[1] === this.rpcId))
}
// fix the ordering
const resultMap = arrayToObject(rpcResponses.map(x => this.parseResult(JSON.parse(x[2]))), x => x.from.text.value)
return filterTextArray.map(x => resultMap[x])
}, async x => x.map(y => (<TranslateResult>{
text: y,
pronunciation: '',
from: {
language: {
didYouMean: false,
iso: 'unknown',
},
text: {
autoCorrected: false,
value: y,
didYouMean: false,
},
},
})))
}
}
// MapBatchAsync
export async function mapBatchAsync<T, TR, FR = T>(src: T[], filter: (i: T) => boolean, trueFn: (i: T[]) => Promise<TR[]>, falseFn?: (i: T[]) => Promise<FR[]>): Promise<(TR | FR)[]> {
const newSrc = [...src] as (T | TR | FR)[]
const [trueTargets, falseTargets] = SplitBy(src.map((x, i) => ({ i, x })), ({ x }) => filter(x))
if (trueTargets.length > 0) {
const results = await trueFn(trueTargets.map(x => x.x))
results.forEach((result, index) => {
newSrc[trueTargets[index].i] = result
})
}
if (falseFn && falseTargets.length > 0) {
const results = await falseFn(falseTargets.map(x => x.x))
results.forEach((result, index) => {
newSrc[falseTargets[index].i] = result
})
}
return newSrc as (TR | FR)[]
}
// arrayToObject
export function arrayToObject<T, R = T>(arr: T[], keyFn: (item: T) => PropertyKey, valueFn?: (item: T) => R): Record<PropertyKey, R> {
const vFn = valueFn ?? (item => item as unknown as R)
return Object.fromEntries(arr.map(x => [keyFn(x), vFn(x)]))
}
Some key points here:
/_/TranslateWebserverUi/data/batchexecute is a Google API endpoint that accepts batch requests.
freq
SplitBy is missing. I just used Lodash's _.partition. Works well.
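The `SplitBy` helper called in `mapBatchAsync` was not included in the post. Judging from how it is used (it must return `[matching, nonMatching]` for a predicate, like Lodash's `_.partition`), a minimal stand-in could look like this; the original implementation is unknown:

```typescript
// Splits an array into [items where predicate is true, items where it is false],
// preserving the original order within each group.
function SplitBy<T>(items: readonly T[], predicate: (item: T) => boolean): [T[], T[]] {
  const pass: T[] = []
  const fail: T[] = []
  for (const item of items) {
    (predicate(item) ? pass : fail).push(item)
  }
  return [pass, fail]
}
```

Order preservation matters here, because `mapBatchAsync` relies on each group's position to write results back to the original indices.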
It would be nice to be able to assign a stable ID to each translation, so I can map the results back with my own logic in the future. Currently I use the from.text field, but if didYouMean returns true, does this change?
[
{id: 10, title: {en: 'hello', de: null}}, // <-- need to fill in the DE translation after my batch returns
{id: 22, title: {en: 'how are you', de: null}},
]
Ideally I would pass a key like 10.title / 22.title along with the source text.
Ideally I would pass a key like 10.title / 22.title along with the source text.
In the preliminary implementation I created with object input, you can simply use object keys whose values are the texts to translate, and I plan to carry that over into future implementations.
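A minimal sketch of the keyed approach, applied to the row shape from the question: build stable `id.field` keys, translate the values in one batch, then write the results back by key, so autocorrection changing `from.text` no longer matters. `translateBatch` is a hypothetical function that must return one translation per input, in input order (for example, one built on the batchexecute approach discussed above); it is injected so the sketch does not depend on any particular library API:

```typescript
interface Row {
  id: number
  title: { en: string; de: string | null }
}

// Hypothetical batch translator: must return one translation per input, in order.
type TranslateBatch = (texts: string[], to: string) => Promise<string[]>

async function fillGerman(rows: Row[], translateBatch: TranslateBatch): Promise<void> {
  // Build stable keys like "10.title" alongside the source text.
  const jobs: Record<string, string> = {}
  for (const row of rows) {
    if (row.title.de === null) jobs[`${row.id}.title`] = row.title.en
  }
  const keys = Object.keys(jobs)
  if (keys.length === 0) return

  const translated = await translateBatch(keys.map((k) => jobs[k]), 'de')
  if (translated.length !== keys.length) throw new Error('translation count mismatch')

  // Map results back by key instead of by source text.
  const byKey = new Map(keys.map((k, i): [string, string] => [k, translated[i]]))
  for (const row of rows) {
    const value = byKey.get(`${row.id}.title`)
    if (value !== undefined) row.title.de = value
  }
}
```

Rows that already have a German title are skipped, so re-running the function only requests the missing translations.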
Currently I use the from.text field, but if didYouMean returns true, does this change?
Yes, but apparently that is not the only case: while investigating it, I noticed an error in the README.
Note these two cases:
const res = await translate('I spea Dutch!', { from: 'en', to: 'nl', autoCorrect: true });
console.log(res.from.text.didYouMean); // => false
console.log(res.from.text.autoCorrected); // => true
console.log(res.from.text.value); // => 'I [speak] Dutch!'
console.log(res.text); // => 'Ik spreek Nederlands!'
When opts.autoCorrect is set to true, it does not return true for res.from.text.didYouMean and instead returns true for res.from.text.autoCorrected. It also adds the auto-corrected word in brackets in res.from.text.value, and the translation uses the auto-corrected value.
const res = await translate('I spea Dutch!', { from: 'en', to: 'nl', autoCorrect: false });
console.log(res.from.text.didYouMean); // => true
console.log(res.from.text.autoCorrected); // => false
console.log(res.from.text.value); // => 'I [speak] Dutch!'
console.log(res.text); // => 'Ik speed Nederlands!'
When opts.autoCorrect is set to false, instead of setting res.from.text.autoCorrected to true, it sets res.from.text.didYouMean to true. It does not use the predicted correction in the translation; however, it still sets res.from.text.value to include a bracketed correction!
Two new discoveries:
1. In [this.rpcId, JSON.stringify([[x, from, to, autoCorrect, i]]), null, 'generic'], the 'generic' string can be replaced by any text, which lets you identify each request after batching, since the order of the responses is not guaranteed.
2. rt=c can be removed from the request URL so that the response is plain JSON, with no need to parse it using the length prefixes.
Without rt=c (which means return type = chunked), you can use the following code to process the response:
interface ExecuteResponse {
rpcId: string
payload: object
name: string
}
processResponse(response: string): ExecuteResponse[] {
return JSON.parse(response.slice(6))
.filter((x: any) => x[1] === this.rpcId && x[2])
.map(x => (<ExecuteResponse>{
rpcId: x[1],
payload: JSON.parse(x[2]),
name: x[6],
}))
}
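Combining both discoveries, here is a self-contained sketch: tag each entry with its input index in place of 'generic', send the request without rt=c, then restore input order from x[6]. The response layout (x[1] RPC id, x[2] JSON payload string, x[6] the tag) follows the observations above and is an assumption about an undocumented endpoint; `taggedEntry` and `parseTaggedResponse` are hypothetical names:

```typescript
// One batchexecute entry whose last slot carries the input index as a tag.
function taggedEntry(rpcId: string, text: string, from: string, to: string, index: number) {
  return [rpcId, JSON.stringify([[text, from, to, false]]), null, String(index)]
}

interface TaggedPayload {
  index: number
  payload: unknown
}

// Parses a non-chunked batchexecute response (request sent without rt=c) and
// returns the parsed payloads ordered by the numeric tag found in x[6].
function parseTaggedResponse(raw: string, rpcId: string, count: number): unknown[] {
  // Strip the ")]}'" prefix; JSON.parse tolerates the whitespace that follows.
  const body = raw.replace(/^\)\]\}'/, '')
  const rows: unknown[] = JSON.parse(body)
  const tagged: TaggedPayload[] = []
  for (const row of rows) {
    // Skip non-array rows, other RPC ids, and entries without a payload string.
    if (!Array.isArray(row) || row[1] !== rpcId || typeof row[2] !== 'string') continue
    tagged.push({ index: Number(row[6]), payload: JSON.parse(row[2]) })
  }
  const ordered: unknown[] = new Array(count).fill(undefined)
  for (const { index, payload } of tagged) {
    if (Number.isInteger(index) && index >= 0 && index < count) ordered[index] = payload
  }
  return ordered
}
```

Slots left `undefined` mark inputs for which no matching response came back, so a caller can retry just those instead of guessing the order from from.text.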
Sorry, the code is messy. I just want to share some discoveries; I'm not interested in a PR or in publishing a package.
@shtse8 It's fine, I appreciate it and will work on implementing it!
Currently I use the from.text field, but if didYouMean returns true, does this change?
I updated the README to hopefully clarify this
@shtse8 Okay, v10.4.0 implements this!
I want to pass an array of strings for translation. Could you add support for it?