aw1875 / puppeteer-hcaptcha

A library to solve hcaptcha challenges that are automated within puppeteer. You can automatically set response values where they should be so the only thing left for you is submitting the page or you can get the response token.
https://www.npmjs.com/package/puppeteer-hcaptcha

[Bug]: Incompatible with Heroku #41

Closed coreynorthcutt closed 2 years ago

coreynorthcutt commented 2 years ago

Describe the bug: "Bug" may be inaccurate, but it is the closest fit.

Heroku limits the compressed slug size to 500 MB (a year ago it was 300 MB). With Puppeteer, the slug clocks in at 635 MB.

To Reproduce: Steps to reproduce the behavior:

  1. Deploy to Heroku
  2. Read error messages
  3. Cry

Example:

du -sh .[^.]* * | sort -hr

1.2G    node_modules
 69M    tmp
520K    .git
212K    package-lock.json
 84K    yarn.lock
 16K    index.js
 12K    .DS_Store
4.0K    preload.js
4.0K    package.json
4.0K    README.md
4.0K    Procfile
4.0K    .slugignore
4.0K    .gitignore
4.0K    .env

and, inside node_modules:

695M    @tensorflow
282M    puppeteer-hcaptcha
249M    puppeteer
9.5M    core-js
4.9M    lodash
2.3M    devtools-protocol
2.1M    @types
1.1M    ajv
1.0M    puppeteer-extra-plugin-stealth
804K    google-protobuf
640K    @mapbox
584K    uri-js
580K    puppeteer-extra-plugin-user-data-dir
440K    psl
436K    seedrandom
424K    request
420K    yargs
396K    iconv-lite
380K    es6-promise
364K    express
292K    sshpk
276K    are-we-there-yet
272K    tr46
260K    @tensorflow-models
256K    semver
248K    tar
232K    bezier-js
224K    fs-extra
216K    mime-db
212K    qs
200K    tweetnacl
196K    send
196K    body-parser
188K    puppeteer-extra-plugin
188K    long
184K    argparse
180K    readable-stream
176K    puppeteer-extra
172K    ws
164K    node-fetch
164K    gauge
148K    yargs-parser
148K    unbzip2-stream
148K    adm-zip
144K    finalhandler
132K    form-data
128K    .yarn-integrity
120K    verror
108K    make-dir
104K    tough-cookie
 96K    minimist
 96K    buffer
 92K    dashdash
 92K    bl
 88K    har-schema
 84K    uuid
 84K    asynckit
 80K    sprintf-js
 80K    minipass
 80K    http-signature
 76K    yauzl
 72K    whatwg-url
 72K    mime
 72K    fast-json-stable-stringify
 68K    performance-now
 68K    jsbn
 64K    glob
 64K    ghost-cursor
 64K    emoji-regex
 60K    safer-buffer
 60K    dotenv
 60K    debug
 60K    agent-base
 56K    tar-fs
 56K    shallow-clone
 56K    jwt-decode
 56K    ipaddr.js
 56K    https-proxy-agent
 52K    depd
 52K    deepmerge
 52K    chalk
 48K    tar-stream
 48K    punycode
 48K    proxy-from-env
 48K    jsprim
 48K    graceful-fs
 48K    fd-slicer
 48K    extsprintf
 48K    ecc-jsbn
 48K    cliui
 48K    asn1
 44K    y18n
 44K    through
 44K    safe-buffer
 44K    regenerator-runtime
 44K    negotiator
 44K    minimatch
 44K    json-schema-traverse
 44K    fast-deep-equal
 44K    extend
 44K    color-convert
 40K    request-promise-core
 40K    json-stringify-safe
 40K    escalade
 40K    bcrypt-pbkdf
 40K    aws4
 36K    serve-static
 36K    raw-body
 36K    nopt
 36K    jsonfile
 36K    json-schema
 36K    isstream
 36K    core-util-is
 32K    yallist
 32K    type-is
 32K    rimraf
 32K    require-directory
 32K    progress
 32K    npmlog
 32K    mixin-object
 32K    isarray
 32K    detect-libc
 32K    delegates
 32K    content-disposition
 32K    ansi-styles
 28K    tunnel-agent
 28K    pump
 28K    proxy-addr
 28K    minizlib
 28K    mime-types
 28K    http-errors
 28K    get-stream
 28K    fs.realpath
 28K    forever-agent
 28K    extract-zip
 28K    cookie
 28K    console-control-strings
 28K    concat-map
 28K    combined-stream
 28K    caseless
 28K    aws-sign2
 28K    assert-plus
 28K    accepts
 24K    webidl-conversions
 24K    util-deprecate
 24K    string_decoder
 24K    stealthy-require
 24K    statuses
 24K    signal-exit
 24K    setprototypeof
 24K    request-promise-native
 24K    on-finished
 24K    oauth-sign
 24K    mkdirp
 24K    media-typer
 24K    lru-cache
 24K    har-validator
 24K    getpass
 24K    get-caller-file
 24K    fs-minipass
 24K    etag
 24K    delayed-stream
 24K    content-type
 24K    bytes
 24K    base64-js
 20K    wrap-ansi
 20K    vary
 20K    utils-merge
 20K    unpipe
 20K    toidentifier
 20K    supports-color
 20K    strip-ansi
 20K    string-width
 20K    set-blocking
 20K    range-parser
 20K    puppeteer-extra-plugin-user-preferences
 20K    pkg-dir
 20K    pend
 20K    path-to-regexp
 20K    path-exists
 20K    parseurl
 20K    p-try
 20K    p-locate
 20K    p-limit
 20K    methods
 20K    merge-descriptors
 20K    merge-deep
 20K    locate-path
 20K    kind-of
 20K    isobject
 20K    is-typedarray
 20K    is-plain-object
 20K    is-fullwidth-code-point
 20K    is-buffer
 20K    inherits
 20K    ieee754
 20K    has-flag
 20K    fs-constants
 20K    fresh
 20K    forwarded
 20K    find-up
 20K    encodeurl
 20K    cookie-signature
 20K    color-name
 20K    clone-deep
 20K    chownr
 20K    buffer-crc32
 20K    brace-expansion
 20K    balanced-match
 20K    ansi-regex
 16K    wrappy
 16K    wide-align
 16K    universalify
 16K    process-nextick-args
 16K    path-is-absolute
 16K    once
 16K    object-assign
 16K    number-is-nan
 16K    ms
 16K    mkdirp-classic
 16K    lazy-cache
 16K    is-extendable
 16K    inflight
 16K    has-unicode
 16K    for-own
 16K    for-in
 16K    escape-html
 16K    es6-promisify
 16K    end-of-stream
 16K    ee-first
 16K    destroy
 16K    code-point-at
 16K    array-flatten
 16K    arr-union
 16K    aproba
 16K    abbrev
8.0K    .DS_Store
  0B    .bin

The ideal solution appears to be adapting this package to a TensorFlow release below 2.0, which is a fraction of the size. Alternatively, trim out unused modules, if there are any, but TF is clearly the behemoth.
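
To put a number on "behemoth", here is a rough sketch that adds up the three largest entries from the listing above. The helper `toMiB` is a hypothetical name, and the sizes are copied from the `du` output. These are uncompressed on-disk sizes, so they are not directly comparable to Heroku's compressed slug limit.

```javascript
// Convert a human-readable `du -h` size such as "695M" or "1.2G" to mebibytes.
function toMiB(size) {
  const match = /^([\d.]+)([BKMG])$/.exec(size.trim());
  if (!match) throw new Error(`Unrecognized size: ${size}`);
  const factor = { B: 1 / (1024 * 1024), K: 1 / 1024, M: 1, G: 1024 };
  return parseFloat(match[1]) * factor[match[2]];
}

// Figures copied from the node_modules listing in this issue.
const packages = {
  "@tensorflow": "695M",
  "puppeteer-hcaptcha": "282M",
  puppeteer: "249M",
};

// Combined size of the three largest packages, and TensorFlow's share of it.
const topThree = Object.values(packages).reduce((sum, s) => sum + toMiB(s), 0);
const tfShare = toMiB(packages["@tensorflow"]) / topThree;

console.log(`Top three packages: ${topThree} MiB`); // 1226 MiB
console.log(`@tensorflow share: ${(tfShare * 100).toFixed(1)}%`); // 56.7%
```

The three packages add up to 1226 MiB, which accounts for essentially all of the 1.2G reported for node_modules, and @tensorflow alone is a bit over half of that.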

aw1875 commented 2 years ago

Unfortunately, most of the modules cannot be removed. This package was just a simple way to solve captchas and wasn't made with the intention of being small enough to package into other products. I'm definitely interested in other approaches to image recognition, as TF isn't exactly the fastest solution (it was much faster before with Google Cloud Vision, but they charge for that). If you have any suggestions for tools to replace TF, I'm all ears (as I don't really want to deal with ML myself).

aw1875 commented 2 years ago

I haven't heard anything about this for a few months, so I will be closing this issue. You can open a new issue if there is anything I can fix on my end.