datawhores / OF-Scraper

A completely revamped and redesigned fork, reimagined from scratch based on the original onlyfans-scraper
MIT License

Post User Process not running consistently #444

Open Jakan-Kink opened 1 month ago

Jakan-Kink commented 1 month ago

Describe the bug

Using both versions 3.11.1 and 3.11.2 with the new post_* scripts, I am running into two issues. First, the script paths aren't being read from the config file; I have to add extra code.

In get_post_download_script: https://github.com/datawhores/OF-Scraper/blob/625be8e5a33a4b854d9fc1e494e0204a9e8cd180/ofscraper/utils/config/data.py#L270-L280

I need to add:

    elif config.get("scripts", {}).get("post_download_script") is not None:
        val = config.get("scripts", {}).get("post_download_script")

and in get_post_script: https://github.com/datawhores/OF-Scraper/blob/625be8e5a33a4b854d9fc1e494e0204a9e8cd180/ofscraper/utils/config/data.py#L284-L294

I need to add:

    elif config.get("scripts", {}).get("post_script") is not None:
        val = config.get("scripts", {}).get("post_script")
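The lookup those two additions implement can be sketched as a single helper. This is a hypothetical function, not the code in data.py, and the precedence it uses (command-line value, then a top-level config key, then the "scripts" section) is an assumption:

```python
from typing import Any, Mapping, Optional


def get_script_setting(
    cli_value: Optional[str], config: Mapping[str, Any], key: str
) -> Optional[str]:
    """Hypothetical lookup for a post-script path.

    Assumed precedence: an explicit command-line value, then a top-level
    config key, then the same key inside the "scripts" section.
    """
    if cli_value is not None:
        return cli_value
    if config.get(key) is not None:
        return config[key]
    # Fall back to the nested "scripts" section, as in the config below.
    scripts = config.get("scripts") or {}
    return scripts.get(key)
```

With the config from this issue, get_script_setting(None, config, "post_script") would return the post-loop.sh path.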

Second, even with that change, the script doesn't run in 3.11.2.

To Reproduce

Run ofscraper -u ALL -l DEBUG -p STATS -o all,labels -a download -d 120 -ts -up -st expired

Expected behavior

After each user is processed, post_download_script should fire, and at the end of the loop, post_script should fire.
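The expected ordering, sketched as a hypothetical driver loop (the function and hook names are illustrative, not ofscraper's):

```python
from typing import Callable, Iterable, List


def process_users(
    users: Iterable[str],
    scrape: Callable[[str], None],
    post_user_hook: Callable[[str], None],
    post_loop_hook: Callable[[List[str]], None],
) -> List[str]:
    """Hypothetical driver showing when each script hook is expected to fire."""
    processed: List[str] = []
    for user in users:
        scrape(user)
        post_user_hook(user)  # post_download_script: once per user
        processed.append(user)
    post_loop_hook(processed)  # post_script: once, after the whole loop
    return processed
```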

Screenshots/Logs

With 3.11.2, the error I get for every performer is:

 2024-08-08 18:25:50:[level.inner:11]  expected str, bytes or os.PathLike object, not int
 2024-08-08 18:25:50:[level.inner:11]  Traceback (most recent call last):
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final_user.py", line 18, in post_user_process
    run(
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/system/subprocess.py", line 9, in run
    t=subprocess.run(*args, stdout=subprocess.PIPE,
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1885, in _execute_child
    self.pid = _fork_exec(
               ^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not int

Config

{
    "main_profile": "main_profile",
    "metadata": "{save_location}/meta/OnlyFans/{model_username}/Metadata",
    "discord": "",
    "file_options": {
        "save_location": "/Volumes/FileAccess/OnlyFans/",
        "dir_format": "sites/OnlyFans/{model_username}/{responsetype}/{value}/{mediatype}/",
        "file_format": "{date}-{filename}.{ext}",
        "textlength": 0,
        "space_replacer": " ",
        "date": "YYYY-MM-DD_HH-mm",
        "text_type_default": "letter",
        "truncation_default": true
    },
    "download_options": {
        "filter": [
            "Images",
            "Audios",
            "Videos",
            "Text"
        ],
        "auto_resume": false,
        "system_free_min": 0,
        "max_post_count": 0
    },
    "binary_options": {
        "ffmpeg": "/opt/homebrew/bin/ffmpeg"
    },
    "cdm_options": {
        "private-key": null,
        "client-id": null,
        "key-mode-default": "keydb",
        "keydb_api": "{redacted}"
    },
    "performance_options": {
        "download_sems": 6,
        "thread_count": 2,
        "download_limit": 0
    },
    "content_filter_options": {
        "block_ads": false,
        "file_size_max": 0,
        "file_size_min": 0,
        "length_max": null,
        "length_min": null
    },
    "advanced_options": {
        "code-execution": true,
        "dynamic-mode-default": "datawhores",
        "backend": "aio",
        "downloadbars": true,
        "cache-mode": "json",
        "appendlog": false,
        "custom_values": {
            "OLD_DEVIINT": "https://raw.githubusercontent.com/datawhores/onlyfans-dynamic-rules/new/dynamicRules.json",
            "XAGLER": "https://raw.githubusercontent.com/xagler/dynamic-rules/main/onlyfans.json",
            "RAFA": "https://raw.githubusercontent.com/rafa-9/dynamic-rules/main/rules.json",
            "DIGITALCRIMINALS": "https://raw.githubusercontent.com/DATAHOARDERS/dynamic-rules/main/onlyfans.json",
            "DATAWHORES": "https://raw.githubusercontent.com/datawhores/onlyfans-dynamic-rules/main/dynamicRules.json",
            "DEVIINT": "https://raw.githubusercontent.com/rafa-9/dynamic-rules/main/rules.json",
            "MAXFILE_SEMAPHORE": 10,
            "SHOW_AVATAR": false,
            "import": "exec('import ofscraper.filters.models.selector as selector23')",
            "list": "exec('modelObjs=C)')",
            "model_price": "'fallback' if len(modelObjs)==0 else 'Paid' if modelObjs[0].final_current_price>0 else 'Free'"
        },
        "sanitize_text": false,
        "temp_dir": null,
        "remove_hash_match": true,
        "infinite_loop_action_mode": false,
        "enable_auto_after": true,
        "default_user_list": "main",
        "default_black_list": ""
    },
    "scripts": {
        "post_download_script": "/Users/your_username/Development/of-scraper-post/post-user.sh",
        "post_script": "/Users/your_username/Development/of-scraper-post/post-loop.sh"
    },
    "responsetype": {
        "timeline": "Posts",
        "message": "Messages",
        "archived": "Archived",
        "paid": "Messages",
        "stories": "Stories",
        "highlights": "Stories",
        "profile": "Profile",
        "pinned": "Posts",
        "streams": "Streams"
    },
    "overwrites": {
        "audios": {},
        "videos": {},
        "images": {},
        "text": {
            "file_format": "{date}-{post_id}.{ext}"
        }
    }
}

System Info

Additional context

This happens on multiple OF accounts; here are some examples: couple_of_perverts, lilithinlatexxx, rubberdoll, lola-saint, sophie_x_elodie, trainingj, tightlacedchaos, doe-eyes-official

datawhores commented 1 month ago

I think it's because model_id needs to be converted to a string if it isn't one already.
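A minimal sketch of that fix, using hypothetical helpers to_argv and run_script (not ofscraper's actual run()): subprocess only accepts str, bytes, or os.PathLike items in the argument list, which is why an int model_id raises the TypeError in the traceback above, so every other value is converted with str() before the script is launched.

```python
import os
import subprocess
import sys


def to_argv(args):
    """Coerce each argument to something subprocess accepts.

    str, bytes and os.PathLike values pass through unchanged; anything else
    (for example an int model_id) is converted with str().
    """
    return [a if isinstance(a, (str, bytes, os.PathLike)) else str(a) for a in args]


def run_script(args):
    # Hypothetical wrapper: capture stdout as text, don't raise on non-zero exit.
    return subprocess.run(to_argv(args), stdout=subprocess.PIPE, text=True, check=False)


if __name__ == "__main__":
    model_id = 12345  # an int, like the one that triggered the error
    try:
        subprocess.run([sys.executable, "-c", "pass", model_id])
    except TypeError as exc:
        print("unconverted:", exc)
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", model_id]
    print("converted:", run_script(child).stdout.strip())
```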

Jakan-Kink commented 1 month ago

I forced model_id to a string in final_user.py:

        run(
            [
                settings.get_post_download_script(),
                username,
                str(model_id),
                json.dumps(media_dump),
                json.dumps(post_dump),
                json.dumps(master_dump),
            ]
        )

That did change the error message:

 2024-08-08 20:47:57:[final_user.post_user_process:13]  Running post script for lilithinlatexxx
 2024-08-08 20:47:58:[level.inner:11]  [Errno 7] Argument list too long: '/Users/your_username/Development/of-scraper-post/post-user.sh'
 2024-08-08 20:47:58:[level.inner:11]  Traceback (most recent call last):
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final_user.py", line 24, in post_user_process
    run(
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/system/subprocess.py", line 9, in run
    t=subprocess.run(*args, stdout=subprocess.PIPE,
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/Users/your_username/Development/of-scraper-post/post-user.sh'

Jakan-Kink commented 1 month ago

Just to make sure:

getconf ARG_MAX
1048576
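ARG_MAX bounds the combined size of the argument strings and environment handed to exec(), so the three json.dumps(...) arguments in the call above can push a large creator past 1048576 bytes. (Linux additionally caps each single argument via MAX_ARG_STRLEN, 128 KiB with 4 KiB pages.) A rough pre-flight check is sketched below; the function names are hypothetical, and the estimate ignores pointer arrays and alignment padding, so the real limit is hit somewhat earlier.

```python
import os


def estimated_exec_bytes(argv, env=None):
    """Roughly estimate the bytes exec() needs for argv plus the environment.

    Each argument counts as its UTF-8 length plus a trailing NUL; each
    environment entry counts as "KEY=VALUE" plus a NUL. Pointer arrays and
    alignment are ignored, so this is an underestimate.
    """
    env = os.environ if env is None else env
    total = sum(len(a.encode()) + 1 for a in argv)
    total += sum(len(k.encode()) + len(v.encode()) + 2 for k, v in env.items())
    return total


def likely_too_long(argv, env=None, safety_margin=4096):
    """Return True if argv plus env probably exceeds ARG_MAX (POSIX only)."""
    arg_max = os.sysconf("SC_ARG_MAX")
    return estimated_exec_bytes(argv, env) + safety_margin > arg_max
```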

datawhores commented 1 month ago

Yeah, I've never had to worry about this.

datawhores commented 1 month ago

I have a partial solution, but some of the information is still too long.

Jakan-Kink commented 1 month ago

Also, I hadn't gotten the post script to actually fire earlier; I just came back to find this waiting:

 2024-08-09 01:27:34:[final_script.final_script:27]  Running post script
 2024-08-09 01:27:34:[level.inner:11]  Object of type Model is not JSON serializable
 2024-08-09 01:27:34:[level.inner:11]  Traceback (most recent call last):
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/run.py", line 88, in daemon_run_helper
    job_func()
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/context/exit.py", line 92, in inner
    raise E
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/context/exit.py", line 85, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/commands/managers/scraper.py", line 47, in runner
    final(normal_data , scrape_paid_data ,user_first_data,userdata)
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final.py", line 17, in final
    final_script(users or [])
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final_script.py", line 42, in final_script
    json.dumps(out_dict)
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Model is not JSON serializable

Jakan-Kink commented 1 month ago

Found the culprit: https://github.com/datawhores/OF-Scraper/blob/625be8e5a33a4b854d9fc1e494e0204a9e8cd180/ofscraper/runner/close/final/final_script.py#L29-L38

You create a variable data and append Model.model for each ele, but then pass users to out_dict instead of data.
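In other words, the list of plain dicts should be serialized rather than the Model objects. A minimal sketch, with a hypothetical stand-in Model class whose .model attribute holds the raw dict, and an assumed top-level key "users" for the payload:

```python
import json


class Model:
    """Hypothetical stand-in for ofscraper's Model; .model holds the raw dict."""

    def __init__(self, raw):
        self.model = raw


def build_post_script_payload(users):
    # Collect the JSON-serializable dicts first...
    data = [ele.model for ele in users]
    # ...then dump that list, not the Model objects (which json cannot encode).
    return json.dumps({"users": data})
```

Calling json.dumps({"users": users}) on the Model objects directly reproduces the "Object of type Model is not JSON serializable" error above.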

datawhores commented 1 month ago

Yeah, that only works for that one since the amount of data is small.

For the download script, my other solution won't work for larger creators.

The user will have to read and process the data in their script.

I think the only possibility is to redirect the data with >; the user would then have to read it as input.
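For comparison, sending the data on the script's standard input rather than in argv also sidesteps the limit, since a pipe is not bounded by ARG_MAX. A minimal sketch, assuming the script reads JSON from stdin; run_with_stdin is a hypothetical name, not an ofscraper function:

```python
import json
import subprocess
import sys


def run_with_stdin(argv, payload):
    """Run argv, feeding the JSON-encoded payload on stdin instead of as arguments.

    subprocess.run(input=...) uses communicate() internally, which avoids
    deadlocks when the payload is larger than the pipe buffer.
    """
    return subprocess.run(
        argv,
        input=json.dumps(payload),
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Demo: the child counts the posts it receives on stdin.
    child = [sys.executable, "-c", "import json, sys; print(len(json.load(sys.stdin)['posts']))"]
    print(run_with_stdin(child, {"posts": list(range(200_000))}).stdout.strip())
```

A shell script on the receiving end would then read the data from standard input (for example with cat) instead of from $1.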

Update: I think the solution is to write a single JSON file to a temporary location and pass that path to the script.
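A self-contained sketch of that approach, using a hypothetical helper run_with_json_file: the payload is written with tempfile.mkstemp, the file is closed, its path is appended to the script's arguments, and the file is removed after the script exits. mkstemp is chosen here over NamedTemporaryFile so the child can reopen the file on Windows too, where an open NamedTemporaryFile generally cannot be opened a second time.

```python
import json
import os
import subprocess
import sys
import tempfile


def run_with_json_file(argv, payload):
    """Write payload to a temp JSON file, pass its path as the last argument, then delete it."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        # Close the file before launching the child so it can be reopened on any OS.
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
        return subprocess.run([*argv, path], stdout=subprocess.PIPE, text=True, check=True)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Demo: the child prints the username from the file it was given.
    child = [sys.executable, "-c", "import json, sys; print(json.load(open(sys.argv[1]))['username'])"]
    print(run_with_json_file(child, {"username": "example"}).stdout.strip())
```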

datawhores commented 1 month ago

I will fix the post_script.

For the post_download_script, I made this change:

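        master_dump = json.dumps(
            {"username": username, "model_id": model_id, "media": media, "posts": posts}
        )
        # Write one JSON file and pass only its path, keeping argv far below ARG_MAX.
        with tempfile.NamedTemporaryFile("w") as f:
            f.write(master_dump)
            f.flush()  # make sure the script sees the complete file
            run([settings.get_post_download_script(), f.name])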
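On the receiving end, the script now gets a single argument: the path to the JSON file. A minimal sketch of a post_download_script written in Python, assuming the payload keys from master_dump above (username, model_id, media, posts) and that media and posts are lists; the summary format is purely illustrative:

```python
import json
import sys


def load_payload(path):
    """Read the JSON file whose path is passed as the script's argument."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize(payload):
    # Keys assumed from master_dump: username, model_id, media, posts.
    return (
        f"{payload['username']} ({payload['model_id']}): "
        f"{len(payload['media'])} media, {len(payload['posts'])} posts"
    )


if __name__ == "__main__":
    # Each argument is expected to be a path to one payload file.
    for path in sys.argv[1:]:
        print(summarize(load_payload(path)))
```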

I think the post_script will be okay, but to be safe and keep things in sync, I will do the same for it as well.

datawhores commented 1 month ago

So far it has been working on my system. I've been testing with --post-script cat and --download-script cat to make sure the output is shown on the console.

Tested in

Jakan-Kink commented 3 weeks ago

It looks like some of the work between 3.11.2 and 3.11.6 introduced a change in final_script.py that causes a crash:

 2024-08-21 22:49:16:[final_script.final_script:31]  Running post script
 2024-08-21 22:49:16:[level.inner:11]  unhashable type: 'dict'
 2024-08-21 22:49:16:[level.inner:11]  Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/your_username/utils/run.py", line 88, in daemon_run_helper
    job_func()
  File "/venv/lib/python3.11/site-packages/your_username/utils/context/exit.py", line 92, in inner
    raise E
  File "/venv/lib/python3.11/site-packages/your_username/utils/context/exit.py", line 85, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/your_username/commands/managers/scraper.py", line 50, in runner
    final(normal_data, scrape_paid_data, user_first_data, userdata)
  File "/venv/lib/python3.11/site-packages/your_username/runner/close/final/final.py", line 20, in final
    final_script(userdata or [])
  File "/venv/lib/python3.11/site-packages/your_username/runner/close/final/final_script.py", line 36, in final_script
    data = value
    ~~~~^^^^^
TypeError: unhashable type: 'dict'
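"unhashable type: 'dict'" means a dict was used where a hashable value is required, most often as a dictionary key or set member. The traceback does not show the full failing line, so the following is only a minimal sketch of the usual fix, assuming the user records are dicts with a "username" field (index_by_username is a hypothetical name): key the mapping on that field instead of on the dict itself.

```python
def index_by_username(users):
    """Map each user record (a dict) to itself, keyed by a hashable field.

    Writing data[user] = ... with the dict itself as the key would raise
    TypeError: unhashable type: 'dict', because dicts are mutable.
    """
    data = {}
    for user in users:
        data[user["username"]] = user
    return data
```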

dunngitter commented 2 weeks ago

> looks like in some of the work between 3.11.2 and 3.11.6 there seems to have been a change in final_script.py that caused a crash:

+1 on this, I'm seeing the same issue on 3.11.6.

datawhores commented 2 weeks ago

Should be fixed.

dunngitter commented 2 weeks ago

In which release? 3.11.7? If so, could you please generate the package for that version? I can't pull the Docker image right now.

Jakan-Kink commented 2 weeks ago

It's not in a release; it's in commit ce82515.


From: dunngitter, Sent: Saturday, August 24, 2024 14:13, Subject: Re: [datawhores/OF-Scraper] Post User Process not running consistently (Issue #444)
