KeyError: 'source'

myps6415 commented 3 weeks ago

Hi, I got the KeyError below. Does anyone know how to fix it? Thanks a lot.

poetry run python start_us.py
[2024-08-21 13:25:20] Assigning Jobs
Processing Scraped Posts
  0%|                                                                                             | 0/436 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/ubuntu/work/UltimaScraper/start_us.py", line 62, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/ubuntu/work/UltimaScraper/start_us.py", line 44, in main
    _api = await USR.start(
  File "/home/ubuntu/work/UltimaScraper/ultima_scraper/ultima_scraper.py", line 50, in start
    await self.start_datascraper(datascraper)
  File "/home/ubuntu/work/UltimaScraper/ultima_scraper/ultima_scraper.py", line 137, in start_datascraper
    await datascraper.datascraper.api.job_manager.process_jobs()
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/managers/job_manager/job_manager.py", line 45, in process_jobs
    await asyncio.create_task(self.__worker())
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/managers/job_manager/job_manager.py", line 53, in __worker
    await job.task
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/modules/module_streamliner.py", line 202, in prepare_scraper
    await self.process_scraped_content(
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/modules/module_streamliner.py", line 237, in process_scraped_content
    unrefined_set: list[dict[str, Any]] = await tqdm_asyncio.gather(
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in gather
    res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in <listcomp>
    res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
  File "/usr/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
    return i, await f
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/datascraper_manager/datascrapers/onlyfans.py", line 51, in media_scraper
    content_metadata.resolve_extractor(Extractor(post_result))
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/metadata_manager/metadata_manager.py", line 216, in resolve_extractor
    self.medias: list[MediaMetadata] = result.get_medias(self)
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/metadata_manager/metadata_manager.py", line 147, in get_medias
    main_url = self.item.url_picker(asset_metadata)
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py", line 39, in url_picker
    source = media_item["source"]
KeyError: 'source'

Neurosis404 commented 3 weeks ago

Happens to me too. It worked 3-4 days ago, and the issue appeared suddenly without any visible reason. It's also not model-related: I tried another model and the error appears there too.

betoalanis commented 3 weeks ago

+1

I picked "scrape all" and still got it, so I can confirm it has nothing to do with any specific model. I think "source" just means OF in general.

gri1n commented 3 weeks ago

+1

Is this project even maintained anymore?

felixtheant commented 2 weeks ago

I haven't used this in a while, and when I do, I get the same error.

barthramsay commented 2 weeks ago

Some investigation into general updates (because this codebase is old):

Looking at the recent PyPI package dependencies, the error happens with version 1.1.4 of ultima-scraper-api.

As for UltimaScraper itself, the latest release on PyPI seems to be newer than what is available on GitHub.

I will investigate further, but upgrading UltimaScraper to the latest PyPI sources may well fix this issue.

At first glance, the codebase here is outdated, with dependencies that are two years old, while the PyPI release uses versions from this year.
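Before upgrading, it can help to confirm which versions are actually installed in the project's environment. A minimal sketch using only the standard library's `importlib.metadata`; the distribution names `ultima-scraper-api` and `ultima-scraper-collection` are assumptions inferred from the site-packages paths in the traceback and may not be the exact PyPI names.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names, inferred from the traceback's site-packages paths.
    for name in ("ultima-scraper-api", "ultima-scraper-collection"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it inside the project's environment (for example `poetry run python check_versions.py`, where the script name is hypothetical) so it inspects the same packages the scraper uses.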

Interesting links with a regularly updated codebase (though, oddly, not UltimaScraper itself):

barthramsay commented 2 weeks ago

@DIGITALCRIMINAL, would you mind either updating this repo or providing us with an updated start_us.py?

Thank you

UrsaBear commented 1 week ago

It looks like the data structure that OnlyFans uses has changed. They removed the source key from the media, which broke getting the URLs. The source URL is now in files.full.url. I made some tweaks to the url_picker method in ultima_scraper_api/apis/onlyfans/__init__.py, and now it works. Here's the quick fix I made for the url_picker method:

    def url_picker(self, media_item: dict[str, Any], video_quality: str = ""):
        # video_quality is accepted for compatibility with existing callers but
        # is unused: the new layout only exposes a single "full" URL here.
        if not media_item.get("canView"):
            return None
        # The media URL moved from the old "source" key to files.full.url;
        # .get() returns None instead of raising KeyError if a level is missing.
        files: dict[str, Any] = media_item.get("files") or {}
        full: dict[str, Any] = files.get("full") or {}
        url = full.get("url")
        # Requires `from urllib.parse import urlparse` at module level.
        return urlparse(url) if url else None
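The fix boils down to a nested lookup that tolerates missing keys. Here is a standalone sketch of the same idea outside the `SiteContent` class; the helper names `dig` and `pick_full_url` and the sample dict are hypothetical, and the sample only mimics the `files.full.url` layout described in the comment.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import ParseResult, urlparse


def dig(data: Any, *keys: str) -> Any:
    """Walk nested dicts; return None instead of raising KeyError."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def pick_full_url(media_item: dict[str, Any]) -> ParseResult | None:
    # Skip media the account cannot view, then read files.full.url.
    if not media_item.get("canView"):
        return None
    url = dig(media_item, "files", "full", "url")
    return urlparse(url) if url else None


# Hypothetical sample shaped like the new layout.
sample = {
    "canView": True,
    "type": "photo",
    "files": {"full": {"url": "https://cdn.example.com/a.jpg"}},
}
print(pick_full_url(sample).path)  # /a.jpg
```

Returning None for a missing level, rather than raising, lets the caller skip one odd media item instead of aborting the whole `tqdm_asyncio.gather` run shown in the traceback.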

betoalanis commented 1 week ago

I can confirm this is working, TYVM!!

UPDATE: I scraped an account perfectly, but after that I'm getting a TypeError: argument of type 'NoneType' is not iterable error, so it fails after one scraped model when selecting "All". It seems to work correctly when selecting models one by one.

ANOTHER UPDATE: the script now seems to work properly when selecting ALL. Maybe some of my models' databases are corrupted. Still testing, but overall this edit works :D
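For context, that TypeError is what Python raises when the `in` operator is applied to `None`, for example when a field that is normally a dict or list comes back as null. The thread does not pin down which field was null here; this sketch only reproduces the error and shows a hypothetical guard, `contains`.

```python
from typing import Any


def contains(container: Any, key: str) -> bool:
    """Membership test that treats a missing (None) container as empty."""
    return container is not None and key in container


record = {"media": None}  # hypothetical record whose field came back null
try:
    _ = "files" in record["media"]  # raises TypeError on None
except TypeError as exc:
    print(type(exc).__name__, exc)
print(contains(record["media"], "files"))  # False
```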

betoalanis commented 1 week ago

OK, after some testing, I noticed the error comes from OF's change to the preview URLs, and I cross-checked (https://github.com/UltimaHoarder/UltimaScraper/issues/2121#issuecomment-2318619581).

In the same __init__.py file, I replaced all the ["preview"] in preview_url_picker with ["full"].
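An alternative to replacing every ["preview"] with ["full"] is to try the preview URL first and fall back to the full URL when the preview is missing or empty. This is an untested sketch, not what was confirmed in this thread; `pick_preview_url` is a hypothetical standalone helper that assumes the `files.<variant>.url` layout.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import ParseResult, urlparse


def pick_preview_url(media_item: dict[str, Any]) -> ParseResult | None:
    """Prefer files.preview.url, fall back to files.full.url."""
    files = media_item.get("files") or {}
    for variant in ("preview", "full"):
        # A variant may be absent, null, or carry a null URL; skip all three.
        url = (files.get(variant) or {}).get("url")
        if url:
            return urlparse(url)
    return None


print(pick_preview_url({"files": {"full": {"url": "https://x.test/f.mp4"}}}).path)  # /f.mp4
```

The fallback keeps real previews when OF still serves them, at the cost of possibly fetching the full file when it does not.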

cigix commented 1 week ago

That got my downloads repaired as well, thanks everyone!

raphaelbarreto commented 1 week ago

I've tried to replicate the steps but can't make it work. Could anyone upload a working version of the code somewhere, please?

myps6415 commented 2 days ago

Hi everyone, I think this problem has been solved thanks to everyone's input, and it works for me now. Here is a summary.

You need to fix __init__.py in the ultima_scraper_api/apis/onlyfans folder. It's not easy to find, because it is part of an installed dependency rather than the UltimaScraper project itself. Here is the full path: UltimaScraper/.venv/lib/python3.11/site-packages/ultima_scraper_api/apis/onlyfans (the python3.11 part matches your Python version, e.g. python3.10 in the traceback above). Fix __init__.py there.
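Because the file lives in an installed dependency, its path depends on the Python version and on where Poetry created the virtual environment. A minimal sketch using the standard library's `importlib.util.find_spec` can print the exact location; the helper name `locate_in_package` and the script name `find_init.py` are hypothetical.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def locate_in_package(package: str, *parts: str) -> Path | None:
    """Return the path to a file inside an installed top-level package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    # A regular package has a single directory on its search path.
    root = Path(next(iter(spec.submodule_search_locations)))
    target = root.joinpath(*parts)
    return target if target.exists() else None


if __name__ == "__main__":
    # Run inside the project's environment, e.g. `poetry run python find_init.py`.
    print(locate_in_package("ultima_scraper_api", "apis", "onlyfans", "__init__.py"))
```

Passing only the top-level package name to `find_spec` locates it without importing its submodules.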

The corrected __init__.py is as follows:

from __future__ import annotations

from typing import TYPE_CHECKING, Any, Literal
from urllib.parse import urlparse

SubscriptionType = Literal["all", "active", "expired", "attention"]

if TYPE_CHECKING:
    from ultima_scraper_api.apis.onlyfans.classes.user_model import (
        AuthModel,
        create_user,
    )

class SiteContent:
    def __init__(self, option: dict[str, Any], user: AuthModel | create_user) -> None:
        self.id: int = option["id"]
        self.author = user
        self.media: list[dict[str, Any]] = option.get("media", [])
        self.preview_ids: list[int] = []
        self.__raw__ = option

    def url_picker(self, media_item: dict[str, Any], video_quality: str = ""):
        # video_quality is accepted for compatibility with existing callers but
        # is unused: the new layout only exposes a single "full" URL here.
        if not media_item.get("canView"):
            return None
        # The media URL moved from the old "source" key to files.full.url;
        # .get() returns None instead of raising KeyError if a level is missing.
        files: dict[str, Any] = media_item.get("files") or {}
        full: dict[str, Any] = files.get("full") or {}
        url = full.get("url")
        return urlparse(url) if url else None

    def preview_url_picker(self, media_item: dict[str, Any]):
        preview_url = None
        if "files" in media_item:
            files = media_item["files"]
            full = files.get("full") or {}
            # Keep only items that advertise a preview, but read the URL from
            # files.full.url, since the preview URLs changed.
            if "preview" in files and "url" in full:
                preview_url = full["url"]
        else:
            preview_url = media_item.get("full")
        # Return at function level; nested inside the else branch, the
        # "files" case would always fall through and return None.
        return urlparse(preview_url) if preview_url else None

    def get_author(self):
        return self.author

    async def refresh(self):
        func = await self.author.scrape_manager.handle_refresh(self)
        return await func(self.id)

One more thing: if you previously ran this project with Docker, you need to rebuild your image and make sure the fixed __init__.py is copied to the right place. Here is my Dockerfile:

FROM python:3.10-slim
RUN apt-get update && apt-get install -y \
  curl \
  libpq-dev \
  gcc \
  && rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/app
ENV POETRY_HOME=/usr/local/share/pypoetry
ENV POETRY_VIRTUALENVS_CREATE=false
RUN curl -sSL https://install.python-poetry.org | python3 -

COPY . .

RUN /usr/local/share/pypoetry/bin/poetry install --only main

COPY .venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py /usr/src/app/.venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py

CMD [ "/usr/local/share/pypoetry/bin/poetry", "run", "python", "./start_us.py" ]

With those changes, it should run fine. In my experience, after all of that, "KeyError: 'data'" appeared because a new cookie needed to be set. You need to reset the auth.json in __user_data__/profiles/OnlyFans/default/auth.json.

UltimaHoarder / UltimaScraper

KeyError: 'source' #2120