Willy-JL / F95Checker

GNU General Public License v3.0

Fast Check Error / Too many requests #176

Open Whatthee opened 2 weeks ago

Whatthee commented 2 weeks ago

The last week or so, whenever I do a full recheck I get this error:

Traceback (most recent call last):
  File "D:\a\F95Checker\F95Checker\modules\api.py", line 512, in fast_check
  File "C:\hostedtoolcache\windows\Python\3.11.2\x64\Lib\json\__init__.py", line 346, in loads
  File "C:\hostedtoolcache\windows\Python\3.11.2\x64\Lib\json\decoder.py", line 337, in decode
  File "C:\hostedtoolcache\windows\Python\3.11.2\x64\Lib\json\decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The check_broken.bin says "429 Too Many Requests" (attachment: check_broken.zip).

I can right-click single items and do a full recheck, but not for the entire database. Also, I don't have any "D:\a" folder, so I wonder why the API would be there, but I'm not sure if that's related.

Whatthee commented 2 weeks ago

I see that it is due to f95zones restrictions https://f95zone.to/threads/f95checker-willyjl.44173/post-15281568

Perhaps one way to deal with it is to use the https://f95zone.to/sam/latest_alpha/ as a fast-check

Then full checks could be for batches of 100 games at a time

gimpyestrada commented 2 weeks ago

Would lowering the worker count help for this?

Whatthee commented 2 weeks ago

I've tried going all the way down to 1 worker; it does not help. In the end I don't think f95zone allows much more than 50-100 requests within a short period.
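Since lowering the worker count alone doesn't avoid the limit, another angle is to back off and retry when the server answers 429. This is only a sketch, not F95Checker's actual code: the `fetch` callable and its `(status, body)` return shape are hypothetical stand-ins for whatever HTTP call the tool makes.

```python
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=2.0):
    """Retry `fetch` while it signals HTTP 429, doubling the wait each time.

    `fetch` is a hypothetical callable returning (status_code, body).
    Returns the last (status_code, body) pair, successful or not.
    """
    delay = base_delay
    status, body = fetch()
    for _ in range(max_retries):
        if status != 429:
            return status, body
        time.sleep(delay)  # wait before retrying; a real client should honor Retry-After
        delay *= 2         # exponential backoff
        status, body = fetch()
    return status, body
```

The same idea is what the "Retry on 429" option mentioned later in this thread provides, just done manually here.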

Whatthee commented 2 weeks ago

A quick fix could be something like this. I've done it with a https://f95zone.to/sam/latest_alpha.html file saved in a folder, but I'm sure you know a better way 😅 This will at least get the listing data into a usable table.

import os

import pandas as pd
from bs4 import BeautifulSoup


def sel_text(tile, css):
    """Return the stripped text of the first element matching `css`, or None."""
    el = tile.select_one(css)
    return el.get_text(strip=True) if el else None


def transform_file_content(html_content):
    try:
        # Parse the HTML content
        soup = BeautifulSoup(html_content, 'html.parser')
        data = []

        # Extract data for each ".resource-tile"
        for tile in soup.select(".resource-tile"):
            try:
                link_el = tile.select_one(".resource-tile_link")
                data.append({
                    "Name": sel_text(tile, ".resource-tile_info-header_title"),
                    "Link": link_el['href'] if link_el else None,
                    "Version": sel_text(tile, ".resource-tile_label-version"),
                    "Watching": bool(tile.select_one(".watch-icon")),
                    "Developer": sel_text(tile, ".resource-tile_dev"),
                    "meta_time": sel_text(tile, ".resource-tile_info-meta_time"),
                    "meta_likes": sel_text(tile, ".resource-tile_info-meta_likes"),
                    "meta_views": sel_text(tile, ".resource-tile_info-meta_views"),
                    "meta_rating": sel_text(tile, ".resource-tile_info-meta_rating"),
                })
            except Exception as e:
                print(f"Error extracting data from tile: {e}")

        df = pd.DataFrame(data)
        if df.empty:
            return df  # nothing parsed; skip column handling to avoid KeyErrors
        df['Watching'] = df['Watching'].fillna(False)

        # Handle type conversion with error checking
        for col, dtype in {
            "Name": "string",
            "Link": "string",
            "Version": "string",
            "Developer": "string",
            "meta_time": "float64",
            "meta_likes": "float64",
            "meta_views": "string",
            "meta_rating": "string",
        }.items():
            try:
                df[col] = pd.to_numeric(df[col], errors='coerce') if dtype == "float64" else df[col].astype(dtype)
            except Exception as e:
                print(f"Error converting column {col} to {dtype}: {e}")

        # This index could even be used to align the sorting with the sorting on
        # the site, in addition to last_updated, since that field seems more
        # prone to errors (e.g. sometimes even 2025 due to typing mistakes).
        df['Index'] = range(len(df))
        return df

    except Exception as e:
        print(f"Error in transform_file_content: {e}")
        return pd.DataFrame()  # Return empty DataFrame on failure


def process_folder(folder_path):
    all_data = pd.DataFrame()

    for filename in os.listdir(folder_path):
        if not filename.endswith('.html'):
            continue
        file_path = os.path.join(folder_path, filename)
        html_content = None

        # Attempt to read the file with multiple encodings
        for encoding in ('utf-8', 'ISO-8859-1', 'Windows-1252'):
            try:
                with open(file_path, 'r', encoding=encoding) as file:
                    html_content = file.read()
                print(f"Successfully read {filename} with {encoding} encoding")
                break  # Exit the loop if reading succeeds
            except UnicodeDecodeError:
                print(f"Failed to read {filename} with {encoding} encoding, trying next...")
            except Exception as e:
                print(f"Unexpected error with file {filename}: {e}")
                break

        if html_content is None:
            print(f"Could not read {filename} with any encoding, skipping...")
            continue

        # Process the HTML content if it was successfully read
        try:
            df = transform_file_content(html_content)
            all_data = pd.concat([all_data, df], ignore_index=True)
        except Exception as e:
            print(f"Error processing file {filename}: {e}")

    return all_data


# Define folder path and call process_folder
folder_path = r"C:\htmlfiles"  # it does not have to be from a folder, but I'm quite new to this type of coding
result_df = process_folder(folder_path)
print(result_df)

disaster2395 commented 2 weeks ago

Temporary fix until an official one appears or the situation resolves itself: https://f95zone.to/threads/f95checker-willyjl.44173/post-15297719

Willy-JL commented 2 weeks ago

Perhaps one way to deal with it is to use the https://f95zone.to/sam/latest_alpha/ as a fast-check

The issue is not fast checks. Fast checks use a dedicated API that Sam made specifically for F95Checker, which checks 100 games at a time and returns the version string. What is causing the rate limit is the full checks that fetch all the thread info. There's no other way to get all the information this tool shows you without requesting the full thread, and it looks like they just added more rate limits to it.
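The batching that the fast-check API does can be sketched in a few lines. This is an illustration only: the batch size of 100 matches what's described above, but `check_batch` and the delay value are hypothetical placeholders, not F95Checker's actual code.

```python
import time

def batched(ids, size=100):
    # Split a flat list of thread IDs into consecutive batches of `size`
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def check_in_batches(ids, check_batch, delay=5.0):
    """Run `check_batch` (a hypothetical callable taking a list of IDs)
    on batches of `ids`, sleeping between batches to stay under rate limits."""
    results = []
    for batch in batched(ids):
        results.extend(check_batch(batch))
        time.sleep(delay)
    return results
```

For full checks the same batching doesn't help, per the explanation above, since each thread still has to be requested individually.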

gimpyestrada commented 2 weeks ago

Could it fail better? Right now it fails, but F95Checker still thinks it is waiting for something to complete, so the only way to continue is to close it, wait a while, and open it again.
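The traceback at the top of this thread happens because a 429 error page is fed straight into the JSON parser. One way to fail more cleanly (a sketch, not F95Checker's actual code; the function name and return shape are made up) is to check the status code before parsing, so a rate limit surfaces as a clear message instead of a JSONDecodeError:

```python
import json

def parse_check_response(status_code, body_text):
    """Turn a raw HTTP response into (parsed_json, error_message).

    Hypothetical helper: detects a 429 before calling json.loads so a
    rate-limit HTML page doesn't surface as a confusing JSONDecodeError.
    Exactly one of the two return values is None.
    """
    if status_code == 429:
        return None, "Rate limited by f95zone (429 Too Many Requests), try again later"
    try:
        return json.loads(body_text), None
    except json.JSONDecodeError as e:
        return None, f"Unexpected non-JSON response (HTTP {status_code}): {e}"
```

A caller that gets an error string back could then abort the refresh cleanly instead of hanging.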

gimpyestrada commented 1 week ago

Temporary fix until an official one appears or the situation resolves itself: https://f95zone.to/threads/f95checker-willyjl.44173/post-15297719

This fails for me too.

disaster2395 commented 1 week ago

@gimpyestrada did you check "Retry on 429" checkbox under "Refresh" settings section?

gimpyestrada commented 1 week ago

@gimpyestrada did you check "Retry on 429" checkbox under "Refresh" settings section?

I did not catch that part. Let me try again.

gimpyestrada commented 1 week ago

Looks like it is working!

Willy-JL commented 5 days ago

Please try the latest beta version, should all be fixed and be much more efficient too.

gimpyestrada commented 4 days ago

Please try the latest beta version, should all be fixed and be much more efficient too.

Would I just drop it in the directory and overwrite the files from up above?

Willy-JL commented 4 days ago

Yeah. There is no user data in the program's installation directory; that is all stored in AppData (or the corresponding location on other OSes), so just don't touch that. Personally I would delete the old version's install files before putting in the new version. Simply overwriting might work, but it could get messy over time as things move around.