Kohulan / DECIMER-Image_Transformer

DECIMER: Deep Learning for Chemical Image Recognition using Efficient-Net V2 + Transformer
MIT License

Wrong st_size of DECIMER_HandDrawn_model #94

Closed alexey-krasnov closed 6 months ago

alexey-krasnov commented 6 months ago

Issue Type

Bug

Source

GitHub (source)

DECIMER Image Transformer Version

2.6.0

OS Platform and Distribution

MacOS

Python version

3.11

Current Behaviour?

Hi all. It seems that the expected st_size for DECIMER_HandDrawn_model is incorrect, so the model is downloaded again on every run even after it has been downloaded and unzipped correctly.

The actual st_size is:

import os
os.stat(".data/DECIMER-V2/DECIMER_HandDrawn_model/saved_model.pb").st_size

>>28080328

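The problem is inside this function in the utils.py module, which compares the saved_model.pb of every model against the same hard-coded size (28080309):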

def ensure_models(default_path: str, model_urls: dict) -> dict:
    """Function to ensure models are present locally.

    Convenient function to ensure model downloads before usage

    Args:
        default_path (str): Default path for model data
        model_urls (dict): Dictionary containing model names as keys and their corresponding URLs as values

    Returns:
        dict: A dictionary containing model names as keys and their local paths as values
    """
    model_paths = {}

    for model_name, model_url in model_urls.items():
        model_path = os.path.join(default_path, f"{model_name}_model")
        if (
            os.path.exists(model_path)
            and os.stat(os.path.join(model_path, "saved_model.pb")).st_size != 28080309
        ):
            shutil.rmtree(model_path)
            config.download_trained_weights(model_url, default_path)
        elif not os.path.exists(model_path):
            config.download_trained_weights(model_url, default_path)

        # Store the model path
        model_paths[model_name] = model_path

    return model_paths
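
The effect of the hard-coded size can be illustrated with a small sketch that does not touch the file system. The helper name needs_redownload is hypothetical and not part of DECIMER. The DECIMER_HandDrawn size is the value measured above, and the DECIMER size is assumed to equal the hard-coded value.

```python
# Hypothetical illustration, not DECIMER code: mirrors the size comparison in ensure_models.
HARD_CODED_SIZE = 28080309  # the single value every model is compared against

# Sizes of saved_model.pb per model. The DECIMER entry is an assumption
# (taken to match the hard-coded value); the HandDrawn entry is from os.stat.
actual_sizes = {
    "DECIMER": 28080309,
    "DECIMER_HandDrawn": 28080328,
}


def needs_redownload(actual_size: int, expected_size: int) -> bool:
    """Return True when the size check would delete and re-download the model."""
    return actual_size != expected_size


# Apply the same hard-coded check to both models.
stale = {
    name: needs_redownload(size, HARD_CODED_SIZE)
    for name, size in actual_sizes.items()
}
print(stale)
```

Under these assumptions only the DECIMER_HandDrawn entry is flagged, which matches the repeated download seen in the log.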

Which images caused the issue? (This is mandatory for images related issues)

No response

Standalone code to reproduce the issue

from DECIMER import predict_SMILES

Relevant log output

Downloading trained model to /.data/DECIMER-V2
Downloading DECIMER_HandDrawn_model.zip:  32%|█████████████████████████████████████████▊                                                                                        | 91.5M/285M [00:18<00:44, 4.50MB/s]

alexey-krasnov commented 6 months ago

The problem was solved by adding a dictionary model_sizes:

model_sizes = {
    "DECIMER": 28080309,
    "DECIMER_HandDrawn": 28080328,
}

in the ensure_models function. It stores the expected st_size of each model, so that each saved_model.pb is compared against its own size, as follows:

def ensure_models(default_path: str, model_urls: dict) -> dict:
    """Function to ensure models are present locally.

    Convenient function to ensure model downloads before usage

    Args:
        default_path (str): Default path for model data
        model_urls (dict): Dictionary containing model names as keys and their corresponding URLs as values

    Returns:
        dict: A dictionary containing model names as keys and their local paths as values
    """
    model_paths = {}
    # Store st_size of each model
    model_sizes = {
        "DECIMER": 28080309,
        "DECIMER_HandDrawn": 28080328,
    }
    for model_name, model_url in model_urls.items():
        model_path = os.path.join(default_path, f"{model_name}_model")
        if (
            os.path.exists(model_path)
            and os.stat(os.path.join(model_path, "saved_model.pb")).st_size != model_sizes.get(model_name)
        ):
            print(f'Working with model {model_name}')
            shutil.rmtree(model_path)
            config.download_trained_weights(model_url, default_path)
        elif not os.path.exists(model_path):
            config.download_trained_weights(model_url, default_path)

        # Store the model path
        model_paths[model_name] = model_path
    return model_paths
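
In a check of this kind, os.stat raises FileNotFoundError when the model directory exists but saved_model.pb is missing, and model_sizes.get returns None for a model name without a recorded size, which makes the comparison always fail. The following is a minimal sketch of a more defensive check under those observations. The helper name model_needs_download is hypothetical and this is not the code merged into DECIMER.

```python
import os
from typing import Optional


def model_needs_download(model_path: str, expected_size: Optional[int]) -> bool:
    """Hypothetical helper: decide whether a model should be (re-)downloaded.

    Returns True when saved_model.pb is missing or its size differs from
    expected_size. When no expected size is recorded (None), an existing
    file is accepted as-is instead of being downloaded again on every call.
    """
    pb_path = os.path.join(model_path, "saved_model.pb")
    if not os.path.isfile(pb_path):
        # Covers both a missing directory and an incomplete unzip.
        return True
    if expected_size is None:
        # Unknown model: no size to compare against, so skip the check.
        return False
    return os.path.getsize(pb_path) != expected_size
```

Inside ensure_models, such a helper could be called as model_needs_download(model_path, model_sizes.get(model_name)) before deciding whether to remove the directory and download again.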

Kohulan commented 6 months ago

Hi @alexey-krasnov ,

Thanks a lot for reporting the issue and providing a working solution; this is great. If you wish to submit a pull request with your implementation, I would be delighted to review it. Please make sure that it adheres to our testing and formatting guidelines before submission.

Best regards, Kohulan