argilla-io / argilla

Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
https://docs.argilla.io/en/latest/

[BUG-python/deployment] #5133

Open HPeterr opened 2 days ago

HPeterr commented 2 days ago

Describe the bug

When pushing more than 100 FeedbackDataset records to Argilla (each record contains one image), I get the following error:

ReadTimeout: timed out
argilla     | 12:58:53 argilla.1 |                              perform_request                                    
argilla     | 12:58:53 argilla.1 |                                  raise err from None                            
argilla     | 12:58:53 argilla.1 |                              elastic_transport.ConnectionTimeout: Connection timeout caused by: TimeoutError()

Stacktrace and Code to create the bug

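    import os

    import argilla as rg

    # data, base_path, config, compress_image, image_to_html and ds_multi_modal
    # are defined elsewhere in the script.
    records = []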

    for _, row in data.iterrows():
        input_path = os.path.join(
            base_path, row["File Name"], "images", row["Image Name"]
        )
        output_path = os.path.join(config["tmp_path"], "compressed_image.jpg")
        compress_image(input_path, output_path)

        classification = (
            "1" if row["crack_diff_settlement"] else "0"
        )
        description = row["description"]
        explanation = row["explanation"]
        infos = f"**Classification:** {classification}\n\n**Description:** {description}\n\n**Explanation:** {explanation}"

        record = rg.FeedbackRecord(
            fields={
                "image": f"{row['Image Name']}\n\n{image_to_html(output_path)}",
                "informations": infos,
            },
        )

        records.append(record)

    ds_multi_modal.add_records(records)
    ds_multi_modal.push_to_argilla(config["dataset_name"])
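
For triage, it may help to know how large each record's payload is. The following is a minimal sketch, assuming `image_to_html` embeds the compressed image as base64 inside the HTML (an assumption, since the helper's body is not shown above). `base64_length` is a hypothetical helper name, not part of Argilla.

```python
import base64
import math


def base64_length(num_bytes: int) -> int:
    """Number of characters in the padded base64 encoding of num_bytes bytes."""
    # Base64 maps every 3 input bytes (rounded up) to 4 output characters.
    return 4 * math.ceil(num_bytes / 3)


# Sanity check against the standard library on a dummy 100 kB payload.
raw = b"\x00" * 100_000
assert base64_length(len(raw)) == len(base64.b64encode(raw))
```

Multiplying this estimate by the number of records gives a rough lower bound on how much data a single push has to carry, which can indicate whether the timeout is likely related to payload size.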

Expected behavior

I expected to be able to push as many records with images as I want. I also increased the Elasticsearch heap size by setting ES_JAVA_OPTS: -Xms10g -Xmx10g.
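
One way to check whether the timeout depends on the size of a single push is to upload the records in smaller batches. This is only a sketch under stated assumptions: `chunked` and `push_in_batches` are hypothetical helpers, and `push_batch` stands in for whatever call actually uploads a list of records. It is not an Argilla API.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def push_in_batches(
    records: Sequence[T],
    push_batch: Callable[[List[T]], None],  # hypothetical uploader callback
    batch_size: int = 20,
) -> int:
    """Call `push_batch` once per batch and return the number of records sent."""
    pushed = 0
    for batch in chunked(records, batch_size):
        push_batch(batch)
        pushed += len(batch)
    return pushed
```

If small batches go through while one large push times out, that would point at request size or indexing time rather than at the Elasticsearch heap setting.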