Open rcannood opened 1 month ago
Hi @rcannood, I only filtered out the uncommon data in df2. Every other dataframe was obtained using all features.
Thanks for taking a look at this!
Unfortunately, even with the updated parameter settings, we could not reproduce the performance levels previously achieved by this method on the Kaggle leaderboard.
Would you be able to take a look at this script to see if you can spot the issue?
You should be able to run it with the following commands:
aws s3 sync --no-sign-request \
"s3://openproblems-bio/public/neurips-2023-competition/workflow-resources/" \
"resources"
python src/task/methods/transformer_ensemble/script.py
(Provided that you have all of the dependencies installed.)
Hi, based on your script you use only 10 epochs? Is that right? You should run it for 10k epochs; the architecture is a bit complex.
No, those are the parameters I use for testing. Viash removes the par dictionary between the VIASH START / VIASH END markers and replaces it with the argument settings in the Viash config (config.vsh.yaml).
So effectively, the actual arguments being used are:
par = {
"de_train_h5ad": "resources/neurips-2023-kaggle/de_train.h5ad",
"id_map": "resources/neurips-2023-kaggle/id_map.csv",
"output": "output/prediction.h5ad",
"output_model": "output/model/",
"num_train_epochs": 20000,
"early_stopping": 5000,
"batch_size": 32,
"d_model": 128,
"layer": "sign_log10_pval"
}
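The placeholder mechanism described above can be sketched as follows. Viash scripts mark the test-time parameter block with `## VIASH START` / `## VIASH END` comments; at build time the whole block is substituted with the arguments from config.vsh.yaml. The values below are illustrative, not the actual defaults:

```python
## VIASH START
# Fallback values for running the script directly; Viash replaces
# everything between the START/END markers at build time with the
# argument settings from config.vsh.yaml.
par = {
    "de_train_h5ad": "resources/neurips-2023-kaggle/de_train.h5ad",
    "id_map": "resources/neurips-2023-kaggle/id_map.csv",
    "output": "output/prediction.h5ad",
    "num_train_epochs": 10,  # small value for quick local testing
}
## VIASH END

# Downstream code reads only `par`, so it behaves the same whether the
# placeholder block or the substituted config values are in effect.
print(par["num_train_epochs"])
```

This is why running script.py directly with the small test values gives different results than running the built component with the full 20000-epoch settings.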
Ok, got it. The validation percentage defaults to 0.2 in the train_non_k_means_strategy function. It should be 0.1, as mentioned in the Kaggle solution post; 0.1 was the optimal value for the random train/test split (3 of the 4 dataframes use it). I hope this enhances the performance. Let me know if you have any questions. Note: a better approach would be to create a set of k models (based on k folds) and return the average prediction, but I didn't have time to implement it.
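The k-fold ensemble idea mentioned in the note (never implemented in the actual submission) can be sketched as follows. This is a minimal illustration with a toy least-squares model standing in for the transformer; the fold-averaging structure is the point, not the model:

```python
import numpy as np

def kfold_average_predict(X, y, X_test, k=5, seed=0):
    """Train one model per fold and average the k test predictions.
    Toy stand-in: least-squares regression instead of the transformer."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    preds = []
    for fold in folds:
        train_idx = np.setdiff1d(idx, fold)  # hold out this fold
        w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        preds.append(X_test @ w)
    return np.mean(preds, axis=0)  # ensemble = mean over the k models
```

Unlike a single random split, every sample contributes to training k-1 of the k models, which avoids permanently dropping rare compounds from training.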
Thanks for your input, Elior!
Do you mean something like this? → https://github.com/openproblems-bio/task-dge-perturbation-prediction/pull/65/files
Note: Better approach would be to create a set of k models (based on k folds) and return the average prediction but i didn't have time to implement it.
Regarding this: first and foremost, we'd like to be able to recreate the source code used to generate the submission that ended up winning in the Kaggle competition, not add new features ;)
Yes, exactly. Using 0.2 as the validation percentage might lead to missed drugs in our data. The model struggles with these missing values, impacting its overall performance. The 0.1 value was the optimal one; the 0.2 was hard-coded, my bad :)
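The "missed drugs" effect can be checked directly: with a larger validation fraction, a compound with few rows is more likely to end up with zero rows in the training split. A minimal pandas sketch, assuming the compound column is named sm_name (as in the competition data; adjust if the actual column differs):

```python
import pandas as pd

def drugs_missing_from_train(df, val_frac, seed=0):
    """Return compounds that end up with zero training rows after a
    random train/validation split. Column name 'sm_name' is assumed."""
    val = df.sample(frac=val_frac, random_state=seed)
    train = df.drop(val.index)
    return sorted(set(df["sm_name"]) - set(train["sm_name"]))
```

Comparing the length of this list at val_frac=0.1 versus 0.2 on the real de_train data would show whether the larger split actually removes compounds from training entirely.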
I just ran the method with a validation percentage of 0.1 instead of 0.2, and the resulting MRRMSE score was worse.
Would you be able to run through the code and verify which parts need to be changed in order for the code to produce a decent result? :bow:
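For reference, the MRRMSE score discussed here is the mean rowwise root mean squared error: an RMSE is computed per row (the gene-wise errors for one perturbation), then averaged over rows. A minimal numpy sketch of that definition:

```python
import numpy as np

def mrrmse(y_true, y_pred):
    """Mean Rowwise RMSE: RMSE per row, then averaged over rows.
    Rows are perturbations, columns are genes."""
    row_rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=1))
    return row_rmse.mean()
```

Because the rowwise square root is taken before averaging, a single badly predicted perturbation is penalized less than it would be under a flat RMSE over all entries.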
I’m travelling out of town and will be back on June 28th. I’ll review the code again, but unfortunately I can’t run it :( Can you please verify that the original dataframes yield the expected output?
Another question: does the MRRMSE get worse on the validation set or on the test results?
Hey @Eliorkalfon !
I'm trying to work out why this method is not performing as well as it should once we rerun the benchmarking analyses. There is probably something wrong in the code, either in this repo or in the reinterpretation of the method in task-dge-perturbation-prediction.
In this repo, we had to modify the code a bit to generate the four separate submissions and compute the weighted average in a simple way.
However, it is not crystal clear which parameters were used to generate the different data frames.
The kaggle post reads:
From this I infer:
From the description in the Kaggle notebook, it isn't clear to me whether "uncommon" should be set to True or False for df3 and df4. In addition, I wonder whether other arguments should also be added, such as any of the layer dimensions.
@Eliorkalfon Would you be able to give some insights into this?