Open rcannood opened 1 month ago
Hi @rcannood, I only filtered out the uncommon data in df2. Every other dataframe was obtained using all features.
Thanks for taking a look at this!
Unfortunately, even with the updated parameter settings, we could not reproduce the performance levels previously achieved by this method on the Kaggle leaderboard.
Would you be able to take a look at this script to see if you can spot the issue?
You should be able to run it with the following commands:
aws s3 sync --no-sign-request \
"s3://openproblems-bio/public/neurips-2023-competition/workflow-resources/" \
"resources"
python src/task/methods/transformer_ensemble/script.py
(Provided that you have all of the dependencies installed.)
Hi, based on your script you use only 10 epochs? Is that right? You should run it for 10k epochs; the architecture is a bit complex.
No, those are the parameters I use for testing. Viash removes the par dictionary between the VIASH START / VIASH END markers and replaces it with the argument settings in the Viash config (config.vsh.yaml).
So effectively, the actual arguments being used are:
par = {
"de_train_h5ad": "resources/neurips-2023-kaggle/de_train.h5ad",
"id_map": "resources/neurips-2023-kaggle/id_map.csv",
"output": "output/prediction.h5ad",
"output_model": "output/model/",
"num_train_epochs": 20000,
"early_stopping": 5000,
"batch_size": 32,
"d_model": 128,
"layer": "sign_log10_pval"
}
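The placeholder mechanism described above can be sketched as follows. Viash scripts mark the test-time parameter block with `## VIASH START` / `## VIASH END` comments; at build time the whole block is substituted with the arguments from config.vsh.yaml. The values below are illustrative, not the actual defaults:

```python
## VIASH START
# Fallback values for running the script directly; Viash replaces
# everything between the START/END markers at build time with the
# argument settings from config.vsh.yaml.
par = {
    "de_train_h5ad": "resources/neurips-2023-kaggle/de_train.h5ad",
    "id_map": "resources/neurips-2023-kaggle/id_map.csv",
    "output": "output/prediction.h5ad",
    "num_train_epochs": 10,  # small value for quick local testing
}
## VIASH END

# Downstream code reads only `par`, so it behaves the same whether the
# placeholder block or the substituted config values are in effect.
print(par["num_train_epochs"])
```

This is why running script.py directly with the small test values gives different results than running the built component with the full 20000-epoch settings.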
Ok, got it. The validation percentage defaults to 0.2 in the train_non_k_means_strategy function. It should be 0.1, as mentioned in the Kaggle solution post; 0.1 was the optimal value for the random train/test split (3 of the 4 dataframes use it). I hope this enhances the performance. Let me know if you have any questions. Note: a better approach would be to create a set of k models (based on k folds) and return the average prediction, but I didn't have time to implement it.
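The k-fold ensemble idea mentioned in the note (never implemented in the actual submission) can be sketched as follows. This is a minimal illustration with a toy least-squares model standing in for the transformer; the fold-averaging structure is the point, not the model:

```python
import numpy as np

def kfold_average_predict(X, y, X_test, k=5, seed=0):
    """Train one model per fold and average the k test predictions.
    Toy stand-in: least-squares regression instead of the transformer."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    preds = []
    for fold in folds:
        train_idx = np.setdiff1d(idx, fold)  # hold out this fold
        w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        preds.append(X_test @ w)
    return np.mean(preds, axis=0)  # ensemble = mean over the k models
```

Unlike a single random split, every sample contributes to training k-1 of the k models, which avoids permanently dropping rare compounds from training.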
Thanks for your input, Elior!
Do you mean something like this? → https://github.com/openproblems-bio/task-dge-perturbation-prediction/pull/65/files
Note: Better approach would be to create a set of k models (based on k folds) and return the average prediction but i didn't have time to implement it.
Regarding this: first and foremost, we'd like to be able to recreate the source code used to generate the submission that ended up winning in the Kaggle competition, not add new features ;)
Yes, exactly. Using 0.2 as the validation percentage might lead to missed drugs in our data. The model struggles with these missing values, impacting its overall performance. The 0.1 value was the optimal one; the 0.2 was hard-coded, my bad :)
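The "missed drugs" effect can be checked directly: with a larger validation fraction, a compound with few rows is more likely to end up with zero rows in the training split. A minimal pandas sketch, assuming the compound column is named sm_name (as in the competition data; adjust if the actual column differs):

```python
import pandas as pd

def drugs_missing_from_train(df, val_frac, seed=0):
    """Return compounds that end up with zero training rows after a
    random train/validation split. Column name 'sm_name' is assumed."""
    val = df.sample(frac=val_frac, random_state=seed)
    train = df.drop(val.index)
    return sorted(set(df["sm_name"]) - set(train["sm_name"]))
```

Comparing the length of this list at val_frac=0.1 versus 0.2 on the real de_train data would show whether the larger split actually removes compounds from training entirely.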
I just ran the method with a validation percentage of 0.1 instead of 0.2, and the resulting MRRMSE score was worse.
Would you be able to run through the code and verify which parts need to be changed in order for the code to produce a decent result? :bow:
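For reference, the MRRMSE score discussed here is the mean rowwise root mean squared error: an RMSE is computed per row (the gene-wise errors for one perturbation), then averaged over rows. A minimal numpy sketch of that definition:

```python
import numpy as np

def mrrmse(y_true, y_pred):
    """Mean Rowwise RMSE: RMSE per row, then averaged over rows.
    Rows are perturbations, columns are genes."""
    row_rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=1))
    return row_rmse.mean()
```

Because the rowwise square root is taken before averaging, a single badly predicted perturbation is penalized less than it would be under a flat RMSE over all entries.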
I’m travelling out of town and will be back on June 28th. I’ll review the code again, but unfortunately I can’t run it :( Can you please verify that the original dataframes yield the expected output?
Another question: does the MRRMSE get worse on the validation set or on the test results?
Hey @Eliorkalfon !
I'm trying to work out why this method is not performing as well as it should once we rerun the benchmarking analyses. There is probably something wrong in the code, either in this repo or in the reinterpretation of the method in task-dge-perturbation-prediction.
In this repo, we had to modify the code a bit to generate the four separate submissions and compute the weighted average in a simple way.
However, it is not crystal clear which parameters were used to generate the different data frames.
The kaggle post reads:
From this I infer:
From the description in the Kaggle notebook, it isn't clear to me whether "uncommon" should be set to True or False for df3 and df4. In addition, I wonder whether other arguments should also be added, such as any of the layer dimensions.
@Eliorkalfon Would you be able to give some insights into this?