kxk302 opened this issue 3 years ago
@anuprulez I just ran my third workflow (CNN workflow) on galaxy.eu and it failed. Could you please check the log to see what error message we get? Thanks.
I only see "Failed to communicate with remote job server."
@kxk302 in the first and third histories, I don't have permission to see those datasets. Can you unlock those?
Update: I re-ran the third history after the initial failure and it completed successfully.
@anuprulez how do I unlock the datasets? I don't see an option when trying to share history. If you want we can use Gitter to resolve this. Thx
I see some changes have been made to: https://github.com/goeckslab/Galaxy-ML/tree/master/galaxy_ml very recently
Yes, there was a bug fix in Galaxy-ML that was pushed recently.
Here are the links to all workflows and datasets for histories:
First history:
https://training.galaxyproject.org/training-material/topics/statistics/tutorials/FNN/workflows/
https://zenodo.org/record/4660497/files/X_test.tsv
https://zenodo.org/record/4660497/files/X_train.tsv
https://zenodo.org/record/4660497/files/y_test.tsv
https://zenodo.org/record/4660497/files/y_train.tsv
Second history:
https://training.galaxyproject.org/training-material/topics/statistics/tutorials/RNN/workflows/
https://zenodo.org/record/4477881/files/X_test.tsv
https://zenodo.org/record/4477881/files/X_train.tsv
https://zenodo.org/record/4477881/files/y_test.tsv
https://zenodo.org/record/4477881/files/y_train.tsv
Third history:
https://training.galaxyproject.org/training-material/topics/statistics/tutorials/CNN/workflows/
https://zenodo.org/record/4697906/files/X_train.tsv
https://zenodo.org/record/4697906/files/y_train.tsv
https://zenodo.org/record/4697906/files/X_test.tsv
https://zenodo.org/record/4697906/files/y_test.tsv
You need to rename the uploaded files and change their datatype to tabular before running the workflows. Thx.
I get these errors while running the second (RNN) workflow:
@anuprulez did you downgrade the tool versions in the RNN workflow?
No, I just ran it
If you downgrade the tool versions as I documented, it will work.
I guess the question is why it stopped working with the new versions of those tools.
Try checking the various package versions in the conda environment, and the Python version as well (make sure it is Python 3.6). A conda environment pulls in many packages, and errors are easy to introduce when a newer package joins the mix.
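As a general sanity check, something like the snippet below can report the interpreter and package versions in the active environment. The package names listed are illustrative guesses, not a definitive list of Galaxy-ML's dependencies:

```python
import sys
from importlib import metadata  # available in Python 3.8+


def report_versions(packages):
    """Return a {name: version-or-None} map for the given package names."""
    out = {"python": sys.version.split()[0]}
    for pkg in packages:
        try:
            out[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            out[pkg] = None  # package is not installed in this environment
    return out


# Example package names only; adjust to the tools' actual requirements.
for name, ver in report_versions(["numpy", "scipy", "scikit-learn", "keras"]).items():
    print(name, ver or "not installed")
```

Comparing this output between the old (working) and new (failing) environments can narrow down which package bump introduced the breakage.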
Thanks @qiagu,
Could you please provide more info on how to do that?
Sorry, I was just describing a general debugging process, not anything specific to this thread. From the stderr report @anuprulez provided, I suspect the errors could be cleared by re-cleaning the input TSVs.
Try to ensure the classification targets are integers, not floats.
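A quick way to enforce that, assuming the labels have been loaded into a NumPy array (the values here are illustrative, e.g. labels that arrived as "0.0"/"1.0" in the TSV):

```python
import numpy as np

# Illustrative class labels that were parsed as floats from a TSV.
y = np.array([0.0, 1.0, 1.0, 0.0])

# Fail loudly if any label is not a whole number, then cast to int.
if not np.all(np.mod(y, 1) == 0):
    raise ValueError("non-integer class labels found")
y_int = y.astype(int)
print(y_int)
```

Running this on each y_train.tsv / y_test.tsv before the workflow starts would rule out float-target issues.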
I do not see the errors that Anup sees. I guess the first step would be to get these workflows working with older versions of the tools. Then we can use the new versions to reproduce the problem. @anuprulez not sure what your internet connectivity is like, but we could possibly have a Zoom meeting to discuss tomorrow (Friday). I'm free from 8:00 am to 10:00 am EST.
I only see "Failed to communicate with remote job server."
That's a job running error, you'll want to check this with Nate, that is not a tool error.
This is run on EU. I remember vaguely Bjorn saying that some jobs are configured to run on GPU and this error would show up then, and the error would go away when job was run on CPU. Am I right @bgruening?
I have 3 workflows that use Galaxy's ML tools (namely Keras for neural networks). They all worked fine last time I ran them (maybe a month ago?).
These 3 workflows are used in 3 neural network tutorials that I am presenting at GCC 2021. I decided to re-run them to make sure all is good. All 3 workflows fail now. Here is the error message for the first 2 workflows:
Here are the histories:
Per @anuprulez's suggestion, I downgraded the tool versions, and the first and second workflows work now. Below is the downgrade:
The third workflow still fails. BTW, it requires the most recent version of the third tool.
I started writing unit tests in galaxytools (https://github.com/kxk302/galaxytools/tree/nn_tests), so these workflows run as part of the unit tests. They would serve as regression tests and help guarantee that future changes do not break existing code. However, I ran into another issue: models saved to a file cannot be loaded and error out. Not sure if this is related to the workflow error above. Here is the error message:
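Regardless of what the error message turns out to be, the save/load failure can be isolated with a minimal round-trip check. This is only a sketch of the test pattern: pickle stands in for Galaxy-ML's actual model serializer, which isn't shown in this thread, and the "model" is a stand-in dict rather than a real Keras model:

```python
import pickle
import tempfile


def save_load_round_trip(model, path):
    """Save `model` to `path`, load it back, and return the loaded copy.

    pickle is a stand-in here for the real serializer used by the tools.
    """
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a trained model's state.
model = {"arch": "cnn", "weights": [0.1, 0.2, 0.3]}
with tempfile.TemporaryDirectory() as tmp:
    loaded = save_load_round_trip(model, tmp + "/model.pkl")
assert loaded == model, "saved model does not load back identically"
print("round trip ok")
```

Swapping the real serializer into this shape (save, reload, compare) should reveal whether the failure is in serialization itself or in the environment the model is reloaded into.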