Closed kxk302 closed 2 years ago
Really nice! Looking forward to testing it! THANK YOU!
Thanks @yvanlebras!
"When we get to the top right corner of the image, we simply move the filter down one pixel and restart from the right" -> the GIF seems to show that you move the filter down one pixel and restart from the left, doesn't it? Moreover, it is then mentioned that the process stops when the filter reaches the bottom right pixel, so each "line" of pixels seems to be processed from left to right, doesn't it?
I tried to follow the tutorial and got an error with the first tool, Keras model config.
Here is my history: https://usegalaxy.eu/u/ylebras/h/test-tuto-cnn-deep-learning-machine-learning-images-ia
Here is the stdout:
ML@0.8.3/lib/python3.6/site-packages/keras/layers/core.py", line 387, in _fix_unknown_dimension
raise ValueError(msg)
ValueError: total size of new array must be unchanged
Maybe I "just" made a mistake filling in the form, but in case not...
You are correct. That's a typo.
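For anyone reading along, the scan order being discussed can be sketched in a few lines of plain Python. This is only an illustration, not the tutorial's code: the helper names `scan_positions` and `convolve_valid` are made up here, and the sketch assumes stride 1, no padding, and no kernel flip (the usual CNN convention).

```python
def scan_positions(image_h, image_w, filter_h, filter_w, stride=1):
    """Yield the (row, col) of the filter's top-left corner in scan order."""
    for row in range(0, image_h - filter_h + 1, stride):      # top to bottom
        for col in range(0, image_w - filter_w + 1, stride):  # left to right
            yield row, col


def convolve_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    flat = [
        sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
        for r, c in scan_positions(len(image), len(image[0]), kh, kw)
    ]
    # Regroup the flat scan into the rows of the feature map.
    return [flat[k:k + out_w] for k in range(0, len(flat), out_w)]


# A 3x3 filter on a 4x4 image: each row restarts from the left.
positions = list(scan_positions(4, 4, 3, 3))
print(positions)  # [(0, 0), (0, 1), (1, 0), (1, 1)]

feature_map = convolve_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
print(feature_map)  # [[6, 8], [12, 14]]
```

The last position visited is the one where the filter covers the bottom right pixel, which matches the stopping condition described in the tutorial.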
Looked at your history. I think you have set the input shape to 300,000 instead of 30,000 in your Keras model config.
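A rough sketch of why an extra zero triggers "total size of new array must be unchanged": a reshape only works when the number of input values equals the product of the target dimensions. The target shape (100, 100, 3) below is an assumption (100 x 100 RGB images, which gives the 30,000 mentioned above), and `check_reshape` is a hypothetical helper that only mimics the size check; it is not the Keras code.

```python
from math import prod


def check_reshape(total_size, target_shape):
    """Mimic the size check behind 'total size of new array must be unchanged'."""
    needed = prod(target_shape)
    if needed != total_size:
        raise ValueError(
            f"cannot reshape {total_size} values into {target_shape} "
            f"({needed} values needed)"
        )
    return target_shape


TARGET = (100, 100, 3)  # assumed image shape: 100 * 100 * 3 = 30,000

check_reshape(30_000, TARGET)       # sizes match, no error
try:
    check_reshape(300_000, TARGET)  # one zero too many
except ValueError as err:
    print(err)
```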
Fixed all the typos. Thanks @anuprulez!
Hi Kaivan, thank you for the new version! I have an issue on the last step, in case you have an idea about the reason: https://usegalaxy.eu/u/ylebras/h/test-tuto-cnn-deep-learning-machine-learning-images-ia
Hi @kxk302
I tried to run this tutorial on usegalaxy.eu but I am getting this error:
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/keras_train_and_eval/785dd890e27d/keras_train_and_eval/keras_train_and_eval.py", line 551, in <module>
fasta_path=args.fasta_path,
File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/keras_train_and_eval/785dd890e27d/keras_train_and_eval/keras_train_and_eval.py", line 421, in main
) = train_test_split_none(X, y, groups, **test_split_options)
File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/keras_train_and_eval/785dd890e27d/keras_train_and_eval/keras_train_and_eval.py", line 103, in train_test_split_none
rval = train_test_split(*new_arrays, **kwargs)
File "/usr/local/tools/_conda/envs/__Galaxy-ML@0.8.3/lib/python3.6/site-packages/galaxy_ml/model_validations/_train_test_split.py", line 65, in train_test_split
arrays = indexable(*arrays)
File "/usr/local/tools/_conda/envs/__Galaxy-ML@0.8.3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 230, in indexable
check_consistent_length(*result)
File "/usr/local/tools/_conda/envs/__Galaxy-ML@0.8.3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 205, in check_consistent_length
" samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [5015, 5016]
Also, for prediction on the test data, shouldn't we convert the test data to OHE? thanks!
Hi Anup, The test labels need not be converted to OHE. We only do that for training labels for categorical cross-entropy calculation. Can you please share your history? I can take a look. Thx
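To make the distinction concrete, here is a small plain-Python sketch of one-hot encoding and categorical cross-entropy. It is an illustration under assumptions, not the Galaxy tool's implementation: `one_hot` and `categorical_cross_entropy` are hypothetical helpers, and the labels and predicted probabilities are made-up values.

```python
import math


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)] for label in labels]


def categorical_cross_entropy(y_true_ohe, y_pred_probs):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    losses = [
        -sum(t * math.log(p) for t, p in zip(true_row, pred_row) if t > 0)
        for true_row, pred_row in zip(y_true_ohe, y_pred_probs)
    ]
    return sum(losses) / len(losses)


# Training: the loss needs the labels as one-hot vectors.
train_labels = [0, 2, 1]
train_ohe = one_hot(train_labels, num_classes=3)
predicted = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.6, 0.2]]
loss = categorical_cross_entropy(train_ohe, predicted)

# Evaluation: predicted class indices are compared with the plain test labels.
test_labels = [0, 1, 1]
predicted_classes = [max(range(3), key=row.__getitem__) for row in predicted]
accuracy = sum(p == t for p, t in zip(predicted_classes, test_labels)) / len(test_labels)
```

The loss multiplies each one-hot row with the log of the predicted probabilities, which is why the training labels need that form, while the comparison at evaluation time works on class indices.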
Hi @yvanlebras I created a workflow from your history and it has an extra input, called 'input dataset'. What should I set that to, to run your workflow? Thx
Hi @kxk302 https://usegalaxy.eu/u/kumara/h/fruitdatasets. I have made this history accessible, but I am not sure whether you will see the actual datasets. If not, can you give me your Galaxy email ID/username so I can share this history with you?
Hi Kaivan,
I updated my history so there is no "extra" dataset in it.
The workflow is accessible here: https://usegalaxy.eu/u/ylebras/w/workflow-constructed-from-history-test-tuto-cnn-deep-learning-machine-learning-images-ia-1
The history is here: https://usegalaxy.eu/u/ylebras/h/test-tuto-cnn-deep-learning-machine-learning-images-ia
Just FYI, my stderr on the last step "Machine learning Visualization Extension":
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-19
OMP: Info #156: KMP_AFFINITY: 20 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 20 packages x 1 cores/pkg x 1 threads/core (20 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 5
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 6
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 7
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 8
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 9
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 10
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 11
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 12
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 13
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 14
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 15
OMP: Info #171: KMP_AFFINITY: OS proc 16 maps to package 16
OMP: Info #171: KMP_AFFINITY: OS proc 17 maps to package 17
OMP: Info #171: KMP_AFFINITY: OS proc 18 maps to package 18
OMP: Info #171: KMP_AFFINITY: OS proc 19 maps to package 19
OMP: Info #250: KMP_AFFINITY: pid 1954210 tid 1954210 thread 0 bound to OS proc set 0
Using TensorFlow backend.
/usr/local/tools/_conda/envs/__Galaxy-ML@0.8.3/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning:
sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
Traceback (most recent call last):
File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/ml_visualization_ex/14bd6d59650d/ml_visualization_ex/ml_visualization_ex.py", line 644, in <module>
title=args.title,
File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/ml_visualization_ex/14bd6d59650d/ml_visualization_ex/ml_visualization_ex.py", line 576, in main
true_labels, plot_selection, "header_true", "column_selector_options_true"
File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/ml_visualization_ex/14bd6d59650d/ml_visualization_ex/ml_visualization_ex.py", line 275, in get_dataframe
parse_dates=True,
File "/usr/local/tools/_conda/envs/__Galaxy-ML@0.8.3/lib/python3.6/site-packages/galaxy_ml/utils.py", line 190, in read_columns
data = data[cols]
File "/usr/local/tools/_conda/envs/__Galaxy-ML@0.8.3/lib/python3.6/site-packages/pandas/core/frame.py", line 3001, in __getitem__
indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
File "/usr/local/tools/_conda/envs/__Galaxy-ML@0.8.3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1285, in _convert_to_indexer
return self._get_listlike_indexer(obj, axis, **kwargs)[1]
File "/usr/local/tools/_conda/envs/__Galaxy-ML@0.8.3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1092, in _get_listlike_indexer
keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
File "/usr/local/tools/_conda/envs/__Galaxy-ML@0.8.3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1177, in _validate_read_indexer
key=key, axis=self.obj._get_axis_name(axis)
KeyError: "None of [Index(['Label'], dtype='object')] are in the [columns]"
Maybe this error is due to the fact that I don't use an OHE version of the "test_y_10.tsv" labels? This is not mentioned in the tutorial, if I am not wrong, and needs to be added if relevant.
Another question, as I am not sure how it works ;) If I want to reuse such a "workflow" on images I have in a shared space somewhere, for example pictures of algae, can you point me to a way to get from "real life" picture files to the inputs of such a workflow? It seems to me that the train_X_10.tsv and test_X_10.tsv files are a representation of the "pixel values" of a single file (or maybe I am totally off ;) ), and that the train_Y_10.tsv and test_Y_10.tsv files link a "label" (in my example, the name of an alga) to several pictures. But in test_y_10.tsv, for example, file_name doesn't let the algorithms reach the pictures and "analyze" them, as it points to "folders" that Galaxy cannot reach (like Test/Apple_Red_Delicious/229_100.jpg), does it? Sorry if these questions are completely out of scope, I am trying to understand better ;)
Hi @anuprulez I imported your history but I get "You do not have permission to view this dataset" and all the jobs are in gray with a lock next to them.
Can you reload this history link? It should work now: https://usegalaxy.eu/u/kumara/h/fruitdatasets
Thanks @anuprulez . I think I know what the issue is. In the 'Deep learning training and evaluation' step, you must use the OHE labels, not train_y_10. That should fix the problem you are seeing.
So it appears to me that we need to specify that OHE creation is also needed for the test_y_10 labels. I propose a commit for it (I don't know why I can only commit and not open a PR... sorry). Moreover, we could add a "tip" section before or after this OHE step, so users can find out how many classes there are by running the "count occurrences of each record" Galaxy tool on the first column of the test_y_10 and train_y_10 files.
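As a rough illustration of the class-counting tip, the same count can be done outside Galaxy with the Python standard library. The inline TSV below is a made-up stand-in for a labels file (the column names and rows are assumptions, not the real data), and the sketch assumes the label sits in the first column and the file has a header row.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a labels file; the real layout may differ.
labels_tsv = "Label\tFile_Name\n0\ta.jpg\n1\tb.jpg\n0\tc.jpg\n2\td.jpg\n"

reader = csv.reader(io.StringIO(labels_tsv), delimiter="\t")
next(reader)  # skip the header row
counts = Counter(row[0] for row in reader)

num_classes = len(counts)
print(num_classes)            # number of distinct labels
print(counts.most_common())   # how many records each label has
```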
@anuprulez I verified that your workflow now runs fine with the change I suggested.
@yvanlebras I don't think that is necessary. I just fixed Anup's issue. Let me look at your workflow and I'll get back to you shortly.
I applied OHE creation to the test_y_10.tsv file and chose "all columns" in the "Machine Learning Visualization Extension", and now I have a result, but maybe not the right one?
@yvanlebras Noticed that in 'classification confusion matrix plot' step, you have 'Does the dataset contain header' set to 'No' for test_y_10. I changed that to 'Yes' and re-ran it and it completed fine. Also, it seems you have 2 classification confusion matrix plot steps. I guess one is redundant.
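A small sketch of why the header setting matters, and how it may relate to the earlier `KeyError` about 'Label'. When a file is read as if it had no header, the columns are only known by position, so selecting a column by name fails and the header row is treated as data. `read_column` below is a hypothetical helper written for this illustration, not the `galaxy_ml` function from the traceback, and the inline TSV is made up.

```python
import csv
import io


def read_column(tsv_text, column, has_header):
    """Return one column; without a header, columns are only known by position."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if has_header:
        names, data = rows[0], rows[1:]
    else:
        names, data = list(range(len(rows[0]))), rows  # header row becomes data
    if column not in names:
        raise KeyError(f"{column!r} is not in the columns {names}")
    idx = names.index(column)
    return [row[idx] for row in data]


labels_tsv = "Label\tFile_Name\n0\ta.jpg\n1\tb.jpg\n"

print(read_column(labels_tsv, "Label", has_header=True))  # ['0', '1']
try:
    read_column(labels_tsv, "Label", has_header=False)
except KeyError as err:
    print(err)
```

Reading by position with `has_header=False` also returns one extra row, because the header line is counted as a record.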
@anuprulez @yvanlebras It seems all the issues are resolved now. Could one of you please approve/merge this PR if there are no outstanding items left? Thanks!
Looks good to me. If @yvanlebras is also fine with it, I would be happy to merge the PR.
@kxk302 can you rebase it against main? Thanks!
Done @anuprulez !
Hey @kxk302, this doesn't look properly rebased. It looks like you merged in all of the updates from the main branch somehow, but not with a merge commit that GitHub could understand. Could you tell me precisely what you did, so we can advise against it when it happens in other cases? This diff has me a bit worried, given how unclear the changes are.
Yes, now it indicates 168 files as changed, which should not be the case
Huh! Let me double check what I did (I think it's advisable to have your coffee first before doing any Git operation!)
OK, I got rid of the problematic commit and force-pushed the change to my branch. I tried to rebase on main, but it says everything is up to date. So I guess this PR should be good to merge once the tests pass?
Thank you all!
This tutorial discusses using a CNN for image classification on the Fruits 360 dataset.