galaxyproject / tools-iuc

Tool Shed repositories maintained by the Intergalactic Utilities Commission
https://galaxyproject.org/iuc
MIT License

Possible problem with data_manager_dada2 #2783

Closed: gregvonkuster closed this issue 4 years ago

gregvonkuster commented 4 years ago

I installed data_manager_dada2 locally and tried installing a reference dataset, but it looks like the tool is not functioning correctly.

Here is the command line:

python '/home/galaxies/galaxy/database/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/bf7b2c14cabc/data_manager_dada2/data_manager/data_manager.py' --out '/home/galaxies/galaxy/jwd/000/125/galaxy_dataset_155.dat' --dataset 'unite_8.0_fungi'

Here is the resulting output directory:

$ ll dataset_155_files/
total 29060
drwxrwxrwx 3 greg greg     4096 Jan 17 11:54 .
drwxrwxrwx 3 greg greg     4096 Jan 17 12:08 ..
drwxrwxrwx 2 greg greg     4096 Jan 17 11:54 developer
-rw-rw-rw- 1 greg greg 25567696 Jan 17 11:54 sh_general_release_dynamic_02.02.2019.fasta
-rw-rw-rw- 1 greg greg  4171566 Jan 17 11:55 unite_8.0_fungi.taxonomy

$ ll dataset_155_files/developer/
total 30964
drwxrwxrwx 2 greg greg     4096 Jan 17 11:54 .
drwxrwxrwx 3 greg greg     4096 Jan 17 11:54 ..
-rw-rw-rw- 1 greg greg 31697415 Jan 17 11:54 sh_general_release_dynamic_02.02.2019_dev.fasta

However, as far as I can tell, nothing is moved into Galaxy's ~/tool-data directory.

The same problem seems to have occurred on Galaxy Test as well, per https://github.com/galaxyproject/usegalaxy-playbook/issues/273

gregvonkuster commented 4 years ago

I found some time to look into this a bit (running Galaxy version 19.09) and I've found the cause, although I've not yet figured out the proper fix.

It seems that the code at https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/tools/data_manager/manager.py#L280 expects the keys of the data_table_data dictionary to be 'add' or 'remove', but for data_manager_dada2 this dictionary looks like this:

data_table_data: {'name': 'UNITE: General Fasta release 8.0 for Fungi', 'path': 'unite_8.0_fungi.taxonomy', 'taxlevels': 'Kingdom,Phylum,Class,Order,Family,Genus,Species', 'value': 'unite_8.0_fungi'}
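To make the mismatch concrete, here is a minimal sketch, assuming a simplified reader that only looks up the 'add' and 'remove' keys as described above. It models the behaviour; it is not Galaxy's actual code, and entries_seen_by_add_remove_reader is a hypothetical name. Applied to the dada2 dictionary, both lookups come back empty, which would explain why nothing is added.

```python
# Simplified model (an assumption, not Galaxy's actual code) of a reader that
# only understands data_table_data dicts keyed by 'add' and/or 'remove'.
data_table_data = {
    'name': 'UNITE: General Fasta release 8.0 for Fungi',
    'path': 'unite_8.0_fungi.taxonomy',
    'taxlevels': 'Kingdom,Phylum,Class,Order,Family,Genus,Species',
    'value': 'unite_8.0_fungi',
}


def entries_seen_by_add_remove_reader(data):
    """Return the (to_add, to_remove) values such a reader would pick up."""
    return data.get('add'), data.get('remove')


to_add, to_remove = entries_seen_by_add_remove_reader(data_table_data)
print(to_add, to_remove)  # None None: the dada2 entry is silently ignored
```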

I believe there was some recent work on all data manager tools to properly handle multiple entries for the same data manager, but I didn't follow it closely enough to know the proper fix. I'm hoping someone who was involved in that work can jump in here.

The following hack does properly install the reference for use in Galaxy:

$ git diff manager.py
diff --git a/lib/galaxy/tools/data_manager/manager.py b/lib/galaxy/tools/data_manager/manager.py
index 6d72d68053..87b811f0e0 100644
--- a/lib/galaxy/tools/data_manager/manager.py
+++ b/lib/galaxy/tools/data_manager/manager.py
@@ -272,6 +272,7 @@ class DataManager(object):
             if isinstance(data_tables_dict.get(data_table_name), dict):

                 data_table_data = data_tables_dict.get(data_table_name, None)
+                data_table_data = {'add' : data_table_data}
                 # Validate results
                 if not data_table_data:
                     log.warning('Data table seems invalid: "%s".' % data_table_name)
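The hack above wraps every dictionary in {'add': ...} unconditionally, which would presumably double-wrap output from data managers that already use the add/remove form. Below is a minimal sketch of a more conservative normalization, assuming a data table value can take one of three shapes: an add/remove dict, a list of entries, or a single bare entry (the dada2 case). normalize_data_table_data is a hypothetical helper, not a Galaxy function, and this is not necessarily how the issue was fixed in Galaxy.

```python
from typing import Any, Dict, List


# Hypothetical helper (not part of Galaxy): coerce a data table value into the
# {'add': [...], 'remove': [...]} form before any add/remove processing runs.
def normalize_data_table_data(data: Any) -> Dict[str, List[Dict[str, Any]]]:
    def as_list(value: Any) -> List[Dict[str, Any]]:
        # None -> no entries; a single dict -> one entry; otherwise assume a sequence of entries.
        if value is None:
            return []
        if isinstance(value, dict):
            return [value]
        return list(value)

    if isinstance(data, dict) and ('add' in data or 'remove' in data):
        # Already in add/remove form: leave it alone instead of wrapping it again.
        return {'add': as_list(data.get('add')), 'remove': as_list(data.get('remove'))}
    # A bare entry (the dada2 case) or a list of entries: treat everything as additions.
    return {'add': as_list(data), 'remove': []}


dada2_entry = {
    'name': 'UNITE: General Fasta release 8.0 for Fungi',
    'path': 'unite_8.0_fungi.taxonomy',
    'taxlevels': 'Kingdom,Phylum,Class,Order,Family,Genus,Species',
    'value': 'unite_8.0_fungi',
}

normalized = normalize_data_table_data(dada2_entry)
print(normalized['add'][0]['value'])  # unite_8.0_fungi
# An already-wrapped value normalizes to the same result instead of being double-wrapped.
print(normalize_data_table_data({'add': [dada2_entry]}) == normalized)  # True
```

One caveat of this key-sniffing heuristic: a bare entry whose columns happened to be named 'add' or 'remove' would be misread, so an explicit format marker in the data manager output would be more robust.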

Hopefully this is just a trivial change to the data_manager_dada2 config file or something. :)

mvdbeek commented 4 years ago

Yeah, that looks broken; I'll have a look. It was introduced with https://github.com/galaxyproject/galaxy/pull/8250

gregvonkuster commented 4 years ago

Ah, thanks very much @mvdbeek ;)

gregvonkuster commented 4 years ago

Thanks for the fix @mvdbeek ;)