bgruening / galaxytools

:microscope::books: Galaxy Tool wrappers
MIT License

SortMeRNA: update version 4.3.6 #1316

Closed by gallardoalba 11 months ago

gallardoalba commented 1 year ago

Main changes:

bernt-matthias commented 1 year ago

Cool. One of my users just asked for an update. Can I help here?

gallardoalba commented 1 year ago

> Cool. One of my users just asked for an update. Can I help here?

Tomorrow I'll continue working on it; I'll ping you if I find any problem.

gallardoalba commented 1 year ago

@bernt-matthias, this is the problem that temporarily stalled the PR: apparently SortMeRNA generates an alignment in the temporary folder, and Galaxy tries to index it without success, producing this error:

[Screenshot from 2023-08-23 15-36-13]

I tried to specify the path of this folder so that the file gets an appropriate extension, but I don't think that is possible.

bernt-matthias commented 1 year ago

Can you check if the file is empty?

gallardoalba commented 1 year ago

> Can you check if the file is empty?

You are right, this is indeed the problem. I'll try to find a better input file.

bgruening commented 1 year ago

    .. ERROR: Test 1: Found output tag with unknown name [output_fastx], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
    .. ERROR: Test 2: Found output tag with unknown name [output_fastx], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
    .. ERROR: Test 5: Found output tag with unknown name [aligned_paired], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
    .. ERROR: Test 5: Found output tag with unknown name [unaligned_paired], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
    .. CHECK: 9 test(s) found.
    Applying linter output... CHECK
    .. INFO: 14 outputs found.
    Applying linter inputs... WARNING
    .. WARNING: Param input [num_alignments] 'name' attribute is redundant if argument implies the same name.

For the linting. Sorry for such a messy tool :(
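The lint errors above all have the same shape: a test references an output name that the tool does not declare. As a rough illustration of that check (not planemo's actual implementation), the sketch below parses a tool XML string and reports test outputs whose `name` is missing from `<outputs>`. The function name `unknown_test_outputs` and the `EXAMPLE` tool are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET


def unknown_test_outputs(tool_xml: str) -> list[tuple[int, str]]:
    """Return (test number, output name) pairs for undeclared test outputs.

    Hypothetical helper: it only looks at <data>/<collection> children of
    <outputs> and <output>/<output_collection> children of each <test>.
    """
    root = ET.fromstring(tool_xml)

    # Collect the names the tool actually declares.
    declared = set()
    outputs = root.find("outputs")
    if outputs is not None:
        for child in outputs:
            if child.tag in ("data", "collection") and child.get("name"):
                declared.add(child.get("name"))

    # Compare every test output against the declared names.
    problems = []
    for number, test in enumerate(root.findall("./tests/test"), start=1):
        for tag in ("output", "output_collection"):
            for out in test.findall(tag):
                name = out.get("name")
                if name not in declared:
                    problems.append((number, name))
    return problems


# Made-up miniature tool, only for demonstration.
EXAMPLE = """
<tool id="demo" name="demo" version="0.1">
  <outputs>
    <data name="aligned" format="fastq"/>
    <data name="unaligned" format="fastq"/>
  </outputs>
  <tests>
    <test>
      <output name="output_fastx"/>
    </test>
    <test>
      <output name="aligned"/>
    </test>
  </tests>
</tool>
"""

for number, name in unknown_test_outputs(EXAMPLE):
    print(f"Test {number}: unknown output name [{name}]")
```

In practice the fix is either to rename the test outputs to one of the valid names listed in the message or to declare the missing output in the tool.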

gallardoalba commented 1 year ago

> .. ERROR: Test 1: Found output tag with unknown name [output_fastx], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
> .. ERROR: Test 2: Found output tag with unknown name [output_fastx], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
> .. ERROR: Test 5: Found output tag with unknown name [aligned_paired], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
> .. ERROR: Test 5: Found output tag with unknown name [unaligned_paired], valid names ['aligned', 'aligned_forward', 'aligned_reverse', 'aligned_forward_singleton', 'aligned_reverse_singleton', 'unaligned', 'unaligned_forward', 'unaligned_reverse', 'unaligned_forward_singleton', 'unaligned_reverse_singleton', 'output_bam', 'output_blast', 'output_biom', 'output_de_novo']
> .. CHECK: 9 test(s) found.
> Applying linter output... CHECK
> .. INFO: 14 outputs found.
> Applying linter inputs... WARNING
> .. WARNING: Param input [num_alignments] 'name' attribute is redundant if argument implies the same name.
>
> For the linting. Sorry for such a messy tool :(

It should be fine now; some scripts were removed (e.g. merge-paired-reads.sh and unmerge-paired-reads.sh) and replaced with equivalent functionality.

bgruening commented 1 year ago

@gallardoalba setting a profile version gives every job its own HOME dir.

gallardoalba commented 1 year ago

> @gallardoalba a profile version enables an own HOME dir for every job.

Perfect, thanks for including it.

gallardoalba commented 1 year ago

Do you think it could be merged @bernt-matthias? I would like to test if it works with the installed indexed genomes.

bernt-matthias commented 1 year ago

> I would like to test if it works with the installed indexed genomes.

Would be cool to have a test. Let me know if the PR is ready from your side and I will review and merge.

gallardoalba commented 1 year ago

> > I would like to test if it works with the installed indexed genomes.
>
> Would be cool to have a test. Let me know if the PR is ready from your side and I will review and merge.

Hi @bernt-matthias, I'm trying to create the test for the database, but I'm not sure how to create the file structure. According to https://github.com/bgruening/galaxytools/blob/master/data_managers/data_manager_sortmerna_database_downloader/data_manager/data_manager_sortmerna_download.py#L122 it seems to be fine, but I don't know why the tool is not able to recognize it. Would you mind having a look? Thanks a lot!

bernt-matthias commented 11 months ago

Hi @gallardoalba, what is the state here?

> Would you mind having a look?

What exactly should I look at? Is there a failing test that I could examine?

bernt-matthias commented 11 months ago

I will add a test for cached data. I am wondering whether the loops are correct, e.g. in https://github.com/bgruening/galaxytools/blob/dfa44145539a0897f369311667dbe4ca27ff7dc4/tools/rna_tools/sortmerna/sortmerna.xml#L70 we loop over a list derived from a comma-separated string, but we actually have a select with multiple="true".

bernt-matthias commented 11 months ago

I get the impression that the use of (multiple?) cached references was already wrong in 2.1. But I guess most of the time a single one is used. The docs state:

      --ref             STRING,STRING   FASTA reference file, index file                               mandatory
                                         (ex. --ref /path/to/file1.fasta,/path/to/index1)
                                         If passing multiple reference files, separate 
                                         them using the delimiter ':',
                                         (ex. --ref /path/to/file1.fasta,/path/to/index1:/path/to/file2.fasta,path/to/index2)

But Galaxy just executes with --ref REF1,REF2,REF3,....
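To make the mismatch concrete, here is a minimal sketch of how the value could be assembled so that it matches the help text quoted above: each reference is a `FASTA,INDEX` pair, and pairs are joined with `:`. The helper name `build_ref_argument` is hypothetical, and the sketch follows only the quoted help text; it does not show what the wrapper does, nor whether 4.3.6 still accepts this form.

```python
from typing import Iterable, Tuple


def build_ref_argument(references: Iterable[Tuple[str, str]]) -> str:
    """Join (fasta, index) pairs as 'FASTA,INDEX' separated by ':'.

    Hypothetical helper based only on the help text quoted in this thread.
    """
    pairs = [f"{fasta},{index}" for fasta, index in references]
    if not pairs:
        # The help text marks --ref as mandatory.
        raise ValueError("at least one reference is required")
    return ":".join(pairs)


# Placeholder paths taken from the style of the quoted help text.
refs = [
    ("/path/to/file1.fasta", "/path/to/index1"),
    ("/path/to/file2.fasta", "/path/to/index2"),
]
print("--ref", build_ref_argument(refs))
```

Joining everything with commas, as in `--ref REF1,REF2,REF3`, would instead read like one FASTA file followed by index paths, which is presumably why only the single-reference case worked.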

Also, the indexdb (actually indexdb_rna) step executed by the data manager is not used anymore. I guess we can / should ignore the indexes created by the data manager.

bernt-matthias commented 11 months ago

Hi @bgruening, I was still fixing bugs and adding tests wrt references. I stopped CI, but feel free to restart it if you need the current state.

I will open a followup PR.