genouest / biomaj2galaxy

BioMAJ post processes to manipulate Galaxy tool data tables
MIT License

NCBI-BLAST and BWA References not showing up in galaxy tools #1

Closed nagoue closed 5 years ago

nagoue commented 5 years ago

Hi, I am using the latest BioMAJ and biomaj2galaxy versions with Galaxy v18.05. For example, Anopheles_gambiae.properties looks like this:

bwa.name=bwa
bwa.desc=Build bwa index
bwa.type=index
bwa.exe=bwa.sh
bwa.args="fasta/all.fasta" "bwa/all"
bwa.cluster=false
makeblastdb.name=makeblastdb
makeblastdb.desc=Index blast
makeblastdb.type=index
makeblastdb.args="fasta/all.fasta" "blast/" "-dbtype nucl -parse_seqids" ${db.name}
makeblastdb.cluster=false
makeblastdb.exe=makeblastdb.sh
makeblastdb_p.name=makeblastdb_p
makeblastdb_p.desc=Index blast protein
makeblastdb_p.type=index
makeblastdb_p.args="fasta/protein.faa" "blast/protein/" "-dbtype prot -parse_seqids" ${db.name}_protein
makeblastdb_p.cluster=false
makeblastdb_p.exe=makeblastdb.sh
GALAXY.db.post.process=GAL
GAL=galaxy_dm
galaxy_dm.desc=Add files to Galaxy tool data tables
galaxy_dm.type=galaxy
galaxy_dm.exe=biomaj2galaxy
galaxy_dm.args=-v -f /xxx/.bm2g.yml add -d "${localrelease}" -n "Anopheles gambiae NCBI (${remoterelease})" --no-file-check -g fasta/all.fasta bowtie2:bowtie2/all twobit:2bit/all.2bit bwa:bwa/all star:star/all hisat2:hisat2/all "blastdb:blast/Anopheles_gambiae:Anopheles gambiae" "blastdb:blast/Anopheles_gambiae_protein:Anopheles gambiae protein"

bwa.loc file:

Anopheles_gambiae_AgamP3        Anopheles_gambiae_AgamP3        Anopheles gambiae NCBI (AgamP3) /xxx/ncbi/genomes/Anopheles_gambiae/Anopheles_gambiae_AgamP3/bwa/all

blastdb.loc file:

Anopheles_gambiae_AgamP3_fe811f3f-a610-4edb-82ab-ae6986a90aba   Anopheles gambiae       /xxx/ncbi/genomes/Anopheles_gambiae/Anopheles_gambiae_AgamP3/blast/Anopheles_gambiae
Anopheles_gambiae_AgamP3_da956145-59ac-4ad9-a7a7-b2687b32f735   Anopheles gambiae protein       /xxx/ncbi/genomes/Anopheles_gambiae/Anopheles_gambiae_AgamP3/blast/Anopheles_gambiae_protein

BioMAJ completed the update, but the bwa and blast indexes are not accessible through the Galaxy interface (bowtie2, star and hisat2 are fine).

Do you have any hints on where to dig? Thanks,

abretaud commented 5 years ago

Hi, have you tried restarting Galaxy? It sometimes seems to be needed when you add the first entry to an empty data table.

nagoue commented 5 years ago

Hi, Yes I also tried a Galaxy restart at each of the following steps:

I also double-checked that the fields in the loc files are tab-separated. Thanks,
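For readers who want to repeat the tab-separation check, here is a minimal sketch in Python. It is not part of biomaj2galaxy or Galaxy; the function names are made up for illustration. The expected field counts are taken from the lines shown earlier in this thread (4 for the bwa file, 3 for the blastdb file), and it assumes lines starting with `#` are comments.

```python
def check_loc_line(line, expected_fields):
    """Return True if a .loc line has the expected number of tab-separated fields."""
    stripped = line.rstrip("\n")
    # Blank lines and comment lines (assumed to start with '#') are skipped.
    if not stripped or stripped.startswith("#"):
        return True
    return len(stripped.split("\t")) == expected_fields


def find_bad_lines(text, expected_fields):
    """Return the 1-based numbers of lines that do not split into expected_fields."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not check_loc_line(line, expected_fields)
    ]


if __name__ == "__main__":
    # Hypothetical sample: the second data line uses spaces instead of tabs.
    sample = "id1\tAnopheles gambiae\t/some/path/db\n# comment\nid2 name /some/path\n"
    print(find_bad_lines(sample, expected_fields=3))
```

To check a real file, read its contents and pass them in, e.g. `find_bad_lines(open(path).read(), 3)`.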

abretaud commented 5 years ago

Ok... strange! In the admin section of the Galaxy instance, there is a "Data tables" section. Can you tell me what the filenames are for each of the problematic tables?

nagoue commented 5 years ago

I have (too) many lines in that section:

- blastdb    /xxx/tools/tool-data/blastdb.loc
- blastdb    /xxx/tools/tool-data/toolshed/repos/iuc/data_manager_manual/6524e573d9c2/blastdb.loc

Those 2 blastdb.loc files contain the same info.

- blastdb_p    /xxx/tools/tool-data/blastdb_p.loc
- blastdb_p    /xxx/tools/tool-data/toolshed/repos/devteam/ncbi_blast_plus/e25d3acf6e68/blastdb_p.loc

Those 2 blastdb_p.loc files are empty. I re-ran BioMAJ as I had a mistake in the biomaj2galaxy command (the correct path is "blastdb:blast/protein/Anopheles_gambiae_protein:Anopheles gambiae protein").

- bwa_indexes    /xxx/tools/tool-data/bwa_index.loc
- bwa_indexes    /xxx/tools/tool-data/toolshed/repos/iuc/data_manager_manual/6524e573d9c2/bwa_index.loc

Those 2 bwa_index.loc files contain the same info.

- bwa_mem_indexes    /xxx/tools/tool-data/bwa_mem_index.loc
- bwa_mem_indexes    /xxx/tools/tool-data/toolshed/repos/devteam/bwa/8d2a528a9513/bwa_mem_index.loc

Those 2 bwa_mem_index.loc files are empty.

I cleaned up the shed_tool_data_table_conf.xml file, which had multiple table definitions with similar names, keeping the ones coming from data_manager_manual, and restarted Galaxy. No change.

nagoue commented 5 years ago

I did a metadata reset, not sure it is connected but ... I noticed that "Nucleotide BLAST database" gives access to all indexes (i.e. Anopheles gambiae, Anopheles gambiae protein) but no index is accessible from "Protein BLAST database". Am I missing something in my Galaxy post-process to properly separate the nucleotide and protein indexes?

abretaud commented 5 years ago

Ah yes! I had not seen it, but in your bank properties file, it should be this way: "blastdb_p:blast/Anopheles_gambiae_protein:Anopheles gambiae protein" instead of: "blastdb:blast/Anopheles_gambiae_protein:Anopheles gambiae protein"

nagoue commented 5 years ago

Cool! It's working. Thanks a lot, I understand better :) It's creating a blastdb_p.loc file.

Could it be a similar problem with bwa? I have installed what I guess is the latest tool version (https://toolshed.g2.bx.psu.edu/view/devteam/bwa/8d2a528a9513), which installed 2 tools: "Map with BWA-MEM" and "Map with BWA". Should I get a bwa_mem_index.loc at some point? But then the "Map with BWA" tool should get access to the indexes, right?

abretaud commented 5 years ago

The logic for data tables is as follows:

When you add items to the tables using biomaj2galaxy, the first part of each expression is the data table name (e.g. blastdb_p in blastdb_p:blast/Anopheles_gambiae_protein:Anopheles gambiae protein).

As some data table names are not very intuitive, biomaj2galaxy automatically translates a few friendlier names to the real names (like bwa_mem, which is understood as bwa_mem_indexes; see the code at https://github.com/genouest/biomaj2galaxy/blob/2001dc1284add9bc9b2193a70bc62a18dd1415bc/biomaj2galaxy/commands/add.py#L88)
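To illustrate the idea, here is a rough sketch of how such an expression could be split and its table name translated. This is not the actual biomaj2galaxy code (see the link above for that): the function name and the alias dictionary are made up for illustration, and the only alias confirmed in this thread is bwa_mem -> bwa_mem_indexes.

```python
# Hypothetical alias table; only the bwa_mem entry comes from this thread.
ALIASES = {"bwa_mem": "bwa_mem_indexes"}


def parse_item(expression):
    """Split 'table:path[:display name]' and resolve the table alias.

    Returns (table_name, path, display_name); missing parts are None.
    """
    # Split on the first two colons only, so the display name may contain ':'.
    parts = expression.split(":", 2)
    table = ALIASES.get(parts[0], parts[0])  # unknown names pass through as-is
    path = parts[1] if len(parts) > 1 else None
    display_name = parts[2] if len(parts) > 2 else None
    return table, path, display_name


if __name__ == "__main__":
    print(parse_item("blastdb_p:blast/Anopheles_gambiae_protein:Anopheles gambiae protein"))
    print(parse_item("bwa_mem:bwa/all"))
```

In this sketch, a name that is not in the alias table is used unchanged as the data table name, which is why blastdb and blastdb_p end up in different tables.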

Tell me if it's still not clear!

nagoue commented 5 years ago

Thanks for the complementary information and your time.

For the bwa indexes, I had to clean up the shed_tool_data_table_conf.xml file to finally see the reference genome appear in the drop-down menu of the bwa tools.

Cheers,

abretaud commented 5 years ago

Ok, great! Closing this, don't hesitate to open another issue if you have other problems (I hope not!). Cheers