Minamehr / EUKulele


Fix wrapper #3

Closed paulzierep closed 2 months ago

paulzierep commented 2 months ago

This works using the test DB from the DM, but:

Hope that helps you get further!

Minamehr commented 2 months ago

Many thanks. I checked the other databases (PhyloDB, EukProt, MMETSP, and the default, marmmetsp, i.e. "MMETSP+MarRef"). If the three main required files (reference.pep.fa, tax-table.txt, prot-map.json) are present inside the "--reference_dir", the run works and nothing is downloaded again. The diamond reference also has to be set up inside each database directory the first time. The commands I used:

(default):
EUKulele -s MAGs_input -o output -m mags --reference_dir test-data --alignment_choice diamond --p_ext .faa --consensus_cutoff 0.75

(MMETSP):
EUKulele -s MAGs_input -o output1 -m mags --database MMETSP --reference_dir test-data --alignment_choice diamond --p_ext .faa --consensus_cutoff 0.75

(phylodb):
EUKulele -s MAGs_input -o output2 -m mags --database phylodb --reference_dir /test-data/phylodb/ --alignment_choice diamond --p_ext .faa --consensus_cutoff 0.75

(EukProt):
EUKulele -s MAGs_input -o output3 -m mags --database eukprot --reference_dir /test-data/eukprot/ --alignment_choice diamond --p_ext .faa --consensus_cutoff 0.75
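A quick pre-flight check along these lines could confirm that a reference directory holds the three files before starting a run. This is only a minimal sketch: the file names come from the comment above, while the helper name `missing_reference_files` is hypothetical and not part of EUKulele.

```python
from pathlib import Path

# File names as listed in the comment above; the helper itself is hypothetical.
REQUIRED_FILES = ("reference.pep.fa", "tax-table.txt", "prot-map.json")


def missing_reference_files(reference_dir):
    """Return the required file names that are absent from reference_dir."""
    ref = Path(reference_dir)
    return [name for name in REQUIRED_FILES if not (ref / name).is_file()]


if __name__ == "__main__":
    # "test-data" is the directory used in the commands above.
    missing = missing_reference_files("test-data")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All three reference files are present.")
```

An empty list means all three files were found; it says nothing about whether their contents are complete.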

paulzierep commented 2 months ago

Hmm, did the DB download actually work for you? It breaks for me at the create_protein_table.py step... I will try in a notebook and see if it is just my machine.

Minamehr commented 2 months ago

I used the following commands:

EUKulele download --database phylodb

and then:

create_protein_table.py --infile_peptide reference.pep.fa --infile_taxonomy taxonomy-table.txt --outfile_json prot-map.json --output tax-table.txt --delim "\t" --col_source_id strain_name --taxonomy_col_id taxonomy --column 2

This creates all three files (the FASTA, prot-map, and tax-table files), and after that the tool works fine (it runs without re-downloading the database). However, on my machine the final databases for phylodb and eukprot are incomplete: "prot-map.json" ends up as an empty file, which is quite strange. The default database and mmetsp are perfectly okay.
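To spot the empty "prot-map.json" case quickly, something like the following could be run against each database directory. It is a rough sketch that assumes the file is ordinary JSON (an object or a list); the thread does not confirm the file's internal layout, and the helper name `prot_map_entry_count` is made up for illustration.

```python
import json
from pathlib import Path


def prot_map_entry_count(path):
    """Return the number of top-level entries in a prot-map JSON file.

    Returns 0 when the file is missing, zero bytes long, or not valid JSON.
    Assumption: the file holds a JSON object or list at the top level.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return 0
    try:
        with p.open() as handle:
            data = json.load(handle)
    except json.JSONDecodeError:
        return 0
    return len(data) if isinstance(data, (dict, list)) else 0


if __name__ == "__main__":
    # Example path only; adjust to wherever the database was written.
    print(prot_map_entry_count("test-data/phylodb/prot-map.json"))
```

A count of 0 for phylodb or eukprot would match the incomplete-database symptom described above.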

paulzierep commented 2 months ago

This part is also tricky:

Found database folder for . in current directory; will not re-download.
Creating a diamond reference from database files...

I wonder if there is a way to avoid having to create the diamond DB. I think it works if the diamond index is in the same folder as the DB ...

Creating a diamond reference from database files...
Diamond database file already created; will not re-create database.

Minamehr commented 2 months ago

It seems that the Diamond reference must be included in each database directory. For example, I had the Diamond reference in the test-data directory (the same directory where all the databases are saved), but it was still created again inside the "mmetsp" directory:

EUKulele -s test-data/MAGs_input -o test-data/output4 -m mags --database MMETSP --reference_dir test-data --alignment_choice diamond --p_ext .faa --consensus_cutoff 0.75

Found database folder for test-data/mmetsp in current directory; will not re-download.
Creating a diamond reference from database files...

However, when it was inside the "mmetsp" directory, it was not created again:

EUKulele -s test-data/MAGs_input -o test-data/output4 -m mags --database MMETSP --reference_dir test-data --alignment_choice diamond --p_ext .faa --consensus_cutoff 0.75

Found database folder for test-data/mmetsp in the current directory; will not re-download.
Diamond alignment file already detected; will not re-run step.
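To see which database directories already contain a Diamond reference, a small search like the one below might help. It is a sketch under assumptions: it only looks for files with the `.dmnd` extension that `diamond makedb` writes, and the exact file name and subfolder EUKulele expects are not confirmed in this thread. The helper name `find_diamond_indexes` is hypothetical.

```python
from pathlib import Path


def find_diamond_indexes(database_dir):
    """Return all *.dmnd files found anywhere below database_dir, sorted."""
    root = Path(database_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.dmnd"))


if __name__ == "__main__":
    # Directory names follow the commands above; adjust as needed.
    for name in ("mmetsp", "phylodb", "eukprot"):
        hits = find_diamond_indexes(Path("test-data") / name)
        print(name, "->", [str(hit) for hit in hits] or "no .dmnd file found")
```

A directory with no hits is a likely candidate for the "Creating a diamond reference from database files..." step on the next run.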