bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

colnames and EC number #776

Open SueFletcher opened 5 months ago

SueFletcher commented 5 months ago

Hello, I want to thank you for this amazing tool. I tried it and it ran really fast. I used these commands, as you suggested:

downloading the tool

wget http://github.com/bbuchfink/diamond/releases/download/v2.1.8/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz

creating a diamond-formatted database file

./diamond makedb --in reference.fasta -d reference

running a search in blastp mode

./diamond blastp -d reference -q queries.fasta -o matches.tsv

running a search in blastx mode

./diamond blastx -d reference -q reads.fasta -o matches.tsv

Now I'm wondering what the column names are for my output data file:

PNEG_00003T0 sp|Q09895|YAI8_SCHPO 45.7 387 195 7 1 373 1 386 7.08e-114 340
PNEG_00003T0 sp|Q9Y282|ERGI3_HUMAN 38.2 387 215 8 8 383 8 381 4.22e-83 261

Second question: how do I set parameters for the tool, such as e-value, qcov_hsp_perc, etc.?

Final question: how can I determine the EC number from this output file? Thank you in advance!

bbuchfink commented 5 months ago

The columns are explained here: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial
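For quick reference: when no custom field list is passed to --outfmt, the output is BLAST tabular format, whose 12 default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. A minimal sketch for prepending them as a header row; add_header is a hypothetical helper name, not something DIAMOND provides:

```shell
# Hypothetical helper (not part of DIAMOND): print a header row followed by
# the contents of a default-format tabular result file.
add_header() {
    # $1: path to DIAMOND output written without a custom --outfmt field list
    printf 'qseqid\tsseqid\tpident\tlength\tmismatch\tgapopen\tqstart\tqend\tsstart\tsend\tevalue\tbitscore\n'
    cat "$1"
}
```

Usage would be along the lines of `add_header matches.tsv > matches_with_header.tsv`. The header only fits files produced with the default fields; a custom --outfmt list would need its own header.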

how do I set parameters for the tool, such as e-value, qcov_hsp_perc, etc.?

All options are explained in the Wiki.
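As a rough illustration of the kind of options meant here, the sketch below wraps a blastp call with an e-value cutoff and a query coverage cutoff. The assumption is that --query-cover is the closest DIAMOND counterpart to BLAST's qcov_hsp_perc; the threshold values and the function name search_filtered are arbitrary placeholders, so check the Wiki before relying on them:

```shell
# Hypothetical wrapper; thresholds are illustrative, not recommendations.
search_filtered() {
    # $1: database name, $2: query FASTA, $3: output file
    # --evalue      maximum e-value for a reported alignment
    # --query-cover minimum percentage of the query covered by the alignment
    # --outfmt 6 ... tabular output with an explicit field list (qcovhsp added)
    diamond blastp -d "$1" -q "$2" -o "$3" \
        --evalue 1e-5 \
        --query-cover 50 \
        --outfmt 6 qseqid sseqid pident length evalue bitscore qcovhsp
}
```

It could be called as `search_filtered reference queries.fasta matches.tsv`. Because a custom field list is given, the output then has exactly those seven columns rather than the default twelve.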

Final question: how can I determine the EC number from this output file?

These mappings can be downloaded from sites such as UniProt. I'm not aware of a tool to do this, though.
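In the absence of a dedicated tool, a small awk join may be enough. The sketch below assumes you have already exported a two-column, tab-separated mapping file (accession, then EC number) from UniProt, and that subject IDs look like sp|Q09895|YAI8_SCHPO as in the output above. The function name annotate_ec and the mapping file layout are assumptions for illustration only:

```shell
# Hypothetical helper: append an EC column to DIAMOND tabular output.
annotate_ec() {
    # $1: mapping file, accession<TAB>EC number (layout is an assumption)
    # $2: DIAMOND tabular output with the subject ID in column 2
    awk -F'\t' -v OFS='\t' '
        FILENAME == ARGV[1] { ec[$1] = $2; next }   # load the mapping first
        {
            n = split($2, id, "[|]")                # sp|Q09895|YAI8_SCHPO
            acc = (n >= 2) ? id[2] : $2             # fall back to the raw ID
            # NA marks hits with no (or an empty) EC entry in the mapping
            print $0, (((acc in ec) && ec[acc] != "") ? ec[acc] : "NA")
        }
    ' "$1" "$2"
}
```

For example, `annotate_ec accession_to_ec.tsv matches.tsv > matches_ec.tsv`. Note that an EC number inherited from a hit is only as reliable as the alignment behind it, so filtering on identity and coverage first is probably sensible.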

SueFletcher commented 5 months ago

@bbuchfink Thank you, I didn't notice that. For clustering, there is diamond cluster for proteins; is there a solution for clustering nucleotide sequences?

bbuchfink commented 5 months ago

There are other tools that can do that, but diamond works only on proteins.

SueFletcher commented 5 months ago

@bbuchfink Thank you. Otherwise, can I still apply blastx using Swiss-Prot with this command:

./diamond blastx -d swissprot -q queries.fasta -o matches.tsv

Or does only this one work?

./diamond blastp -d swissprot -q queries.fasta -o matches.tsv

SueFletcher commented 5 months ago

@bbuchfink Sorry again, but in this command:

diamond cluster -d INPUT_FILE -o OUTPUT_FILE --approx-id 30 -M 64G

how can I pass the path of my protein FASTA input file? -d is for the database and -o is for the output :/

AMbioinformatics commented 5 months ago

Supported formats are FASTA and DIAMOND (.dmnd), so you can also provide your FASTA file after -d.
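So something like `diamond cluster -d proteins.fasta -o clusters.tsv --approx-id 30 -M 64G` should work, where proteins.fasta is a placeholder for your own file. To get a feel for the result, here is a small sketch that counts members per cluster. It assumes the output is a two-column, tab-separated file with the representative in column 1 and the member in column 2; cluster_sizes is a hypothetical helper name:

```shell
# Hypothetical helper: list each cluster representative with its member count,
# largest clusters first.
cluster_sizes() {
    # $1: clustering output, assumed to be representative<TAB>member per line
    cut -f 1 "$1" | sort | uniq -c | sort -k1,1nr | awk '{ print $2 "\t" $1 }'
}
```

Running `cluster_sizes clusters.tsv | head` would then show the biggest clusters. The helper splits on whitespace, so it assumes sequence IDs without spaces.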

bbuchfink commented 5 months ago

Otherwise, can I still apply blastx using Swiss-Prot with this command:

Yes, of course you can use the blastx mode of diamond.
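To spell the difference out: blastx takes nucleotide queries (translated before alignment) and blastp takes protein queries; both search the same protein database. A minimal sketch, assuming a Swiss-Prot protein FASTA has been downloaded locally; the function name search_swissprot and the file names are placeholders:

```shell
# Hypothetical wrapper: build a Swiss-Prot database, then search it.
search_swissprot() {
    # $1: Swiss-Prot protein FASTA
    # $2: query FASTA (nucleotide for blastx, protein for blastp)
    # $3: search mode, blastp or blastx
    case "$3" in
        blastp|blastx) ;;
        *) echo "mode must be blastp or blastx" >&2; return 1 ;;
    esac
    # The database only needs to be built once; both modes can reuse it.
    diamond makedb --in "$1" -d swissprot &&
    diamond "$3" -d swissprot -q "$2" -o matches.tsv
}
```

For instance, `search_swissprot uniprot_sprot.fasta reads.fasta blastx` for nucleotide reads, or the same call with a protein file and blastp.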