DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Options for output? #111

Open SK-N-BE opened 5 years ago

SK-N-BE commented 5 years ago

When I run kraken2 (with the standard database), my output looks like this:

  5.65  10  10  U   0   unclassified
 94.35  167 0   R   1   root
 94.35  167 0   R1  131567    cellular organisms
 92.66  164 0   D   2       Bacteria
 92.09  163 0   P   1224          Proteobacteria
 92.09  163 1   C   1236            Gammaproteobacteria
 91.53  162 0   O   72274             Pseudomonadales
 90.96  161 0   F   468             Moraxellaceae
 90.96  161 1   G   469               Acinetobacter
 89.83  159 4   G1  909768                  Acinetobacter calcoaceticus/baumannii complex
 87.01  154 154 S   470                   Acinetobacter baumannii
  0.56  1   1   S   106654                    Acinetobacter nosocomialis
  0.56  1   1   S   28090                   Acinetobacter lwoffii
  0.56  1   0   F   135621              Pseudomonadaceae
  0.56  1   0   G   286               Pseudomonas
  0.56  1   0   G1  136841                  Pseudomonas aeruginosa group
  0.56  1   1   S   287                   Pseudomonas aeruginosa
  0.56  1   0   D1  1783272       Terrabacteria group
  0.56  1   0   P   544448          Tenericutes
  0.56  1   0   C   31969             Mollicutes
  0.56  1   0   O   2085                Mycoplasmatales
  0.56  1   0   F   2092                  Mycoplasmataceae
  0.56  1   0   G   2093                    Mycoplasma
  0.56  1   0   G1  656088                    Mycoplasma mycoides group
  0.56  1   0   S   2102                        Mycoplasma mycoides
  0.56  1   1   S1  40477                         Mycoplasma mycoides subsp. capri
  1.13  2   0   D   2759        Eukaryota
  1.13  2   0   D1  33154         Opisthokonta
  1.13  2   0   K   33208           Metazoa
  1.13  2   0   K1  6072              Eumetazoa
  1.13  2   0   K2  33213               Bilateria
  1.13  2   0   K3  33511                 Deuterostomia
  1.13  2   0   P   7711                    Chordata
  1.13  2   0   P1  89593                     Craniata
  1.13  2   0   P2  7742                        Vertebrata
  1.13  2   0   P3  7776                          Gnathostomata
  1.13  2   0   P4  117570                          Teleostomi
  1.13  2   0   P5  117571                            Euteleostomi
  1.13  2   0   P6  8287                                Sarcopterygii
  1.13  2   0   P7  1338369                               Dipnotetrapodomorpha
  1.13  2   0   P8  32523                                   Tetrapoda
  1.13  2   0   P9  32524                                     Amniota
  1.13  2   0   C   40674                                       Mammalia
  1.13  2   0   C1  32525                                         Theria
  1.13  2   0   C2  9347                                            Eutheria
  1.13  2   0   C3  1437010                                           Boreoeutheria
  1.13  2   0   C4  314146                                              Euarchontoglires
  1.13  2   0   O   9443                                                  Primates
  1.13  2   0   O1  376913                                                  Haplorrhini
  1.13  2   0   O2  314293                                                    Simiiformes
  1.13  2   0   O3  9526                                                        Catarrhini
  1.13  2   0   O4  314295                                                        Hominoidea
  1.13  2   0   F   9604                                                            Hominidae
  1.13  2   0   F1  207598                                                            Homininae
  1.13  2   0   G   9605                                                                Homo
  1.13  2   2   S   9606                                                                  Homo sapiens
  0.56  1   0   D   2157        Archaea
  0.56  1   0   P   28890         Euryarchaeota
  0.56  1   0   P1  2290931         Stenosarchaea group
  0.56  1   0   C   183963            Halobacteria
  0.56  1   0   O   1644055             Haloferacales
  0.56  1   0   F   1644056               Haloferacaceae
  0.56  1   0   G   2251                    Haloferax
  0.56  1   0   S   2246                      Haloferax volcanii
  0.56  1   1   S1  309800                      Haloferax volcanii DS2

Since I am using Kraken only to confirm the species, I am not interested in all the entries listed "below" Acinetobacter baumannii. Is there an option to show only those results whose percentage is above a threshold, e.g. higher than 50%? The output would then look like this:

  5.65  10  10  U   0   unclassified
 94.35  167 0   R   1   root
 94.35  167 0   R1  131567    cellular organisms
 92.66  164 0   D   2       Bacteria
 92.09  163 0   P   1224          Proteobacteria
 92.09  163 1   C   1236            Gammaproteobacteria
 91.53  162 0   O   72274             Pseudomonadales
 90.96  161 0   F   468             Moraxellaceae
 90.96  161 1   G   469               Acinetobacter
 89.83  159 4   G1  909768                  Acinetobacter calcoaceticus/baumannii complex
 87.01  154 154 S   470                   Acinetobacter baumannii

wolfgangrumpf commented 5 years ago

You could just use AWK to filter out anything that doesn't match the taxonomic level you're interested in, and also filter numerically on the appropriate column (the percentage in the first column, for example).

SK-N-BE commented 5 years ago

Thank you for your reply, wolfgangrumpf! Could you please give me an example of how to use the AWK filter?
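
A minimal sketch of the kind of AWK filter suggested above, not an official kraken2 option. It assumes the report has the six columns shown in the output (percentage, clade reads, direct reads, rank code, taxid, name) and relies on AWK's default whitespace splitting, so it should work both on the tab-delimited report file and on a space-aligned copy like the one pasted here. The temporary file, the `threshold` variable, and the shortened sample rows are placeholders for illustration only.

```shell
# Write a short excerpt of the report to a temporary file (tab-separated).
# In practice, point "report" at your own kraken2 report file instead.
report=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  '  5.65' 10  10  U 0    'unclassified' \
  ' 94.35' 167 0   R 1    'root' \
  ' 92.66' 164 0   D 2    '  Bacteria' \
  ' 87.01' 154 154 S 470  '    Acinetobacter baumannii' \
  '  0.56' 1   1   S 287  '    Pseudomonas aeruginosa' \
  '  1.13' 2   2   S 9606 '    Homo sapiens' > "$report"

threshold=50

# Keep rows whose percentage (column 1) exceeds the threshold.
# The "unclassified" row (rank code U) is kept regardless, to match
# the desired output in the question; drop that clause if unwanted.
filtered=$(awk -v min="$threshold" '$4 == "U" || $1 + 0 > min' "$report")
printf '%s\n' "$filtered"

# Variant: only species-level rows (rank code S) above the threshold.
species=$(awk -v min="$threshold" '$4 == "S" && $1 + 0 > min' "$report")
printf '%s\n' "$species"

rm -f "$report"
```

The `$1 + 0` forces a numeric comparison even if the percentage field carries leading spaces. Changing `"S"` to another rank code from the fourth column (for example `"G"` for genus) would select a different taxonomic level.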