DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

centrifuge k-report not conforming to pavian #240

Open lewhiteside opened 2 years ago

lewhiteside commented 2 years ago

Hi there,

I'm trying to visualise my results from centrifuge but am running into an issue I don't know how to fix. Any help would be greatly appreciated!

I ran all my sequences that I wanted to classify against the index:

centrifuge -x abv -U all.fastq

The following is an extract of my output file (cent_reads.output):

readID  seqID   taxID   score   2ndBestScore    hitLength   queryLength numMatches
97901713-ef67-4d40-a817-b3924c4e7fe5    NZ_CP077082.1   2745519 7632    0   280 1568    1
c74b175b-fa1a-4f6c-82ec-f45e905b84e5    NZ_CP087134.1   2893885 1156    0   49  1067    1
c52b3331-ac70-4c9e-9372-979715544b72    NZ_CP075896.1   2838015 64  64  23  2698    3
c52b3331-ac70-4c9e-9372-979715544b72    NZ_CP042218.1   1176533 64  64  23  2698    3
c52b3331-ac70-4c9e-9372-979715544b72    NZ_CP044328.1   173366  64  64  23  2698    3
bb9266ec-7a10-430a-8349-33b8e40add56    NZ_CP019426.1   1573712 8222    0   243 763 1

I then ran the following code to create the kreport:

centrifuge-kreport -x abv cent_reads.output

An extract of the output:

Loading taxonomy ...
Loading names file ...
/opt/apps/centrifuge/1.0.4/bin/centrifuge-inspect:24: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Loading nodes file ...
/opt/apps/centrifuge/1.0.4/bin/centrifuge-inspect:24: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Use of uninitialized value $a in numeric gt (>) at /opt/apps/centrifuge/1.0.4/bin/centrifuge-kreport line 164, <> line 403234.
Use of uninitialized value $taxID in hash element at /opt/apps/centrifuge/1.0.4/bin/centrifuge-kreport line 113, <> line 403234.
Use of uninitialized value $a in numeric gt (>) at /opt/apps/centrifuge/1.0.4/bin/centrifuge-kreport line 164, <> line 403235.
Use of uninitialized value $taxID in hash element at /opt/apps/centrifuge/1.0.4/bin/centrifuge-kreport line 113, <> line 403235.
Use of uninitialized value $a in numeric gt (>) at /opt/apps/centrifuge/1.0.4/bin/centrifuge-kreport line 164, <> line 403236.
Use of uninitialized value $taxID in hash element at /opt/apps/centrifuge/1.0.4/bin/centrifuge-kreport line 113, <> line 403236.
Use of uninitialized value $a in numeric gt (>) at /opt/apps/centrifuge/1.0.4/bin/centrifuge-kreport line 164, <> line 403237.
Use of uninitialized value $taxID in hash element at /opt/apps/centrifuge/1.0.4/bin/centrifuge-kreport line 113, <> line 403237.
  0.00  0   0   U   0   unclassified
100.00  276298  455 -   1   root
 98.41  271912  454 -   131567    cellular organisms
 97.98  270709  6809    D   2       Bacteria
 85.13  235224  5181    P   1224          Proteobacteria
 50.75  140236  1155    C   1236            Gammaproteobacteria
 30.00  82884   14  O   72274             Pseudomonadales
 29.95  82743   49  F   135621              Pseudomonadaceae

When I run this in pavian I get the following error message:


The following files did not conform the report format: 
- cent_reads_kreport.output

This is my first time using centrifuge (I'm using version 1.0.4), and I think I've gone wrong somewhere?

Thanks in advance for any help, and apologies if I haven't given all the relevant information. I will be happy to provide more!

mourisl commented 2 years ago

Could you please show me the lines 403234-403237 in the cent_reads.output file?
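A minimal sketch of how those lines could be pulled out, and how rows with an unexpected number of columns could be flagged. The helper names `show_lines` and `find_malformed_rows` are hypothetical, and the sketch assumes the file is tab-separated with a header on its first line, as in the extract above.

```shell
# Hypothetical helper: print lines START..END of FILE.
# $1 = file, $2 = first line number, $3 = last line number
show_lines() {
    sed -n "${2},${3}p" "$1"
}

# Hypothetical helper: list rows whose tab-separated field count
# differs from the header's field count (assumes line 1 is the header).
find_malformed_rows() {
    awk -F'\t' '
        NR == 1 { n = NF; next }
        NF != n { print NR ": " NF " fields (expected " n ")" }
    ' "$1"
}

# Usage (file name and line numbers taken from the thread):
#   show_lines cent_reads.output 403234 403237
#   find_malformed_rows cent_reads.output
```

If the second helper prints anything, the listed rows are likely the ones triggering the "uninitialized value $taxID" warnings.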

wittler-github commented 1 year ago

Good day, as seen in these files centrifuge_report&kreport.zip

I use centrifuge version 1.0.4 and get a centrifuge kreport file that works with pavian. However, centrifuge-kreport prints the warnings below. Does the final file miss information because of this? I cannot find the message text in either centrifuge-kreport or centrifuge-inspect.

export CENTRIFUGE_HOME=pathway/ProgramsAndTools/Centrifuge/centrifuge-1.0.4/

I=pathway/index_M_I/M_I
$fq: pathway to a file to scan

$CENTRIFUGE_HOME/centrifuge -x $I -U $fq -S classification_results.log --report-file centrifuge_report.tsv
$CENTRIFUGE_HOME/centrifuge-kreport -x $I centrifuge_report.tsv > centrifuge_k_report.output

Use of uninitialized value $headerMap{"readID"} in array element at pathway/ProgramsAndTools/Centrifuge/centrifuge-1.0.4/centrifuge-kreport line 92, <> line 2.
Use of uninitialized value $headerMap{"seqID"} in array element at pathway/ProgramsAndTools/Centrifuge/centrifuge-1.0.4/centrifuge-kreport line 93, <> line 2.
Use of uninitialized value $headerMap{"score"} in array element at pathway/ProgramsAndTools/Centrifuge/centrifuge-1.0.4/centrifuge-kreport line 95, <> line 2.
Use of uninitialized value $headerMap{"hitLength"} in array element at /pathway/ProgramsAndTools/Centrifuge/centrifuge-1.0.4/centrifuge-kreport line 96, <> line 2.
Use of uninitialized value $headerMap{"queryLength"} in array element at pathway/ProgramsAndTools/Centrifuge/centrifuge-1.0.4/centrifuge-kreport line 97, <> line 2.
Use of uninitialized value $headerMap{"numMatches"} in array element at pathway/ProgramsAndTools/Centrifuge/centrifuge-1.0.4/centrifuge-kreport line 98, <> line 2.
... for many more terms
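The warnings name readID, seqID, score, hitLength, queryLength and numMatches but not taxID, which suggests the input file has a header containing a taxID column and none of the others. That would fit the summary table written by --report-file rather than the per-read classification output written by -S, though this is not confirmed in the thread. Below is a rough sketch of a header check that could be run before calling centrifuge-kreport. The helper name `has_classification_header` is hypothetical, and the column list is taken from the classification output extract shown earlier in this issue.

```shell
# Hypothetical helper: succeed only if the first line of FILE contains
# every column name listed below (assumes a tab-separated header).
# $1 = file to check
has_classification_header() {
    header=$(head -n 1 "$1")
    for col in readID seqID taxID score hitLength queryLength numMatches; do
        # Split the header into one column name per line and look for an exact match.
        printf '%s\n' "$header" | tr '\t' '\n' | grep -qx "$col" || {
            echo "missing column: $col" >&2
            return 1
        }
    done
}

# Usage (assumption: the -S file from the command above is the intended input):
#   has_classification_header classification_results.log &&
#       "$CENTRIFUGE_HOME/centrifuge-kreport" -x "$I" classification_results.log > centrifuge_k_report.output
```

If the check fails for centrifuge_report.tsv but passes for classification_results.log, passing the latter to centrifuge-kreport may be worth trying.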

LilyAnderssonLee commented 1 year ago

@lewhiteside @wittler-github did you succeed in running centrifuge-kreport? You can find my solution in #62

wittler-github commented 1 year ago

@LilyAnderssonLee Thank you. It has been a while since I looked at this, but I believe I at least got past the issue. I was using some statistical/visualisation Biopython scripts tailored to visualise the contents of the .tsv and classification .log files and connect them with the NCBI taxonomy. I may add them to my GitHub eventually or suggest including them in centrifuge (let me know if you want help with such analysis).