DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

The report generation step takes too much time #228

Closed ryuzheng closed 2 years ago

ryuzheng commented 2 years ago

Hi all, I am using the hpvc index (https://zenodo.org/record/3732127/files/h+p+v+c.tar.gz?download=1) with a FASTQ file of about 11 GB containing 95 million reads. The classification output file is generated normally, but the tabular report file is empty, and I have been waiting for 12 hours.

The centrifuge version is Centrifuge v1.0.4.

The command I used:

centrifuge -x ~/Reference/centrifuge_p+h+v_202003/hpvc -U unmap.filter.fq --host-taxids 9606 --exclude-taxids 9000000,9000001,9000002,9000003,9000004,9000005,9000006,9000007,9000008,9000009,9000010,9000011,9000012,9000013,9000014,9000015,9000016,9000017,9000018,9000019,9000020,9000021,9000022,9000023,9000024,9000025,9000026,9000027,9000028,9000029,9000030,9000031,9000032,9000033,9000034,9000035,9000036,9000037,9000038,9000039,9000040,9000041,9000042,9000043,9000044,9000045,9000046,9000047,9000048,9000049,9000050,9000051,9000052,9000053,9000054,9000055,9000056,9000057,9000058,9000059,9000060,9000061,9000062,9000063,9000064,9000065,9000066,9000067,9000068,9000069,9000070,9000071,9000072,9000073,9000074,9000075,9000076,9000077,9000078,9000079,9000080,9000081,9000082,9000083,9000084,9000085,9000086,9000087,9000088,9000089,9000090,9000091,9000092,9000093,9000094,9000095,9000096,9000097,9000098,9000099,9000100,9000101,9000102,9000103,9000104 --min-hitlen 16 -S unmap_p+h+v.txt --report-file unmap_p+h+v.tsv -p 20 2>&1 > p+h+v.log

I found that the classification output file is incomplete, and the program becomes single-threaded when it reaches the report generation step. Can I just skip the report generation step? Or is there a way to generate a complete classification output file that I can summarize on my own?

A01050:665:HMJNFDSX2:2:1304:2826:24455  NC_000009.12    9606    9       0       18      44      1
A00917:738:HMLKMDSX2:4:2269:21549:3114  unclassified    0       0       0       0       16      1
A01050:665:HMJNFDSX2:2:2105:23014:14325 NC_000011.10    9606    169     4       28      59      1
A01050:665:HMJNFDSX2:2:2423:1244:6464   NC_02%

mourisl commented 2 years ago

It could be that the abundance estimation step is taking too much time. Could you please run Centrifuge with the option --no-abundance?

ryuzheng commented 2 years ago

> It could be that the abundance estimation step is taking too much time. Could you please run Centrifuge with the option --no-abundance?

I tried with --no-abundance and it worked. Thank you.
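
For anyone who wants to summarize the classification output on their own, as asked earlier in the thread, here is a minimal sketch in Python. It assumes the classification file is tab-delimited with eight columns (read ID, sequence ID, taxID, score, second-best score, hit length, query length, number of matches) and a header line starting with readID, which matches the sample lines posted above. The function name count_assignments is just an illustration, not part of Centrifuge. It counts assignment lines per taxID and skips lines with the wrong number of fields, such as a truncated last line from a run that was stopped early.

```python
import sys
from collections import Counter

# Assumption: each record has eight tab-separated fields, taxID in the third.
EXPECTED_FIELDS = 8


def count_assignments(lines):
    """Return (Counter of assignment lines per taxID, number of skipped lines)."""
    counts = Counter()
    skipped = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "readID":
            continue  # header line
        if len(fields) != EXPECTED_FIELDS:
            skipped += 1  # e.g. a truncated final line
            continue
        counts[fields[2]] += 1
    return counts, skipped


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        counts, skipped = count_assignments(handle)
    for taxid, n in counts.most_common():
        print(f"{taxid}\t{n}")
    print(f"skipped {skipped} malformed line(s)", file=sys.stderr)
```

Saved under a name of your choice, it would be run with the classification file as its only argument, for example the unmap_p+h+v.txt file from the command above. Note that a read with more than one match appears on several lines, so these are counts of assignments rather than unique reads, and they are not the abundance estimates that the report step computes.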